Wan2.2 video fixes, pipeline cache, smarter offload, model-load tasks
Wan2.2 A14B (dual-expert) generation fixes:
- Fuse the Lightning distill LoRA into BOTH experts (transformer +
transformer_2); diffusers' fuse_lora defaults to ["transformer"] only, which
left the low-noise expert undistilled → 4-step clips collapsed to a solid
colour. Also load per-request fighter/env LoRAs into both experts.
- Pre-configure the wan22_lightning_4step preset with the local high/low-noise
LoRAs (lora_high/lora_low), used when acceleration is enabled, ignored when
not; surfaced in the Acceleration UI.
- Safety net: only apply the preset's low step count when the distill LoRA
actually fused, else fall back to safe steps.
- Skip bitsandbytes/quanto quantization for the VAE: it is conv-only, so the
quantizer reports "no linear modules".
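A minimal sketch of the dual-expert fuse plus the step-count safety net described above. It assumes a diffusers-style pipeline object whose fuse_lora accepts a components list. The helper names (fuse_into_experts, choose_steps) and the safe_steps default of 30 are hypothetical and are not taken from this commit.

```python
from typing import Any, Sequence

# Both Wan2.2 A14B experts; fuse_lora alone would only touch "transformer".
EXPERTS = ("transformer", "transformer_2")


def fuse_into_experts(pipe: Any, experts: Sequence[str] = EXPERTS) -> bool:
    """Fuse already-loaded LoRA weights into every expert; return True on success."""
    present = [name for name in experts if getattr(pipe, name, None) is not None]
    if len(present) != len(experts):
        # An expert is missing, so the distillation would be incomplete.
        return False
    try:
        pipe.fuse_lora(components=present)
    except Exception:
        # Treat any fuse failure as "not distilled" rather than crashing.
        return False
    return True


def choose_steps(fused: bool, preset_steps: int = 4, safe_steps: int = 30) -> int:
    """Use the preset's low step count only when the distill LoRA really fused."""
    return preset_steps if fused else safe_steps
```

A caller would run fuse_into_experts(pipe) once after loading the LoRAs and pass the result to choose_steps, so an unfused pipeline never runs at 4 steps.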
VRAM / offload:
- Strategy auto-selection actually fires now ('auto' is normalised, not passed
through as a no-op) and no longer double-counts the runtime/accel reserve.
- Graceful OOM degrade ladder: full-GPU → balanced @ configured % → 80% → 60% →
40% → sequential → disk, with the model's balanced_gpu_percent as the
starting cap. Expose 'balanced' as a selectable offload strategy.
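The degrade ladder can be sketched roughly as follows. This is an illustration under assumptions, not the code in this commit: build_ladder and load_with_degrade are hypothetical names, "starting cap" is read as "never try a balanced percentage above balanced_gpu_percent", and MemoryError stands in for whatever OOM exception the runtime actually raises.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Type, TypeVar

Rung = Tuple[str, Optional[int]]  # (strategy name, GPU percent or None)
T = TypeVar("T")


def build_ladder(balanced_gpu_percent: int = 100) -> List[Rung]:
    """Order strategies from fastest to most memory-frugal."""
    ladder: List[Rung] = [("full_gpu", None)]
    seen = set()
    for pct in (balanced_gpu_percent, 80, 60, 40):
        # Skip rungs above the configured cap and avoid duplicates.
        if pct <= balanced_gpu_percent and pct not in seen:
            seen.add(pct)
            ladder.append(("balanced", pct))
    ladder += [("sequential", None), ("disk", None)]
    return ladder


def load_with_degrade(
    load: Callable[[str, Optional[int]], T],
    ladder: Sequence[Rung],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> T:
    """Try each rung in order, stepping down only on an out-of-memory error."""
    last_error: Optional[BaseException] = None
    for strategy, pct in ladder:
        try:
            return load(strategy, pct)
        except oom_errors as err:
            last_error = err
    raise RuntimeError("all offload strategies ran out of memory") from last_error
```

Non-OOM exceptions propagate unchanged, so a genuine load bug is not masked by a silent fall-through to disk offload.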
Pipeline disk cache (--pipeline-cache / --rebuild-pipeline-cache):
- Cache the quantized base pipeline to disk and reload it on later starts,
skipping re-download and re-quantization; the acceleration LoRA is re-fused on
each load. Fail-safe: a cache that fails to load is invalidated and rebuilt.
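One way to picture the invalidate-and-rebuild behaviour is the sketch below. It is an assumption-laden illustration: cache_key, load_or_rebuild, and the build/save/load callables are hypothetical, and keying the cache directory on a hash of the configuration is a guess rather than something this commit states.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Any, Callable


def cache_key(config: dict) -> str:
    """Stable short key derived from the pipeline configuration."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def load_or_rebuild(
    cache_root: str,
    config: dict,
    build: Callable[[], Any],
    save: Callable[[Any, Path], None],
    load: Callable[[Path], Any],
    rebuild: bool = False,
) -> Any:
    """Return the cached pipeline, rebuilding it when missing or unreadable."""
    entry = Path(cache_root) / cache_key(config)
    if rebuild:
        # Mirrors a forced rebuild: drop the entry before looking at it.
        shutil.rmtree(entry, ignore_errors=True)
    if entry.is_dir():
        try:
            return load(entry)
        except Exception:
            # Self-heal: a corrupt entry is invalidated, then rebuilt below.
            shutil.rmtree(entry, ignore_errors=True)
    pipeline = build()
    try:
        entry.mkdir(parents=True, exist_ok=True)
        save(pipeline, entry)
    except Exception:
        # Fail-safe: a failed save must not leave a half-written entry behind.
        shutil.rmtree(entry, ignore_errors=True)
    return pipeline
```

The acceleration LoRA would be fused by the caller after this returns, which keeps the on-disk entry limited to the quantized base pipeline.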
Tasks / misc:
- Show model loading as a (non-cancellable, non-pausable) Tasks entry.
- Filter the Tasks-page pollers from the access log unless --debug-web.
- Township gen script: per-image keyframe progress (no longer all-or-nothing).
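For the access-log filtering, a stdlib-only sketch might look like this. It assumes the web server writes access lines through Python's logging module. The PollerFilter class, the debug_web argument, and the example path /api/tasks are hypothetical.

```python
import logging
from typing import Iterable


class PollerFilter(logging.Filter):
    """Drop access-log records for noisy polling endpoints unless debugging."""

    def __init__(self, noisy_paths: Iterable[str], debug_web: bool = False) -> None:
        super().__init__()
        self.noisy_paths = tuple(noisy_paths)
        self.debug_web = debug_web

    def filter(self, record: logging.LogRecord) -> bool:
        if self.debug_web:
            # With web debugging on, keep every access line.
            return True
        message = record.getMessage()
        # Returning False tells the logging machinery to discard the record.
        return not any(path in message for path in self.noisy_paths)
```

Attaching it with logger.addFilter(PollerFilter(["/api/tasks"])) on the access logger would hide the poll requests while leaving every other request visible.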
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>