    Wan2.2 video fixes, pipeline cache, smarter offload, model-load tasks · dbc09b75
    Stefy Lanza (nextime / spora ) authored
    Wan2.2 A14B (dual-expert) generation fixes:
    - Fuse the Lightning distill LoRA into BOTH experts (transformer +
      transformer_2); diffusers' fuse_lora defaults to ["transformer"] only, which
      left the low-noise expert undistilled → 4-step clips collapsed to a solid
      colour. Also load per-request fighter/env LoRAs into both experts.
    - Pre-configure the wan22_lightning_4step preset with the local high/low-noise
      LoRAs (lora_high/lora_low), used when acceleration is enabled, ignored when
      not; surfaced in the Acceleration UI.
    - Safety net: only apply the preset's low step count when the distill LoRA
      actually fused, else fall back to safe steps.
    - Skip bitsandbytes/quanto quant for the VAE (conv-only → "no linear modules").
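A minimal sketch of the dual-expert fuse and the step-count safety net described above. It assumes a diffusers-style pipeline object exposing `transformer`, `transformer_2` and `fuse_lora(components=..., lora_scale=...)`; the helper names `fuse_distill_lora` and `pick_steps`, and the fallback of 30 steps, are hypothetical and not taken from this commit.

```python
# Names of the two Wan2.2 A14B experts as they appear on the pipeline.
EXPERTS = ("transformer", "transformer_2")


def fuse_distill_lora(pipe, lora_scale: float = 1.0) -> bool:
    """Fuse already-loaded LoRA weights into both experts.

    Returns True only if both experts exist and the fuse call succeeded.
    """
    components = [name for name in EXPERTS if getattr(pipe, name, None) is not None]
    if len(components) != len(EXPERTS):
        # A dual-expert model with a missing expert cannot be fully distilled.
        return False
    try:
        # Pass both components explicitly; the default covers "transformer" only.
        pipe.fuse_lora(components=components, lora_scale=lora_scale)
    except Exception:
        return False
    return True


def pick_steps(preset_steps: int, fused: bool, safe_steps: int = 30) -> int:
    """Use the preset's low step count only when the distill LoRA fused.

    safe_steps=30 is an illustrative default, not a value from the commit.
    """
    return preset_steps if fused else safe_steps
```

With this shape, a caller would compute `steps = pick_steps(4, fuse_distill_lora(pipe))`, so a failed fuse can never be paired with a 4-step schedule.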
    
    VRAM / offload:
    - Strategy auto-selection actually fires now ('auto' is normalised, not passed
      through as a no-op) and no longer double-counts the runtime/accel reserve.
    - Graceful OOM degrade ladder: full-GPU → balanced @ configured% → 80 → 60 →
      40 → sequential → disk, respecting the model's balanced_gpu_percent as the
      starting cap. Expose 'balanced' as a selectable offload strategy.
    
    Pipeline disk cache (--pipeline-cache / --rebuild-pipeline-cache):
    - Cache the quantized base pipeline to disk and reload it on later starts,
      skipping re-download/re-quantization; the accel LoRA is re-fused on every
      load. Fail-safe: a cache that fails to load is invalidated and rebuilt.
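One way the fail-safe cache behaviour could look, as a hedged sketch only. The function `load_or_build` and its `build` / `save` / `load` callbacks are hypothetical and do not reflect the actual contents of pipeline_cache.py; the `rebuild` flag mirrors the intent of --rebuild-pipeline-cache.

```python
import shutil
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    cache_dir: Path,
    build: Callable[[], T],
    save: Callable[[T, Path], None],
    load: Callable[[Path], T],
    rebuild: bool = False,
) -> T:
    """Return the cached pipeline if it loads cleanly, otherwise rebuild it."""
    cache_dir = Path(cache_dir)
    if rebuild and cache_dir.exists():
        shutil.rmtree(cache_dir)  # explicit rebuild request: drop the old cache
    if cache_dir.exists():
        try:
            return load(cache_dir)
        except Exception:
            # Self-healing: a corrupt or stale cache is invalidated, not fatal.
            shutil.rmtree(cache_dir, ignore_errors=True)
    pipeline = build()
    try:
        cache_dir.mkdir(parents=True, exist_ok=True)
        save(pipeline, cache_dir)
    except Exception:
        # Fail-safe: a failed write must not leave a half-written cache behind.
        shutil.rmtree(cache_dir, ignore_errors=True)
    return pipeline
```

In this sketch the accel LoRA fuse would happen in the caller after `load_or_build` returns, so the cached artifact stays the plain quantized base pipeline.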
    
    Tasks / misc:
    - Show model loading as a (non-cancellable, non-pausable) Tasks entry.
    - Filter the Tasks-page pollers from the access log unless --debug-web.
    - Township gen script: per-image keyframe progress (no longer all-or-nothing).
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pipeline_cache.py 6.21 KB