codai/models/pipeline_cache.py · c84c620814110a9bf4944e1d338bd15d8eda4e93 · nexlab / coderai

Wan2.2 A14B (dual-expert) generation fixes:
- Fuse the Lightning distill LoRA into BOTH experts (transformer + transformer_2); diffusers' fuse_lora defaults to ["transformer"] only, which left the low-noise expert undistilled → 4-step clips collapsed to a solid colour. Also load per-request fighter/env LoRAs into both experts.
- Pre-configure the wan22_lightning_4step preset with the local high/low-noise LoRAs (lora_high/lora_low), used when acceleration is enabled, ignored when not; surfaced in the Acceleration UI.
- Safety net: only apply the preset's low step count when the distill LoRA actually fused, else fall back to safe steps.
- Skip bitsandbytes/quanto quant for the VAE (conv-only → "no linear modules").

VRAM / offload:
- Strategy auto-selection actually fires now ('auto' is normalised, not passed through as a no-op) and no longer double-counts the runtime/accel reserve.
- Graceful OOM degrade ladder: full-GPU → balanced @ configured% → 80 → 60 → 40 → sequential → disk, respecting the model's balanced_gpu_percent as the starting cap. Expose 'balanced' as a selectable offload strategy.

Pipeline disk cache (--pipeline-cache / --rebuild-pipeline-cache):
- Cache the quantized base pipeline to disk and reload it on later starts, skipping re-download/re-quantization; accel LoRA re-fused per load. Fail-safe with self-healing invalidate-and-rebuild.

Tasks / misc:
- Show model loading as a (non-cancellable, non-pausable) Tasks entry.
- Filter the Tasks-page pollers from the access log unless --debug-web.
- Township gen script: per-image keyframe progress (no longer all-or-nothing).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
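
The OOM degrade ladder described in the VRAM / offload notes could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this commit: `OutOfMemory`, `build_ladder`, `load_with_degrade`, and the strategy strings are hypothetical names, and "respecting balanced_gpu_percent as the starting cap" is read here as "skip any fixed rung (80/60/40) that is not below the configured percentage".

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for a GPU out-of-memory error."""


def build_ladder(balanced_gpu_percent: int) -> List[Tuple[str, int]]:
    """Return (strategy, gpu_percent) rungs from most to least VRAM-hungry."""
    ladder = [("full_gpu", 100), ("balanced", balanced_gpu_percent)]
    # Assumption: fixed rungs at or above the configured cap are skipped,
    # so the ladder never climbs back above the model's own setting.
    ladder += [("balanced", p) for p in (80, 60, 40) if p < balanced_gpu_percent]
    ladder += [("sequential", 0), ("disk", 0)]
    return ladder


def load_with_degrade(
    load: Callable[[str, int], T], balanced_gpu_percent: int = 90
) -> Tuple[str, int, T]:
    """Try each rung in order; step down only when the loader reports OOM."""
    last_error = None
    for strategy, percent in build_ladder(balanced_gpu_percent):
        try:
            return strategy, percent, load(strategy, percent)
        except OutOfMemory as exc:
            last_error = exc  # remember the failure and try the next rung
    raise RuntimeError("all offload strategies exhausted") from last_error
```

Only the hypothetical `OutOfMemory` error triggers a step down; any other exception propagates immediately, so unrelated load failures are not masked by retries.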

pipeline_cache.py 6.21 KB
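
The contents of pipeline_cache.py are not shown on this page. As a rough illustration of the "fail-safe with self-healing invalidate-and-rebuild" behaviour from the commit message, a minimal sketch might look like the following. Everything here is an assumption: `load_or_rebuild`, the `pipeline.json` entry name, and the use of JSON in place of a serialized quantized pipeline are all hypothetical, and the `rebuild` flag only mimics what --rebuild-pipeline-cache is described as doing.

```python
import json
import shutil
from pathlib import Path
from typing import Callable


def load_or_rebuild(
    cache_dir: Path, build: Callable[[], dict], rebuild: bool = False
) -> dict:
    """Return the cached value if readable, otherwise rebuild and re-cache it."""
    entry = cache_dir / "pipeline.json"  # hypothetical cache entry name

    # Forced rebuild: drop whatever is on disk first.
    if rebuild and cache_dir.exists():
        shutil.rmtree(cache_dir, ignore_errors=True)

    if entry.exists():
        try:
            return json.loads(entry.read_text())
        except (OSError, ValueError):
            # Self-heal: a corrupt or unreadable entry is invalidated.
            shutil.rmtree(cache_dir, ignore_errors=True)

    value = build()  # the expensive path (download + quantize in the real system)

    try:
        cache_dir.mkdir(parents=True, exist_ok=True)
        tmp = entry.with_suffix(".tmp")
        tmp.write_text(json.dumps(value))
        tmp.replace(entry)  # rename so readers never see a half-written file
    except OSError:
        pass  # fail-safe: a cache write failure must not break loading

    return value
```

Writing to a temporary file and renaming it keeps an interrupted write from leaving a truncated entry behind, and swallowing write errors means the cache can only ever speed up a load, never prevent one.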
