    video: param-weighted VRAM estimate, smarter auto offload, runtime accel LoRA · eeb3bba1
    Stefy Lanza (nextime / spora ) authored
    VRAM estimation (manager.py):
    - Weight the effective quant multiplier by the REAL per-component parameter
      shares (the new _component_param_shares scans safetensors by component
      folder) instead of a blind 70/30 split. Wan2.2 is 99.6% quantizable (two
      14B experts + text encoder in 4-bit; only the 0.13B VAE stays dense), so
      the old 0.475x multiplier inflated the estimate from ~25.8 GB to 42.7 GB
      and forced needless offload. The multiplier is now ~0.28x, giving
      ~25.8 GB. The VAE is forced dense because it is conv-only and bnb cannot
      quantize it.
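
The parameter-share weighting described above can be sketched roughly as follows. This is an illustration, not the code in manager.py: the function names, the `dense_components` set, and the 0.25 factor for 4-bit weights (4 bits out of 16, ignoring quantization overhead) are all assumptions. The only format detail relied on is the safetensors layout: an 8-byte little-endian header length followed by a JSON header that lists each tensor's shape. With these assumed factors a 70/30 split gives 0.7 * 0.25 + 0.3 * 1.0 = 0.475; the sketch does not try to reproduce the ~0.28x figure, since the exact factors used in the commit are not stated in the message.

```python
import json
import struct
from math import prod
from pathlib import Path


def count_params(safetensors_path: Path) -> int:
    """Count parameters by reading only the JSON header of a safetensors file."""
    with open(safetensors_path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian u64
        header = json.loads(fh.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sum(
        prod(entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"
    )


def component_param_shares(model_dir: Path) -> dict:
    """Map each component folder (hypothetically: transformer, vae, ...) to its
    fraction of the model's total parameter count."""
    counts = {}
    for sub in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        n = sum(count_params(f) for f in sub.glob("*.safetensors"))
        if n:
            counts[sub.name] = n
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def effective_multiplier(shares, dense_components=frozenset({"vae"}),
                         quant_factor=0.25):
    """Weighted size multiplier: dense components count at 1.0,
    everything else at the assumed quant_factor."""
    return sum(
        share * (1.0 if name in dense_components else quant_factor)
        for name, share in shares.items()
    )
```

Reading only the header keeps the scan cheap, because no tensor data is loaded.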
    
    Auto offload decision (video.py):
    - 'auto': when peak footprint exceeds free VRAM, go straight to `model` CPU
      offload (active component on GPU, near full-GPU speed) — no full-GPU gamble,
      no slow balanced+disk path.
    - 'auto-borderline' (new mode): same, except a marginal overshoot (<=3 GB)
      tries full-GPU first to keep both experts resident and use free VRAM,
      falling back to model offload on OOM.
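
The two auto modes can be summarized as a small decision function. This is a minimal sketch under assumptions: the function name, the strategy labels `"full_gpu"` and `"model"`, and the return shape (an ordered list of strategies to try) are hypothetical and not taken from video.py. Only the 3 GB borderline threshold and the mode names come from the description above.

```python
def choose_offload(mode: str, peak_gb: float, free_gb: float,
                   borderline_gb: float = 3.0) -> list:
    """Return the strategies to try, in order.

    A second entry is the fallback to use if the first attempt hits OOM.
    """
    if mode not in ("auto", "auto-borderline"):
        raise ValueError(f"unsupported mode: {mode!r}")

    # Everything fits: no offload needed.
    if peak_gb <= free_gb:
        return ["full_gpu"]

    overshoot = peak_gb - free_gb
    # Marginal overshoot in borderline mode: try full GPU first,
    # then fall back to model CPU offload on OOM.
    if mode == "auto-borderline" and overshoot <= borderline_gb:
        return ["full_gpu", "model"]

    # Otherwise go straight to model CPU offload.
    return ["model"]
```

For example, with a 25.8 GB peak and 24 GB free, `"auto"` yields `["model"]` while `"auto-borderline"` yields `["full_gpu", "model"]`.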
    
    Acceleration LoRA (acceleration.py + video.py):
    - Keep the distill/Lightning LoRA as an ACTIVE RUNTIME ADAPTER instead of
      fusing. Fusing into CPU-offloaded bitsandbytes 4-bit weights triggers a
      dequant->merge->requant per Linear on the CPU — minutes/hours per expert,
      appearing to hang (high CPU, empty VRAM). Runtime adapters apply at forward
      time on the GPU at negligible cost and natively cover transformer_2.
    - _sync_video_loras preserves the accel adapters across per-request LoRA swaps
      and re-includes them in every set_adapters; _unload_video_loras deletes only
      per-request adapters, keeping accel.
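
The adapter bookkeeping described above can be illustrated with a pure-Python planning step. This is a hedged sketch, not the logic of _sync_video_loras or _unload_video_loras: the function name and argument shapes are hypothetical. In a diffusers-style pipeline, the resulting lists would presumably be passed on to calls such as `delete_adapters` and `set_adapters`.

```python
def plan_adapter_sync(loaded, accel, requested):
    """Plan a per-request LoRA swap that never touches the accel adapters.

    loaded:    adapter names currently loaded on the pipeline
    accel:     {name: weight} for the acceleration (distill/Lightning) adapters
    requested: {name: weight} for the LoRAs wanted by this request

    Returns (to_delete, names, weights): stale per-request adapters to drop,
    then the full adapter list and matching weights to activate.
    """
    # Only per-request adapters that are no longer wanted get deleted.
    to_delete = [n for n in loaded if n not in accel and n not in requested]

    # Accel adapters are re-included in every activation, listed first.
    names = list(accel) + [n for n in requested if n not in accel]
    weights = [accel[n] if n in accel else requested[n] for n in names]
    return to_delete, names, weights
```

Calling it with an empty `requested` dict gives the unload case: every per-request adapter is scheduled for deletion and only the accel adapters stay active.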
    
    UI (models.html):
    - Add "Auto borderline-aware" offload strategy option + updated hint.
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
admin
api
backends
broker
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py