codai · eeb3bba1ad9c5eb91bbb3408d6d4f76c42ab623f · nexlab / coderai

video: param-weighted VRAM estimate, smarter auto offload, runtime accel LoRA · eeb3bba1

Stefy Lanza (nextime / spora) authored Jun 12, 2026

VRAM estimation (manager.py):
- Weight the effective quant multiplier by REAL per-component parameter
  shares (the new _component_param_shares scans safetensors files by component
  folder) instead of a blind 70/30 split. Wan2.2 is 99.6% quantizable (two 14B
  experts + text encoder in 4-bit; only the 0.13B VAE stays dense), so the old
  0.475x multiplier inflated the estimate from ~25.8 GB to 42.7 GB and forced
  needless offload. Now ~0.28x -> ~25.8 GB. The VAE is forced dense (it is
  conv-only, and bitsandbytes quantizes only Linear layers).

Auto offload decision (video.py):
- 'auto': when the peak footprint exceeds free VRAM, go straight to `model` CPU
  offload (only the active component on GPU, near full-GPU speed): no
  full-GPU gamble, no slow balanced+disk path.
- 'auto-borderline' (new mode): same, except that a marginal overshoot
  (<=3 GB) tries full GPU first, keeping both experts resident and using the
  free VRAM, and falls back to model offload on OOM.
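Both modes reduce to a short ordered list of placements to try. A minimal sketch, assuming 'auto' keeps everything on the GPU when the peak fits (the commit only describes the overshoot case). Placement, GpuOutOfMemory, plan_offload and load_with_fallback are hypothetical names; real code would catch torch.cuda.OutOfMemoryError rather than the stand-in exception.

```python
from enum import Enum


class Placement(str, Enum):
    FULL_GPU = "full_gpu"    # every component resident on the GPU
    MODEL_OFFLOAD = "model"  # only the active component on GPU, rest on CPU


class GpuOutOfMemory(Exception):
    """Hypothetical stand-in for torch.cuda.OutOfMemoryError."""


def plan_offload(mode: str, peak_gb: float, free_gb: float,
                 borderline_gb: float = 3.0) -> list[Placement]:
    """Return the placements to try, in order.

    'auto': full GPU only if the peak fits, otherwise straight to model offload.
    'auto-borderline': an overshoot of at most borderline_gb still tries full
    GPU first, with model offload queued as the OOM fallback.
    """
    if peak_gb <= free_gb:
        return [Placement.FULL_GPU]  # assumption: fitting case stays on GPU
    overshoot = peak_gb - free_gb
    if mode == "auto-borderline" and overshoot <= borderline_gb:
        return [Placement.FULL_GPU, Placement.MODEL_OFFLOAD]
    return [Placement.MODEL_OFFLOAD]


def load_with_fallback(plan, load):
    """Call load(placement) for each placement in order; move on after OOM."""
    last_error = None
    for placement in plan:
        try:
            return load(placement)
        except GpuOutOfMemory as err:
            last_error = err
    raise RuntimeError("no placement fit in VRAM") from last_error
```

For example, with 24 GB free and a 26 GB peak, 'auto' plans only model offload, while 'auto-borderline' first attempts full GPU because the 2 GB overshoot is within the 3 GB margin.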

Acceleration LoRA (acceleration.py + video.py):
- Keep the distill/Lightning LoRA as an ACTIVE RUNTIME ADAPTER instead of
  fusing it. Fusing into CPU-offloaded bitsandbytes 4-bit weights triggers a
  dequant->merge->requant per Linear layer on the CPU, which takes minutes to
  hours per expert and looks like a hang (high CPU, empty VRAM). Runtime
  adapters apply at forward time on the GPU at negligible cost and natively
  cover transformer_2.
- _sync_video_loras preserves the accel adapters across per-request LoRA swaps
  and re-includes them in every set_adapters call; _unload_video_loras deletes
  only the per-request adapters and keeps the accel ones.
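The adapter bookkeeping can be sketched as follows, assuming a diffusers-style pipeline that exposes load_lora_weights(..., adapter_name=...), set_adapters(names, adapter_weights=...) and delete_adapters(names). FakePipeline is a hypothetical stub that only records state so the logic can run without a GPU, and sync_video_loras / unload_video_loras are illustrative stand-ins, not the repo's _sync_video_loras / _unload_video_loras.

```python
class FakePipeline:
    """Hypothetical stub mirroring diffusers' LoRA adapter method names."""

    def __init__(self):
        self.loaded = {}   # adapter name -> weight source
        self.active = []   # names from the last set_adapters call
        self.weights = []  # matching adapter weights

    def load_lora_weights(self, source, adapter_name):
        self.loaded[adapter_name] = source

    def set_adapters(self, adapter_names, adapter_weights=None):
        self.active = list(adapter_names)
        self.weights = list(adapter_weights or [1.0] * len(adapter_names))

    def delete_adapters(self, adapter_names):
        for name in adapter_names:
            self.loaded.pop(name, None)
        self.active = [n for n in self.active if n not in adapter_names]


def sync_video_loras(pipe, accel: dict[str, float],
                     requested: dict[str, tuple[str, float]]) -> None:
    """Swap per-request LoRAs while accel adapters stay loaded and unfused."""
    # Drop per-request adapters from the previous request, never accel ones.
    stale = [n for n in pipe.loaded if n not in accel and n not in requested]
    if stale:
        pipe.delete_adapters(stale)
    for name, (source, _weight) in requested.items():
        if name not in pipe.loaded:
            pipe.load_lora_weights(source, adapter_name=name)
    # Accel adapters are re-included in every set_adapters call.
    names = list(accel) + list(requested)
    weights = list(accel.values()) + [w for _, w in requested.values()]
    pipe.set_adapters(names, adapter_weights=weights)


def unload_video_loras(pipe, accel: dict[str, float]) -> None:
    """Delete only per-request adapters; the accel adapters stay active."""
    per_request = [n for n in pipe.loaded if n not in accel]
    if per_request:
        pipe.delete_adapters(per_request)
    if accel:
        pipe.set_adapters(list(accel), adapter_weights=list(accel.values()))
```

Keeping the accel names in a separate dict makes "never delete, always re-activate" a single membership check rather than a naming convention.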

UI (models.html):
- Add "Auto borderline-aware" offload strategy option + updated hint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


admin
api
backends
broker
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py