Fix balanced offload strategy VRAM estimation
- Account for LoRA overhead (~4GB) in VRAM calculations
- Add 30% inference overhead for activation memory
- Use more conservative 70% threshold (was 85%)
- Add OOM fallback to model CPU offload if GPU loading fails
- Switch fallback from sequential to model offload for better performance
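A minimal sketch of the estimation and fallback logic these bullets describe, assuming a diffusers-style pipeline that exposes `enable_model_cpu_offload()`. The helper names (`fits_on_gpu`, `load_pipeline`) and the `model_size_gb` parameter are illustrative, not the actual code touched by this commit:

```python
import torch

LORA_OVERHEAD_GB = 4.0     # approximate extra VRAM used by LoRA weights
INFERENCE_OVERHEAD = 0.30  # headroom for activation memory during inference
VRAM_THRESHOLD = 0.70      # conservative usable fraction of total VRAM (was 0.85)

def fits_on_gpu(model_size_gb: float, device: int = 0) -> bool:
    """Estimate whether model + LoRA + activations fit under the threshold."""
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1024**3
    required_gb = (model_size_gb + LORA_OVERHEAD_GB) * (1 + INFERENCE_OVERHEAD)
    return required_gb <= total_gb * VRAM_THRESHOLD

def load_pipeline(pipe, model_size_gb: float):
    """Load fully on GPU when the estimate allows; otherwise, or on an
    actual OOM, fall back to model-level CPU offload."""
    try:
        if fits_on_gpu(model_size_gb):
            pipe.to("cuda")
        else:
            pipe.enable_model_cpu_offload()
    except torch.cuda.OutOfMemoryError:
        # Estimate was too optimistic: free what was allocated and offload.
        torch.cuda.empty_cache()
        pipe.enable_model_cpu_offload()
    return pipe
```

The fallback uses model-level offload rather than sequential offload because it moves whole submodules to the GPU at once instead of streaming individual weights, which is why the last bullet trades it in for better performance.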