Stefy Lanza (nextime / spora) authored
- Account for LoRA overhead (~4 GB) in VRAM calculations
- Add 30% inference overhead for activation memory
- Use a more conservative 70% threshold (was 85%)
- Add OOM fallback to model CPU offload if GPU loading fails
- Switch fallback from sequential to model offload for better performance
15e14a57
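
The VRAM budgeting described above can be sketched as a small helper. This is a minimal illustration, not the commit's actual code: the function name `fits_in_vram` and its parameters are hypothetical, with defaults taken from the message (~4 GB LoRA overhead, 30% activation overhead, 70% usable-VRAM threshold).

```python
def fits_in_vram(
    model_size_gb: float,
    free_vram_gb: float,
    lora_overhead_gb: float = 4.0,   # ~4 GB reserved for LoRA weights
    inference_overhead: float = 0.30,  # +30% for activation memory
    threshold: float = 0.70,          # only plan to use 70% of free VRAM
) -> bool:
    """Return True if the model (plus overheads) should fit on the GPU.

    Hypothetical sketch of the checks described in the commit message:
    required memory = (model + LoRA) * (1 + activation overhead),
    compared against a conservative fraction of free VRAM.
    """
    required_gb = (model_size_gb + lora_overhead_gb) * (1.0 + inference_overhead)
    return required_gb <= free_vram_gb * threshold


# Example: a 10 GB model on a 24 GB card needs (10+4)*1.3 = 18.2 GB,
# but only 24*0.7 = 16.8 GB is budgeted, so it does not fit.
print(fits_in_vram(10.0, 24.0))  # False
print(fits_in_vram(5.0, 24.0))   # True: (5+4)*1.3 = 11.7 GB <= 16.8 GB
```

If the check fails (or GPU loading raises an out-of-memory error anyway), the commit's fallback is to enable model-level CPU offload rather than sequential offload, trading some host-device transfer for better throughput than per-layer offloading.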