-
Your Name authored
- Default mode changed to ondemand (pre-load first model, unload/load on switch) - loadswap: load first model in VRAM, others in CPU RAM, swap on switch - loadall: try to load all models in VRAM, offload to CPU RAM if OOM - --nopreload: skip pre-loading in any mode, load on first request - request_model() now properly handles all three modes - Added _move_model_to_cpu() and _move_model_to_vram() for loadswap - Fixed NameError: model_manager reference in request_model() (was using global singleton instead of self) - Updated CLI help text for --loadall, --loadswap, --nopreload
c08a5b4f