    Implement proper loadswap/loadall/ondemand model management modes · c08a5b4f
    Your Name authored
    - Default mode changed to ondemand (pre-load first model, unload/load on switch)
    - loadswap: load first model in VRAM, others in CPU RAM, swap on switch
    - loadall: try to load all models in VRAM, offload to CPU RAM if OOM
    - --nopreload: skip pre-loading in any mode, load on first request
    - request_model() now properly handles all three modes
    - Added _move_model_to_cpu() and _move_model_to_vram() for loadswap
    - Fixed NameError: model_manager reference in request_model() (was using global singleton instead of self)
    - Updated CLI help text for --loadall, --loadswap, --nopreload
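
The three modes described in this commit message can be illustrated with a small sketch. This is not the code from the commit: it is a toy stand-in that only tracks where each model lives ("vram", "cpu", or "unloaded") instead of loading real weights. The method names request_model(), _move_model_to_cpu() and _move_model_to_vram() come from the message above; the Mode enum, the constructor arguments, the vram_slots capacity counter, and the _unload() helper are hypothetical names introduced for illustration. The behavior of loadall when an offloaded model is requested is also an assumption (it swaps with the active model).

```python
from enum import Enum


class Mode(Enum):
    ONDEMAND = "ondemand"   # default per the commit message
    LOADSWAP = "loadswap"
    LOADALL = "loadall"


class ModelManager:
    """Toy manager: records model placement instead of moving real weights."""

    def __init__(self, names, mode=Mode.ONDEMAND, preload=True, vram_slots=1):
        self.names = list(names)
        self.mode = mode
        self.vram_slots = vram_slots  # hypothetical stand-in for VRAM capacity
        self.location = {n: "unloaded" for n in self.names}
        self.active = None
        if preload and self.names:  # preload=False mimics --nopreload
            self._preload()

    def _vram_count(self):
        return sum(1 for loc in self.location.values() if loc == "vram")

    def _preload(self):
        first = self.names[0]
        if self.mode is Mode.ONDEMAND:
            # Only the first model is loaded up front.
            self._move_model_to_vram(first)
        elif self.mode is Mode.LOADSWAP:
            # First model in VRAM, all others parked in CPU RAM.
            self._move_model_to_vram(first)
            for n in self.names[1:]:
                self._move_model_to_cpu(n)
        else:
            # LOADALL: fill VRAM, then fall back to CPU RAM (simulated OOM).
            for n in self.names:
                if self._vram_count() < self.vram_slots:
                    self._move_model_to_vram(n)
                else:
                    self._move_model_to_cpu(n)
        self.active = first

    def _move_model_to_cpu(self, name):
        self.location[name] = "cpu"

    def _move_model_to_vram(self, name):
        self.location[name] = "vram"

    def _unload(self, name):
        self.location[name] = "unloaded"

    def request_model(self, name):
        if name not in self.location:
            raise KeyError(f"unknown model: {name}")
        if self.location[name] != "vram":
            has_room = (self.mode is Mode.LOADALL
                        and self._vram_count() < self.vram_slots)
            if self.active is not None and not has_room:
                if self.mode is Mode.ONDEMAND:
                    self._unload(self.active)            # unload/load on switch
                else:
                    self._move_model_to_cpu(self.active)  # swap on switch
            self._move_model_to_vram(name)
        self.active = name
        return name
```

For example, ModelManager(["a", "b"], mode=Mode.LOADSWAP) starts with "a" in VRAM and "b" in CPU RAM, and request_model("b") exchanges the two placements. With preload=False nothing is placed until the first request.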
Name
api
backends
models
pydantic
queue
__init__.py
cli.py
main.py