    Implement proper loadswap/loadall/ondemand model management modes · c08a5b4f
    Your Name authored
    - Default mode changed to ondemand (pre-load first model, unload/load on switch)
    - loadswap: load first model in VRAM, others in CPU RAM, swap on switch
    - loadall: try to load all models in VRAM, offload to CPU RAM if OOM
    - --nopreload: skip pre-loading in any mode, load on first request
    - request_model() now properly handles all three modes
    - Added _move_model_to_cpu() and _move_model_to_vram() for loadswap
    - Fixed NameError: model_manager reference in request_model() (was using global singleton instead of self)
    - Updated CLI help text for --loadall, --loadswap, --nopreload
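The three modes described above can be sketched as a toy manager that only tracks where each model lives, rather than moving real weights. This is an illustration under stated assumptions, not the repository's code: `Mode`, `ModelManager`, `vram_slots`, and the `location` dict are hypothetical names, the `preload` argument stands in for the `--nopreload` flag, and a slot count stands in for a real out-of-memory check. The commit does not say what loadall does when a CPU-offloaded model is requested, so the sketch assumes it swaps like loadswap.

```python
from enum import Enum


class Mode(Enum):
    ONDEMAND = "ondemand"   # default: one model resident, unload/load on switch
    LOADSWAP = "loadswap"   # first model in VRAM, the rest parked in CPU RAM
    LOADALL = "loadall"     # as many models in VRAM as fit, the rest in CPU RAM


class ModelManager:
    """Toy manager (hypothetical): records 'vram' or 'cpu' per model name."""

    def __init__(self, names, mode=Mode.ONDEMAND, preload=True, vram_slots=1):
        self.names = list(names)
        self.mode = mode
        self.vram_slots = vram_slots   # assumed stand-in for an OOM limit
        self.location = {}             # name -> "vram" | "cpu"; absent = unloaded
        if preload and self.names:     # preload=False mimics --nopreload
            self._preload()

    def _vram_used(self):
        return sum(1 for loc in self.location.values() if loc == "vram")

    def _preload(self):
        first, rest = self.names[0], self.names[1:]
        self.location[first] = "vram"
        if self.mode is Mode.LOADSWAP:
            for name in rest:
                self.location[name] = "cpu"
        elif self.mode is Mode.LOADALL:
            for name in rest:
                # Fall back to CPU RAM once the simulated VRAM is full.
                fits = self._vram_used() < self.vram_slots
                self.location[name] = "vram" if fits else "cpu"

    def request_model(self, name):
        if name not in self.names:
            raise KeyError(name)
        if self.location.get(name) == "vram":
            return name                      # already resident, nothing to do
        if self.mode is Mode.ONDEMAND:
            self.location.clear()            # unload whatever was loaded
        else:
            # loadswap / loadall: move resident models to CPU until a slot frees up.
            for other, loc in list(self.location.items()):
                if loc == "vram" and self._vram_used() >= self.vram_slots:
                    self.location[other] = "cpu"
        self.location[name] = "vram"
        return name
```

In this sketch, switching models in ondemand mode leaves only the requested model loaded, while loadswap keeps the previous model in CPU RAM so that switching back does not require reloading it from disk.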
Name
.vscode
codai
.gitignore
LICENSE.md
README.md
build.sh
coder
coderai
requirements-nvidia.txt
requirements-vulkan.txt
requirements.txt