Commit `7d838962`

- In ondemand mode (no `--load-all` or `--loadswap` specified), when a new model is requested, the current model in VRAM is now fully unloaded before the new one is loaded, ensuring clean model switching.
- Added cleanup logic to both the `/v1/chat/completions` and `/v1/completions` endpoints.
- Added the same logic to the image generation endpoints (diffusers and sd.cpp paths).
- Cleanup includes model cleanup, `gc.collect()`, and `torch.cuda.empty_cache()`.
Repository contents:

| Name |
|---|
| .vscode |
| codai |
| .gitignore |
| LICENSE.md |
| README.md |
| build.sh |
| coder |
| coderai |
| requirements-nvidia.txt |
| requirements-vulkan.txt |
| requirements.txt |