    Fix: In ondemand mode, fully unload current model before loading new one · 7d838962
    Your Name authored
    - In ondemand mode (no --load-all or --loadswap specified), when a new model
      is requested, the current model in VRAM is now fully unloaded before loading
      the new one. This ensures clean model switching.
    - Added cleanup logic to both /v1/chat/completions and /v1/completions endpoints
    - Added same logic to image generation endpoints (diffusers and sd.cpp paths)
    - Cleanup includes: model cleanup, gc.collect(), torch.cuda.empty_cache()
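The cleanup steps listed above can be sketched as a small helper. Only the three steps named in the commit (model cleanup, `gc.collect()`, `torch.cuda.empty_cache()`) come from the source; the `ServerState` container and `unload_current_model` name are hypothetical, and the `torch` import is guarded so the sketch also runs on CPU-only machines.

```python
import gc

try:
    import torch  # optional: only needed for the CUDA cache cleanup step
except ImportError:  # CPU-only environments
    torch = None


class ServerState:
    """Hypothetical holder for the single resident model in ondemand mode."""

    def __init__(self):
        self.model = None


def unload_current_model(state: ServerState) -> None:
    """Fully release the resident model before loading a new one.

    Mirrors the cleanup described in the commit: drop the model,
    collect garbage, then empty the CUDA allocator cache.
    """
    if state.model is not None:
        # Drop the only strong reference so host memory can be reclaimed.
        state.model = None
    # Force a collection pass to break any lingering reference cycles.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Return cached CUDA allocations to the driver, freeing VRAM
        # for the incoming model.
        torch.cuda.empty_cache()
```

A sketch like this would be called at the top of each serving endpoint (chat, completions, and the image-generation paths) whenever the requested model differs from the one currently loaded.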
Repository contents:
    api/
    backends/
    models/
    pydantic/
    queue/
    __init__.py
    cli.py
    main.py