codai/api/images.py · 7d83896270fd8ed89c44bfd6a789dd648399fc40 · nexlab / coderai

Fix: In ondemand mode, fully unload current model before loading new one · 7d838962

Your Name authored Mar 19, 2026

- In ondemand mode (neither --load-all nor --loadswap specified), when a
  different model is requested, the model currently in VRAM is now fully
  unloaded before the new one is loaded. This releases the previous model's
  memory before the switch instead of leaving it resident.
- Added this cleanup logic to both the /v1/chat/completions and /v1/completions
  endpoints
- Added the same logic to the image generation endpoints (both the diffusers
  and sd.cpp paths)
- Cleanup includes model cleanup, gc.collect(), and torch.cuda.empty_cache()
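The unload-before-load sequence described above can be sketched roughly as follows. This is a minimal illustration, not the code in images.py. `OnDemandModelManager`, `free_vram`, the `loader` callable, and the model's `cleanup()` hook are all hypothetical names. PyTorch is treated as optional, so the sketch also runs on a machine without a GPU.

```python
import gc
from typing import Any, Callable, Optional

try:
    import torch  # optional: only used to release cached CUDA memory
except ImportError:  # let the sketch run where PyTorch is not installed
    torch = None


def free_vram() -> None:
    """Collect unreachable Python objects, then release cached CUDA blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


class OnDemandModelManager:
    """Hypothetical holder that keeps at most one model resident (ondemand mode)."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader  # hypothetical: maps a model name to a loaded model
        self.current_name: Optional[str] = None
        self.current_model: Any = None

    def unload(self) -> None:
        if self.current_model is None:
            return
        cleanup = getattr(self.current_model, "cleanup", None)  # hypothetical hook
        if callable(cleanup):
            cleanup()
        # Drop the manager's reference so gc.collect() can reclaim the weights
        # (any reference still held by a caller would keep them alive).
        self.current_model = None
        self.current_name = None
        free_vram()

    def get(self, name: str) -> Any:
        if name == self.current_name:
            return self.current_model  # already resident: no reload needed
        self.unload()  # fully unload the old model before loading the new one
        self.current_model = self._loader(name)
        self.current_name = name
        return self.current_model
```

Requesting the same model twice reuses it. Requesting a different one runs the full cleanup first, so the two models are never held by the manager at the same time. Calling `gc.collect()` before `torch.cuda.empty_cache()` matters here: the cache can only hand back blocks whose tensors have already been freed.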
