codai/models · 00775972af0105acd35d03d665a211685d9d84c0 · nexlab / coderai

Fix: Centralize model unloading - properly handle all model types in ondemand mode · 00775972

Your Name authored Mar 19, 2026

- Added unload_all_models() to MultiModelManager that handles ALL model types:
  ModelManager, diffusers pipelines, sd.cpp StableDiffusion, and any other objects
- Text endpoints now properly unload image models before loading text models
- Image endpoints now properly unload text models before loading image models
- The rule: in ondemand mode, if the model in VRAM differs from the requested
  model (regardless of type), fully unload before loading the new one
- Unloading includes gc.collect(), torch.cuda.empty_cache(), and a 1s settle delay
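
The unload rule described above can be sketched roughly as follows. This is an illustrative sketch, not the code from manager.py: the class name MultiModelManagerSketch, the ensure_loaded() helper, the models/active_name attributes, the settle_seconds parameter, and the unload()/close() hooks are all assumptions made for the example. Only unload_all_models(), gc.collect(), torch.cuda.empty_cache(), and the 1s delay come from the commit message. Skipping the delay when nothing was loaded is also an assumption.

```python
import gc
import time

try:
    import torch  # optional here; the sketch still runs without it
except ImportError:
    torch = None


class MultiModelManagerSketch:
    """Hypothetical stand-in for MultiModelManager (not the repository's code)."""

    def __init__(self, settle_seconds=1.0):
        self.models = {}          # name -> loaded model object of any type
        self.active_name = None   # name of the model currently resident
        self.settle_seconds = settle_seconds

    def unload_all_models(self):
        """Drop every loaded model, whatever its type, then reclaim memory."""
        unloaded = list(self.models)
        for name in unloaded:
            model = self.models.pop(name)
            # Assumption: some wrappers expose an unload() or close() hook.
            for hook in ("unload", "close"):
                fn = getattr(model, hook, None)
                if callable(fn):
                    fn()
                    break
            del model  # drop the local reference so gc can free the object
        self.active_name = None
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        # Assumption: only wait when something was actually unloaded.
        if unloaded and self.settle_seconds > 0:
            time.sleep(self.settle_seconds)
        return unloaded

    def ensure_loaded(self, name, loader):
        """Ondemand rule: if a different model is resident, unload it first."""
        if self.active_name == name and name in self.models:
            return self.models[name]
        self.unload_all_models()
        self.models[name] = loader()
        self.active_name = name
        return self.models[name]
```

In this sketch a text endpoint and an image endpoint would both call ensure_loaded() with their own loader, so the same check applies regardless of model type.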


Name
cache
__init__.py
capabilities.py
grammar.py
manager.py
parser.py
templates.py
tool_call_grammar.gbnf
utils.py