    Fix: Centralize model unloading - properly handle all model types in ondemand mode · 00775972
    Your Name authored
    - Added unload_all_models() to MultiModelManager that handles ALL model types:
      ModelManager, diffusers pipelines, sd.cpp StableDiffusion, and any other objects
    - Text endpoints now properly unload image models before loading text models
    - Image endpoints now properly unload text models before loading image models
    - The rule: in ondemand mode, if the model in VRAM differs from the requested
      model (regardless of type), fully unload before loading the new one
    - Unloading includes gc.collect(), torch.cuda.empty_cache(), and a 1s settle delay
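The unload rule described in the commit message could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `loaded` dict, the `settle_delay` parameter, the `ensure_loaded()` helper, and the `unload`/`close` hook names are all assumptions made for the example. Only `unload_all_models()`, `gc.collect()`, `torch.cuda.empty_cache()`, and the 1s delay come from the commit message.

```python
import gc
import time

try:
    import torch  # optional here; the sketch still runs without it
except ImportError:
    torch = None


class MultiModelManager:
    """Hypothetical sketch; the real class may be structured differently."""

    def __init__(self, settle_delay=1.0):
        self.loaded = {}  # assumed layout: model name -> model object of any type
        self.settle_delay = settle_delay  # seconds to wait after freeing memory

    def unload_all_models(self):
        """Drop every loaded model, whatever its type, and free cached memory."""
        names = list(self.loaded)
        for name in names:
            model = self.loaded.pop(name)
            # Assumption: some wrappers expose an unload() or close() hook.
            for hook in ("unload", "close"):
                fn = getattr(model, hook, None)
                if callable(fn):
                    fn()
                    break
            del model  # drop the last local reference before collecting
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        if names:
            time.sleep(self.settle_delay)  # give the driver time to settle
        return names

    def ensure_loaded(self, name, loader):
        """On-demand rule: if the requested model differs, unload everything first."""
        if name in self.loaded:
            return self.loaded[name]
        self.unload_all_models()
        self.loaded[name] = loader()
        return self.loaded[name]
```

With this shape, a text endpoint and an image endpoint would both call `ensure_loaded()` with their own loader, so neither needs to know what kind of model currently occupies VRAM.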
Name
cache
__init__.py
capabilities.py
grammar.py
manager.py
parser.py
templates.py
tool_call_grammar.gbnf
utils.py