Fix: Centralize model unloading - properly handle all model types in ondemand mode
- Added unload_all_models() to MultiModelManager that handles ALL model types: ModelManager, diffusers pipelines, sd.cpp StableDiffusion, and any other objects
- Text endpoints now properly unload image models before loading text models
- Image endpoints now properly unload text models before loading image models
- The rule: in ondemand mode, if the model in VRAM differs from the requested model (regardless of type), fully unload it before loading the new one
- Includes gc.collect(), torch.cuda.empty_cache(), and a 1s settle delay
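A minimal sketch of the unload-before-load rule described above. The class and method names (MultiModelManager, unload_all_models, ensure_loaded) follow the commit message, but the body is an illustrative assumption, not the actual implementation; torch is imported defensively so the sketch runs even without a GPU stack installed.

```python
import gc
import time

try:
    import torch  # used only to clear the CUDA allocator cache
except ImportError:  # torch absent: skip the VRAM cache step
    torch = None


class MultiModelManager:
    """Hypothetical sketch of the centralized unload path."""

    def __init__(self):
        # name -> loaded model object of any type (text model,
        # diffusers pipeline, sd.cpp StableDiffusion, ...)
        self.loaded = {}

    def unload_all_models(self, settle_seconds=1.0):
        """Drop every resident model regardless of type, then reclaim memory."""
        for name, model in list(self.loaded.items()):
            # Prefer a model-specific unload hook when one exists.
            unload = getattr(model, "unload", None)
            if callable(unload):
                unload()
            del self.loaded[name]
        gc.collect()  # release Python-side references
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached VRAM to the driver
        time.sleep(settle_seconds)  # settle delay before the next load

    def ensure_loaded(self, name, loader):
        """Ondemand rule: if VRAM holds a different model, fully unload first."""
        if name not in self.loaded:
            if self.loaded:  # a different model is resident
                self.unload_all_models()
            self.loaded[name] = loader()
        return self.loaded[name]
```

With this shape, a text endpoint calling ensure_loaded("text-model", ...) after an image request first evicts the image pipeline, matching the behavior the commit describes.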