Centralize model resolution and VRAM management in MultiModelManager.request_model()
- Added request_model() method to MultiModelManager that handles:
  1. Alias resolution (image, audio, tts, vision, default, custom aliases)
  2. VRAM management (unloading previous models in ondemand mode)
  3. Checking if model is already loaded
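
A rough sketch of how these three responsibilities could sit in one method. This is not the project's code: the class name ModelManagerSketch, the aliases mapping, the _load() placeholder, and the return shape are all assumptions made for illustration. Only the method name request_model() and the "ondemand" mode come from the message above.

```python
from dataclasses import dataclass, field


@dataclass
class ModelManagerSketch:
    """Hypothetical stand-in for MultiModelManager; not the project's code."""

    aliases: dict            # alias -> concrete model name (assumed mapping)
    mode: str = "ondemand"   # mode name taken from the commit message
    loaded: dict = field(default_factory=dict)  # model name -> model object

    def _load(self, name):
        # Placeholder for the real loader, which would allocate VRAM.
        return f"<model {name}>"

    def request_model(self, name_or_alias):
        # 1. Alias resolution: names that are not aliases pass through unchanged.
        name = self.aliases.get(name_or_alias, name_or_alias)
        # 3. Already loaded: return the model without touching VRAM.
        if name in self.loaded:
            return name, self.loaded[name]
        # 2. VRAM management: in ondemand mode, unload the other models first.
        if self.mode == "ondemand":
            self.loaded.clear()
        self.loaded[name] = self._load(name)
        return name, self.loaded[name]
```

With a hypothetical alias table such as {"image": "sd-model", "tts": "tts-model"}, calling request_model("image") and then request_model("tts") would leave only tts-model loaded, which is the behaviour the endpoints previously had to implement themselves.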
- Simplified codai/api/images.py:
  - Uses request_model() for model resolution and VRAM management
  - Extracted helper functions: _is_gguf_model(), _load_diffusers_pipeline(),
    _generate_with_diffusers(), _generate_with_sdcpp(), _load_sdcpp_model()
  - Removed duplicated sd.cpp generation code
  - Fixed semaphore scope (all generation now inside semaphore block)
- Simplified codai/api/tts.py:
  - Uses request_model() instead of duplicated VRAM management code
  - Removed duplicate get_cached_model_path() and get_model_cache_dir() wrappers
- Simplified codai/api/transcriptions.py:
  - Uses request_model() instead of duplicated VRAM management code
- Simplified codai/api/text.py:
  - Both /v1/chat/completions and /v1/completions use request_model()
  - Removed duplicated VRAM management blocks