    Centralize model resolution and VRAM management in MultiModelManager.request_model() · e004541a
    Your Name authored
    - Added request_model() method to MultiModelManager that handles:
      1. Alias resolution (image, audio, tts, vision, default, custom aliases)
      2. VRAM management (unloading previous models in ondemand mode)
      3. Checking whether the requested model is already loaded
    
    - Simplified codai/api/images.py:
      - Uses request_model() for model resolution and VRAM management
      - Extracted helper functions: _is_gguf_model(), _load_diffusers_pipeline(),
        _generate_with_diffusers(), _generate_with_sdcpp(), _load_sdcpp_model()
      - Removed duplicated sd.cpp generation code
      - Fixed semaphore scope (all generation now inside semaphore block)
    
    - Simplified codai/api/tts.py:
      - Uses request_model() instead of duplicated VRAM management code
      - Removed duplicate get_cached_model_path() and get_model_cache_dir() wrappers
    
    - Simplified codai/api/transcriptions.py:
      - Uses request_model() instead of duplicated VRAM management code
    
    - Simplified codai/api/text.py:
      - Both /v1/chat/completions and /v1/completions use request_model()
      - Removed duplicated VRAM management blocks
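
The three responsibilities listed for request_model() can be illustrated with a minimal sketch. This is not the project's implementation: the class name MultiModelManagerSketch, the loader callable, the aliases mapping, and the ondemand flag are all assumptions made for illustration, and "unloading" is reduced to dropping references.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class MultiModelManagerSketch:
    """Hypothetical stand-in for MultiModelManager, for illustration only."""

    loader: Callable[[str], Any]  # assumed: loads a model given its resolved name
    aliases: Dict[str, str] = field(default_factory=dict)  # alias -> model name
    ondemand: bool = True  # assumed flag for "ondemand mode"
    loaded: Dict[str, Any] = field(default_factory=dict)  # currently loaded models

    def request_model(self, name: str) -> Any:
        # 1. Alias resolution: map "image", "tts", ... to a concrete model name.
        resolved = self.aliases.get(name, name)

        # 3. Already loaded: return the cached instance and leave VRAM untouched.
        if resolved in self.loaded:
            return self.loaded[resolved]

        # 2. VRAM management: in ondemand mode, drop previous models first.
        #    A real manager would also release GPU memory here.
        if self.ondemand:
            self.loaded.clear()

        model = self.loader(resolved)
        self.loaded[resolved] = model
        return model
```

With this shape, an endpoint only calls `manager.request_model("image")` and no longer needs its own alias lookup or unload logic, which is the duplication the commit describes removing.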
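
The semaphore-scope fix mentioned for images.py can be sketched as follows, assuming an asyncio.Semaphore guards generation. The names generate_guarded, fake_generate, and demo are hypothetical and do not come from the repository; the point is only that every generation path runs inside the `async with` block.

```python
import asyncio
from typing import Awaitable, Callable


async def generate_guarded(
    semaphore: asyncio.Semaphore,
    generate: Callable[[str], Awaitable[str]],
    prompt: str,
) -> str:
    # All generation work happens inside the semaphore block, so no backend
    # can start while another request still holds the slot.
    async with semaphore:
        return await generate(prompt)


async def demo() -> int:
    semaphore = asyncio.Semaphore(1)  # assumed limit: one generation at a time
    active = 0
    peak = 0

    async def fake_generate(prompt: str) -> str:
        # Stand-in for a diffusers or sd.cpp call; tracks peak concurrency.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"image for {prompt}"

    await asyncio.gather(
        *(generate_guarded(semaphore, fake_generate, p) for p in ["a", "b", "c"])
    )
    return peak
```

If part of the generation ran after the `async with` block had exited, a second request could begin while the first was still using the GPU; keeping the whole call inside the block avoids that.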
Name
__init__.py
app.py
images.py
log.py
state.py
text.py
transcriptions.py
tts.py