    Centralize model resolution and VRAM management in MultiModelManager.request_model() · e004541a
    Your Name authored
    - Added request_model() method to MultiModelManager that handles:
      1. Alias resolution (image, audio, tts, vision, default, custom aliases)
      2. VRAM management (unloading previous models in ondemand mode)
      3. Checking whether the model is already loaded
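
The three responsibilities listed above can be sketched roughly as follows. This is a minimal illustration, not the actual codai implementation: the class name `MultiModelManagerSketch`, the `aliases`/`ondemand`/`loaded` attributes, the `mark_loaded()` helper, and the `(model_id, already_loaded)` return shape are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MultiModelManagerSketch:
    """Hypothetical stand-in for MultiModelManager (not the real class)."""

    aliases: Dict[str, str] = field(default_factory=dict)  # alias -> model id
    ondemand: bool = True  # assumed flag: keep only one model in VRAM
    loaded: Dict[str, object] = field(default_factory=dict)  # model id -> model

    def resolve_alias(self, name: Optional[str]) -> str:
        # Step 1: map aliases such as "image" or "default" to a model id.
        # Names that are not aliases pass through unchanged.
        key = name or "default"
        return self.aliases.get(key, key)

    def request_model(self, name: Optional[str] = None) -> Tuple[str, bool]:
        model_id = self.resolve_alias(name)
        # Step 3: a model that is already loaded needs no further work.
        already_loaded = model_id in self.loaded
        # Step 2: in on-demand mode, drop previously loaded models so the
        # caller can load the requested one into the freed VRAM.
        if not already_loaded and self.ondemand:
            self.loaded.clear()
        return model_id, already_loaded

    def mark_loaded(self, model_id: str, model: object) -> None:
        # Hypothetical helper: record a model the caller has just loaded.
        self.loaded[model_id] = model
```

With this shape, each endpoint would call `request_model()` once and only run its own loader when `already_loaded` is false, which is what lets the per-endpoint VRAM blocks be removed.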
    
    - Simplified codai/api/images.py:
      - Uses request_model() for model resolution and VRAM management
      - Extracted helper functions: _is_gguf_model(), _load_diffusers_pipeline(),
        _generate_with_diffusers(), _generate_with_sdcpp(), _load_sdcpp_model()
      - Removed duplicated sd.cpp generation code
      - Fixed semaphore scope (all generation now inside semaphore block)
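
To illustrate the semaphore-scope fix, here is a minimal sketch in which every generation path runs inside the same `async with` block. The function names, the `is_gguf` flag, and the stub backend are hypothetical and stand in for the real helpers in images.py; whether the real code uses `asyncio.Semaphore` is also an assumption.

```python
import asyncio
from typing import List, Tuple


async def _generate_stub(backend: str, prompt: str, active: List[int]) -> str:
    # Stand-in for a real generation call. active[0] counts generations
    # currently running, active[1] records the peak seen so far.
    active[0] += 1
    active[1] = max(active[1], active[0])
    await asyncio.sleep(0)  # yield so other tasks get a chance to run
    active[0] -= 1
    return f"{backend}:{prompt}"


async def generate(sem: asyncio.Semaphore, prompt: str, is_gguf: bool,
                   active: List[int]) -> str:
    # Both branches sit inside the semaphore block, so neither backend
    # can start generating while another generation is in progress.
    async with sem:
        if is_gguf:
            return await _generate_stub("sdcpp", prompt, active)
        return await _generate_stub("diffusers", prompt, active)


async def run_demo() -> Tuple[List[str], int]:
    sem = asyncio.Semaphore(1)  # one generation at a time
    active = [0, 0]
    results = await asyncio.gather(
        generate(sem, "a", True, active),
        generate(sem, "b", False, active),
    )
    return list(results), active[1]
```

If one branch were left outside the `async with` block, two requests could generate concurrently and compete for VRAM, which is the kind of problem the scope fix addresses.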
    
    - Simplified codai/api/tts.py:
      - Uses request_model() instead of duplicated VRAM management code
      - Removed duplicate get_cached_model_path() and get_model_cache_dir() wrappers
    
    - Simplified codai/api/transcriptions.py:
      - Uses request_model() instead of duplicated VRAM management code
    
    - Simplified codai/api/text.py:
      - Both /v1/chat/completions and /v1/completions use request_model()
      - Removed duplicated VRAM management blocks