codai/api/text.py · bc2b1388eeff0ddeaaa7951d9eb0dcb4b517fa6f · nexlab / coderai

Centralize model resolution and VRAM management in MultiModelManager.request_model() · e004541a

Your Name authored Mar 19, 2026

- Added a request_model() method to MultiModelManager that handles:
  1. Alias resolution (image, audio, tts, vision, default, custom aliases)
  2. VRAM management (unloading previously loaded models in ondemand mode)
  3. Checking whether the requested model is already loaded
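
The three responsibilities listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name MultiModelManagerSketch, the aliases/loader/ondemand/loaded/unloaded_log fields, and the order of the steps are all assumptions made for the example. Only the method name request_model() and the three behaviours come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MultiModelManagerSketch:
    """Hypothetical stand-in for MultiModelManager (all fields are assumptions)."""

    aliases: Dict[str, str]            # e.g. {"image": "sd-model"}
    loader: Callable[[str], Any]       # loads a model given its resolved name
    ondemand: bool = True              # assumed flag for "ondemand mode"
    loaded: Dict[str, Any] = field(default_factory=dict)
    unloaded_log: List[str] = field(default_factory=list)

    def request_model(self, name: str) -> Any:
        # 1. Alias resolution: map "image", "tts", ... to a concrete model name.
        #    Names that are not aliases pass through unchanged.
        resolved = self.aliases.get(name, name)

        # 3. Already loaded: reuse the model and skip any unloading.
        if resolved in self.loaded:
            return self.loaded[resolved]

        # 2. VRAM management: in ondemand mode, drop every other model first.
        if self.ondemand:
            for other in list(self.loaded):
                del self.loaded[other]
                self.unloaded_log.append(other)

        model = self.loader(resolved)
        self.loaded[resolved] = model
        return model
```

With a single entry point like this, an endpoint only needs one call such as manager.request_model("image"); it does not have to know which other models are resident or whether the name is an alias.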

- Simplified codai/api/images.py:
  - Uses request_model() for model resolution and VRAM management
  - Extracted helper functions: _is_gguf_model(), _load_diffusers_pipeline(),
    _generate_with_diffusers(), _generate_with_sdcpp(), _load_sdcpp_model()
  - Removed duplicated sd.cpp generation code
  - Fixed semaphore scope (all generation now runs inside the semaphore block)
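
The semaphore-scope fix can be pictured with a short sketch. It assumes an asyncio.Semaphore and a blocking backend callable. Neither detail is stated in the commit message, and generate_image() and its parameters are hypothetical names. The point is only that the whole generation call sits inside the semaphore block, so no backend path can run outside it.

```python
import asyncio
from typing import Any, Callable


async def generate_image(
    prompt: str,
    backend: Callable[[str], Any],
    semaphore: asyncio.Semaphore,
) -> Any:
    """Hypothetical handler body: every generation path runs under the semaphore."""
    async with semaphore:
        # The blocking backend call (diffusers, sd.cpp, ...) runs in a worker
        # thread while the semaphore is still held, so concurrent requests
        # queue here instead of competing for VRAM.
        return await asyncio.to_thread(backend, prompt)
```

If part of the generation ran after the async with block had exited, a second request could start loading or generating while the first was still using the GPU. Keeping everything inside the block avoids that.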

- Simplified codai/api/tts.py:
  - Uses request_model() instead of duplicated VRAM management code
  - Removed duplicate get_cached_model_path() and get_model_cache_dir() wrappers

- Simplified codai/api/transcriptions.py:
  - Uses request_model() instead of duplicated VRAM management code

- Simplified codai/api/text.py:
  - Both /v1/chat/completions and /v1/completions use request_model()
  - Removed duplicated VRAM management blocks


text.py 83.2 KB
