1. 06 May, 2026 (17 commits)
2. 05 May, 2026 (4 commits)
3. 03 May, 2026 (10 commits)
4. 20 Mar, 2026 (9 commits)
    • Fix offload-strategy parameter passing to CUDA backend · bf1d3f52
      Your Name authored
      - Add offload_strategy to kwargs in _load_default_model and _load_model_by_name
      - Fix parameter name: ram -> manual_ram_gb to match backend expectation
      - Also pass load_in_4bit, load_in_8bit, and max_gpu_percent
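A minimal sketch of the kwargs fix in bf1d3f52, assuming the CLI options arrive as an argparse Namespace. The helper name build_backend_kwargs and the attribute names on args are hypothetical; only the kwarg keys come from the commit message. Routing both _load_default_model and _load_model_by_name through one helper like this is one way to keep the two loader paths from drifting apart again.

```python
from argparse import Namespace
from typing import Any, Dict


def build_backend_kwargs(args: Namespace) -> Dict[str, Any]:
    """Hypothetical helper: map parsed CLI options to CUDA backend load kwargs."""
    return {
        "offload_strategy": args.offload_strategy,
        # Assumed: the CLI stores the RAM budget as `ram`; the backend reads `manual_ram_gb`.
        "manual_ram_gb": args.ram,
        "load_in_4bit": args.load_in_4bit,
        "load_in_8bit": args.load_in_8bit,
        "max_gpu_percent": args.max_gpu_percent,
    }


args = Namespace(offload_strategy="none", ram=16, load_in_4bit=True,
                 load_in_8bit=False, max_gpu_percent=90)
print(build_backend_kwargs(args))
```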
    • Add --offload-strategy none to disable CPU offloading and VRAM auto-detection · beded066
      Your Name authored
      - Add 'none' to --offload-strategy choices in cli.py
      - In cuda.py backend:
        - _get_vram_percentages_for_strategy() returns None for 'none' strategy
        - _get_vram_percentages_for_gpu() skips VRAM detection for 'none'
        - load_model() loads directly on GPU without max_memory constraints
      - Add startup status message in main.py for --offload-strategy none
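A sketch of how a 'none' strategy can short-circuit VRAM detection, assuming a strategy-to-percentage table. The strategy names 'conservative' and 'balanced', their percentages, the device_map values and the helper names are hypothetical; only the 'none' choice and its None return value come from beded066. The VRAM probe is passed in as a callable so the 'none' path visibly never calls it.

```python
import argparse
from typing import Callable, Dict, Optional

# Hypothetical strategies and VRAM budgets; only "none" comes from the commit.
STRATEGY_VRAM_PERCENT: Dict[str, int] = {"conservative": 70, "balanced": 85}


def get_vram_percentage_for_strategy(strategy: str) -> Optional[int]:
    """Return the share of VRAM to use, or None when offloading is disabled."""
    if strategy == "none":
        return None
    return STRATEGY_VRAM_PERCENT[strategy]


def build_load_kwargs(strategy: str, detect_vram_gb: Callable[[], float]) -> dict:
    percent = get_vram_percentage_for_strategy(strategy)
    if percent is None:
        # No VRAM probe and no max_memory cap: load the whole model onto the GPU.
        return {"device_map": "cuda:0"}
    budget_gb = detect_vram_gb() * percent / 100
    return {"device_map": "auto", "max_memory": {0: f"{budget_gb:.0f}GiB"}}


parser = argparse.ArgumentParser()
parser.add_argument("--offload-strategy", choices=[*STRATEGY_VRAM_PERCENT, "none"],
                    default="balanced")
args = parser.parse_args(["--offload-strategy", "none"])
print(build_load_kwargs(args.offload_strategy, detect_vram_gb=lambda: 24.0))  # {'device_map': 'cuda:0'}
```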
    • Add --no-ram option to maximize VRAM usage · b782a092
      Your Name authored
      - Add --no-ram CLI option to force model loading without CPU RAM spilling
      - Implement --no-ram behavior for:
        - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
        - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
        - Diffusers: force full GPU loading
        - sd.cpp: maximize GPU usage
      - Propagate flag through model manager
      - Add startup banner message
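A sketch of how --no-ram might translate into loader arguments for two of the four backends (llama-cpp-python and transformers; Diffusers and sd.cpp are not shown). n_gpu_layers, use_mmap and n_ctx are llama_cpp.Llama parameters, and device_map and low_cpu_mem_usage are from_pretrained() parameters. The helper names, the device_map='auto' default for the normal path, and reading "ignore --n-ctx" as "do not forward n_ctx" are assumptions.

```python
from typing import Any, Dict, Optional


def llama_cpp_kwargs(no_ram: bool, n_ctx: Optional[int] = None, n_gpu_layers: int = 0) -> Dict[str, Any]:
    """Keyword arguments for llama_cpp.Llama(model_path=..., **kwargs)."""
    kwargs: Dict[str, Any] = {"n_gpu_layers": n_gpu_layers}
    if n_ctx is not None:
        kwargs["n_ctx"] = n_ctx
    if no_ram:
        kwargs["n_gpu_layers"] = -1  # offload every layer to the GPU
        kwargs["use_mmap"] = False   # load weights fully instead of memory-mapping the file
        kwargs.pop("n_ctx", None)    # assumed reading of "ignore --n-ctx"
    return kwargs


def transformers_kwargs(no_ram: bool) -> Dict[str, Any]:
    """Keyword arguments for AutoModel*.from_pretrained(name, **kwargs)."""
    if no_ram:
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    return {"device_map": "auto"}  # assumed default that allows CPU spill


print(llama_cpp_kwargs(no_ram=True, n_ctx=8192))  # {'n_gpu_layers': -1, 'use_mmap': False}
print(transformers_kwargs(no_ram=True))
```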
    • API: validate requested models against CLI-registered models · ef949827
      Your Name authored
      - Add get_all_allowed_identifiers() to MultiModelManager returning all valid
        model identifiers (default model + short name + aliases, audio, tts, image,
        vision models, and custom aliases)
      - Rewrite is_allowed_model() to check against the full allowed set with
        support for prefixed forms and short-name matching
      - Add validation in request_model() that rejects unknown models with an error
        message listing all available models
      - Fix get_model_for_request() to reject loading arbitrary models not in the
        allowed set
      - Update all API endpoints (text, images, tts, transcriptions) to check for
        the error key and return HTTP 404 when a disallowed model is requested
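A sketch of the allow-list check from ef949827. Assumptions: a short name is the part of a HuggingFace ID after the last '/', a "prefixed form" looks like prefix:model, and the function bodies and error text are illustrative rather than the project's own. Requested names are matched exactly against the allowed set (which already contains the short names), so a lookalike such as other-org/text-model is rejected instead of being resolved to an arbitrary model.

```python
from typing import Dict, Iterable, Set


def short_name(model_id: str) -> str:
    """Assumed short-name rule: the part after the last '/'."""
    return model_id.rsplit("/", 1)[-1]


def get_all_allowed_identifiers(
    default_model: str, aliases: Iterable[str], other_models: Iterable[str]
) -> Set[str]:
    """Collect every identifier a client may use (full IDs, short names, aliases)."""
    allowed = {default_model, short_name(default_model), *aliases}
    for model_id in other_models:  # e.g. audio, TTS, image and vision models
        allowed.update({model_id, short_name(model_id)})
    return allowed


def is_allowed_model(requested: str, allowed: Set[str]) -> bool:
    """Exact match only, optionally after stripping a hypothetical 'prefix:'."""
    candidates = {requested}
    if ":" in requested:
        candidates.add(requested.split(":", 1)[1])
    return not candidates.isdisjoint(allowed)


def request_model(requested: str, allowed: Set[str]) -> Dict[str, str]:
    if not is_allowed_model(requested, allowed):
        available = ", ".join(sorted(allowed))
        return {"error": f"Unknown model '{requested}'. Available models: {available}"}
    return {"model": requested}


def http_status(result: Dict[str, str]) -> int:
    """Endpoints map an 'error' key to HTTP 404."""
    return 404 if "error" in result else 200


allowed = get_all_allowed_identifiers(
    "example-org/text-model", aliases=["chat"], other_models=["example-org/tts-model"]
)
print(http_status(request_model("text-model", allowed)))            # 200
print(http_status(request_model("other-org/text-model", allowed)))  # 404
```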
    • Fix --download-model for non-GGUF HuggingFace models · b0a633c7
      Your Name authored
      - Try GGUF pattern first for HuggingFace model IDs
      - Fall back to snapshot_download for entire repo (transformers/diffusers models)
      - Works for both GGUF models and full HuggingFace repos
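A sketch of the GGUF-first, snapshot-fallback order from b0a633c7. The downloaders are injected callables so the logic runs without network access; in practice they could wrap huggingface_hub's list_repo_files, hf_hub_download and snapshot_download. The function name and picking the alphabetically first .gguf file are assumptions; a repo with several quantizations needs an explicit choice.

```python
from typing import Callable, List


def download_hf_model(
    repo_id: str,
    list_files: Callable[[str], List[str]],
    download_file: Callable[[str, str], str],
    download_snapshot: Callable[[str], str],
) -> str:
    """Return a local path: one GGUF file if the repo has any, else the whole repo."""
    gguf_files = sorted(f for f in list_files(repo_id) if f.endswith(".gguf"))
    if gguf_files:
        # GGUF repo (llama.cpp style): fetch just one file.
        return download_file(repo_id, gguf_files[0])
    # transformers / diffusers repo: fetch every file in the snapshot.
    return download_snapshot(repo_id)


# Fake repo listing so the example runs offline.
fake_repos = {"org/hf-repo": ["config.json", "model.safetensors"]}
path = download_hf_model(
    "org/hf-repo",
    list_files=lambda r: fake_repos[r],
    download_file=lambda r, f: f"/cache/{r}/{f}",
    download_snapshot=lambda r: f"/cache/{r}",
)
print(path)  # /cache/org/hf-repo
```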
    • Try to fix · aacd990a
      Your Name authored
    • Simplify --download-model: use cache module directly · fe7a30dc
      Your Name authored
      - Remove auto-detection logic, just use download_model from cache
      - User can specify --download-file-pattern for non-GGUF models
    • Improve --download-model auto-detection for non-GGUF HF models · 01bdfe14
      Your Name authored
      - Scan HuggingFace repo to detect available file patterns
      - Try multiple patterns (.gguf, .safetensors, .bin, .pt, .pth)
      - Default to .gguf if nothing found
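A sketch of the pattern scan in 01bdfe14 (this auto-detection was later removed in fe7a30dc): given a repo's file list, return the first candidate pattern that matches anything, falling back to *.gguf. The function name is hypothetical, and treating the commit's listed order as the priority order is an assumption.

```python
import fnmatch
from typing import Iterable

# Order taken from the commit message; treating it as priority is an assumption.
CANDIDATE_PATTERNS = ("*.gguf", "*.safetensors", "*.bin", "*.pt", "*.pth")


def detect_file_pattern(repo_files: Iterable[str]) -> str:
    """Return the first candidate pattern that matches any file, else '*.gguf'."""
    files = list(repo_files)
    for pattern in CANDIDATE_PATTERNS:
        if fnmatch.filter(files, pattern):
            return pattern
    return "*.gguf"


print(detect_file_pattern(["config.json", "unet/diffusion_pytorch_model.safetensors"]))  # *.safetensors
```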
    • Add --download-model CLI argument to download models to cache and exit · a49d1d88
      Your Name authored
      - Add --download-model argument to download a model (URL or HuggingFace ID) to cache
      - Add --download-file-pattern argument to specify file pattern for HF downloads
      - Use download_model from codai.models.cache module
      - Model downloads to appropriate cache and exits without starting server
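A sketch of the download-and-exit flow from a49d1d88. The real download_model lives in codai.models.cache and its signature is not shown in the log, so it is injected here; the file_pattern keyword, the run() helper and the start_server callback are hypothetical.

```python
import argparse
from typing import Callable, List


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--download-model", metavar="URL_OR_HF_ID",
                        help="download a model (URL or HuggingFace ID) to the cache and exit")
    parser.add_argument("--download-file-pattern", default=None,
                        help="file pattern to fetch for HuggingFace downloads")
    return parser.parse_args(argv)


def run(
    argv: List[str],
    download_model: Callable[..., str],
    start_server: Callable[[argparse.Namespace], None],
) -> int:
    args = parse_args(argv)
    if args.download_model:
        # Hypothetical keyword; the real cache function's signature may differ.
        path = download_model(args.download_model, file_pattern=args.download_file_pattern)
        print(f"Model cached at {path}")
        return 0  # exit here; the server is never started
    start_server(args)
    return 0
```

In an entry point this would be called as sys.exit(run(sys.argv[1:], download_model=..., start_server=...)), with the project's own download and server functions passed in.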