1. 06 May, 2026 23 commits
  2. 05 May, 2026 4 commits
  3. 03 May, 2026 10 commits
  4. 20 Mar, 2026 3 commits
    • Fix offload-strategy parameter passing to CUDA backend · bf1d3f52
      Your Name authored
      - Add offload_strategy to kwargs in _load_default_model and _load_model_by_name
      - Fix parameter name: ram -> manual_ram_gb to match backend expectation
      - Also pass load_in_4bit, load_in_8bit, and max_gpu_percent
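A minimal sketch of the kwargs fix described above, not the actual code from this commit. The helper name `build_backend_kwargs` and its signature are hypothetical; only the key names (`offload_strategy`, `manual_ram_gb`, `load_in_4bit`, `load_in_8bit`, `max_gpu_percent`) come from the commit message.

```python
from typing import Any, Dict, Optional


def build_backend_kwargs(
    offload_strategy: str,
    ram: Optional[float] = None,
    load_in_4bit: bool = False,
    load_in_8bit: bool = False,
    max_gpu_percent: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect CLI values under the names the backend expects."""
    return {
        "offload_strategy": offload_strategy,
        # The backend reads manual_ram_gb, so the CLI value `ram` is renamed here.
        "manual_ram_gb": ram,
        "load_in_4bit": load_in_4bit,
        "load_in_8bit": load_in_8bit,
        "max_gpu_percent": max_gpu_percent,
    }


# Example call; the values are illustrative only.
kwargs = build_backend_kwargs("none", ram=16.0, load_in_4bit=True)
```

Building the dictionary in one place would let both `_load_default_model` and `_load_model_by_name` pass the same set of keys, which is the kind of mismatch this commit appears to address.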
    • Add --offload-strategy none to disable CPU offloading and VRAM auto-detection · beded066
      Your Name authored
      - Add 'none' to --offload-strategy choices in cli.py
      - In cuda.py backend:
        - _get_vram_percentages_for_strategy() returns None for 'none' strategy
        - _get_vram_percentages_for_gpu() skips VRAM detection for 'none'
        - load_model() loads directly on GPU without max_memory constraints
      - Add startup status message in main.py for --offload-strategy none
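The behaviour listed above could look roughly like the following sketch. It is an illustration under assumptions, not the code in cli.py or cuda.py: the function names are simplified stand-ins, `"auto"` is a placeholder for whatever strategies already exist, the 0.9 fraction and the 24 GiB default are made up, and using `device_map="cuda:0"` for the 'none' case is a guess at what "loads directly on GPU" means.

```python
import argparse
from typing import Any, Dict, Optional

# Only 'none' is confirmed by the commit; 'auto' is a placeholder choice.
STRATEGIES = ("auto", "none")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--offload-strategy", choices=STRATEGIES, default="auto")
    return parser


def vram_fraction_for_strategy(strategy: str) -> Optional[float]:
    """Return None for 'none' so callers skip VRAM limits entirely."""
    if strategy == "none":
        return None
    return 0.9  # placeholder fraction for the other strategies


def model_load_kwargs(strategy: str, total_vram_gb: float = 24.0) -> Dict[str, Any]:
    fraction = vram_fraction_for_strategy(strategy)
    if fraction is None:
        # No max_memory key at all: load straight onto the GPU (assumed device).
        return {"device_map": "cuda:0"}
    limit_gb = int(total_vram_gb * fraction)
    return {"device_map": "auto", "max_memory": {0: f"{limit_gb}GiB"}}


args = build_parser().parse_args(["--offload-strategy", "none"])
load_kwargs = model_load_kwargs(args.offload_strategy)
```

Returning `None` rather than a sentinel percentage keeps the "no limit" case explicit, so the loader can omit `max_memory` instead of passing a value that happens to be large.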
    • Add --no-ram option to maximize VRAM usage · b782a092
      Your Name authored
      - Add --no-ram CLI option to force model loading without CPU RAM spilling
      - Implement --no-ram behavior for:
        - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
        - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
        - Diffusers: force full GPU loading
        - sd.cpp: maximize GPU usage
      - Propagate flag through model manager
      - Add startup banner message
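As a rough sketch of how the per-backend settings above might be applied, assuming a hypothetical `apply_no_ram` helper and backend labels chosen for illustration. The override values for llama-cpp-python and transformers are taken from the commit message; dropping `n_ctx` is one possible reading of "ignore --n-ctx", and the Diffusers and sd.cpp cases are left out because the message gives no concrete parameters for them.

```python
from typing import Any, Dict


def no_ram_overrides(backend: str) -> Dict[str, Any]:
    """Per-backend loader overrides for --no-ram, as listed in the commit message."""
    if backend == "llama-cpp-python":
        return {"n_gpu_layers": -1, "use_mmap": False}
    if backend == "transformers":
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    return {}  # other backends: no parameters stated in the commit message


def apply_no_ram(backend: str, kwargs: Dict[str, Any], no_ram: bool) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of kwargs with --no-ram applied."""
    merged = dict(kwargs)
    if not no_ram:
        return merged
    merged.update(no_ram_overrides(backend))
    if backend == "llama-cpp-python":
        # Assumed interpretation of "ignore --n-ctx": drop the user-supplied value.
        merged.pop("n_ctx", None)
    return merged


llama_kwargs = apply_no_ram("llama-cpp-python", {"n_ctx": 4096}, no_ram=True)
hf_kwargs = apply_no_ram("transformers", {"torch_dtype": "auto"}, no_ram=True)
```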