    Add --no-ram option to maximize VRAM usage · b782a092
    Your Name authored
    - Add --no-ram CLI option to force model loading without CPU RAM spilling
    - Implement --no-ram behavior for:
      - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
      - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
  - Diffusers: load the full pipeline onto the GPU (no CPU offload)
  - sd.cpp: keep all weights resident in VRAM
    - Propagate flag through model manager
    - Add startup banner message
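
The per-backend bullets above could be wired up roughly like this; a minimal sketch, assuming a hypothetical `no_ram_kwargs` helper inside the model manager (the function name and backend strings are illustrative, only the keyword arguments come from the commit):

```python
def no_ram_kwargs(backend: str) -> dict:
    """Map a backend name to loader kwargs that force full-GPU loading.

    Hypothetical helper: the kwargs mirror the commit bullets
    (llama-cpp-python and HuggingFace transformers loader options).
    """
    if backend == "llama-cpp-python":
        # Offload all layers to the GPU and disable mmap so weights
        # are not memory-mapped through CPU RAM.
        return {"n_gpu_layers": -1, "use_mmap": False}
    if backend == "transformers":
        # Pin the whole model to the first CUDA device and avoid
        # materializing a full copy in CPU memory during load.
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    raise ValueError(f"unknown backend: {backend}")
```

The caller would merge these into the normal loader arguments only when `--no-ram` is set, e.g. `Llama(model_path=path, **no_ram_kwargs("llama-cpp-python"))`.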