    Add --no-ram option to maximize VRAM usage · b782a092
    Your Name authored
    - Add --no-ram CLI option to force model loading without CPU RAM spilling
    - Implement --no-ram behavior for:
      - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
      - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
  - Diffusers: load the full pipeline onto the GPU (no CPU offload)
  - sd.cpp: keep all weights resident in VRAM
    - Propagate flag through model manager
    - Add startup banner message
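
The per-backend bullets above could be wired up roughly like this; a minimal sketch, assuming a hypothetical `no_ram_kwargs` helper inside the model manager (the function name and backend strings are illustrative, only the keyword arguments come from the commit):

```python
def no_ram_kwargs(backend: str) -> dict:
    """Map a backend name to loader kwargs that force full-GPU loading.

    Hypothetical helper: the kwargs mirror the commit bullets
    (llama-cpp-python and HuggingFace transformers loader options).
    """
    if backend == "llama-cpp-python":
        # Offload all layers to the GPU and disable mmap so weights
        # are not memory-mapped through CPU RAM.
        return {"n_gpu_layers": -1, "use_mmap": False}
    if backend == "transformers":
        # Pin the whole model to the first CUDA device and avoid
        # materializing a full copy in CPU memory during load.
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    raise ValueError(f"unknown backend: {backend}")
```

The caller would merge these into the normal loader arguments only when `--no-ram` is set, e.g. `Llama(model_path=path, **no_ram_kwargs("llama-cpp-python"))`.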