Your Name authored
- Add `--no-ram` CLI option to force model loading without CPU RAM spilling
- Implement `--no-ram` behavior for:
  - llama-cpp-python: `n_gpu_layers=-1`, `use_mmap=False`, ignore `--n-ctx`
  - HuggingFace transformers: `device_map='cuda:0'`, `low_cpu_mem_usage=True`
  - Diffusers: force full GPU loading
  - sd.cpp: maximize GPU usage
- Propagate flag through model manager
- Add startup banner message
b782a092
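
The loader settings named in the commit message can be pictured as a mapping from the flag to per-backend keyword arguments. The following is a minimal sketch under stated assumptions, not the repository's actual code: the function names `build_parser`, `llama_cpp_kwargs`, and `transformers_kwargs` are hypothetical, the `prog` name is a guess, and "ignore `--n-ctx`" is interpreted here as simply not forwarding the value. Only the keyword names and values (`n_gpu_layers=-1`, `use_mmap=False`, `device_map='cuda:0'`, `low_cpu_mem_usage=True`) come from the commit message.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the two flag names come from the commit message.
    parser = argparse.ArgumentParser(prog="coderai")
    parser.add_argument("--no-ram", action="store_true",
                        help="load models without spilling into CPU RAM")
    parser.add_argument("--n-ctx", type=int, default=None,
                        help="context size (not forwarded when --no-ram is set)")
    return parser


def llama_cpp_kwargs(args):
    """Build keyword arguments intended for llama-cpp-python's Llama constructor."""
    kwargs = {}
    if args.n_ctx is not None:
        kwargs["n_ctx"] = args.n_ctx
    if args.no_ram:
        kwargs["n_gpu_layers"] = -1   # offload every layer to the GPU
        kwargs["use_mmap"] = False    # do not memory-map the model file
        kwargs.pop("n_ctx", None)     # assumption: "ignore --n-ctx" means drop it
    return kwargs


def transformers_kwargs(args):
    """Build keyword arguments intended for transformers' from_pretrained."""
    if args.no_ram:
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    return {}


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-ram", "--n-ctx", "4096"])
    print(llama_cpp_kwargs(args))
    print(transformers_kwargs(args))
```

Keeping the mapping in plain functions that return dictionaries means the flag handling can be checked without a GPU or the model libraries installed; the Diffusers and sd.cpp cases are left out because the commit message does not name specific parameters for them.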
| Name | Last commit | Last update |
|---|---|---|
| .vscode | ||
| codai | ||
| .gitignore | ||
| LICENSE.md | ||
| README.md | ||
| build.sh | ||
| coder | ||
| coderai | ||
| requirements-nvidia.txt | ||
| requirements-vulkan.txt | ||
| requirements.txt | ||