- Add `--no-ram` CLI option to force model loading without CPU RAM spilling
- Implement `--no-ram` behavior for each backend (sketched below):
  - llama-cpp-python: `n_gpu_layers=-1`, `use_mmap=False`, ignore `--n-ctx`
  - HuggingFace transformers: `device_map='cuda:0'`, `low_cpu_mem_usage=True`
  - Diffusers: force full GPU loading
  - sd.cpp: maximize GPU usage
- Propagate flag through model manager
- Add startup banner message
b782a092
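The commit message names the per-backend settings but not the wiring around them. The sketch below shows one plausible way a `--no-ram` flag could map onto loader keyword arguments; the CLI setup and function names are hypothetical, and only the backend parameters (`n_gpu_layers`, `use_mmap`, `device_map`, `low_cpu_mem_usage`) come from the commit message itself.

```python
# Hypothetical sketch of --no-ram flag handling; function names and CLI
# wiring are illustrative, not this repository's actual API.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--no-ram",
        action="store_true",
        help="Load the model entirely on the GPU, without spilling to CPU RAM",
    )
    parser.add_argument("--n-ctx", type=int, default=2048)
    return parser


def llama_cpp_kwargs(no_ram: bool, n_ctx: int) -> dict:
    """Keyword arguments for llama_cpp.Llama()."""
    if no_ram:
        # Offload every layer to the GPU and disable mmap so weights are
        # not paged through host memory; --n-ctx is ignored in this mode.
        return {"n_gpu_layers": -1, "use_mmap": False}
    return {"n_ctx": n_ctx}


def transformers_kwargs(no_ram: bool) -> dict:
    """Keyword arguments for AutoModelForCausalLM.from_pretrained()."""
    if no_ram:
        # Pin the whole model to a single GPU and skip the full-size
        # CPU-RAM staging copy transformers would otherwise build.
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    return {"device_map": "auto"}


def diffusers_device(no_ram: bool) -> str:
    # For Diffusers, "full GPU loading" would mean moving the pipeline
    # to CUDA and not enabling any CPU-offload hooks.
    return "cuda" if no_ram else "cpu"
```

Under this sketch, the model manager would carry the parsed `no_ram` value down to each backend's loader and print a one-line banner at startup when the flag is active, matching the propagation and banner items in the commit message.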
| Name | Last commit | Last update |
|---|---|---|
| cache | | |
| __init__.py | | |
| capabilities.py | | |
| grammar.py | | |
| manager.py | | |
| parser.py | | |
| templates.py | | |
| tool_call_grammar.gbnf | | |
| utils.py | | |