Commit b782a092
Add --no-ram option to maximize VRAM usage
Your Name authored
    - Add --no-ram CLI option to force model loading without CPU RAM spilling
    - Implement --no-ram behavior for:
      - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
      - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
      - Diffusers: force full GPU loading
      - sd.cpp: maximize GPU usage
    - Propagate flag through model manager
    - Add startup banner message
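The flag handling described above can be sketched as follows. Only the option values (`n_gpu_layers=-1`, `use_mmap=False`, `device_map='cuda:0'`, `low_cpu_mem_usage=True`) and the flag name come from the commit message; the `build_load_kwargs` helper and the banner text are hypothetical illustrations, not the repository's actual code.

```python
import argparse

def build_load_kwargs(backend: str, no_ram: bool) -> dict:
    """Map the --no-ram flag onto backend-specific loading kwargs.

    Hypothetical helper: backend names and structure are illustrative;
    the kwargs values are those listed in the commit message.
    """
    if backend == "llama-cpp-python":
        if no_ram:
            # Offload all layers to the GPU and disable mmap so weights
            # are not paged through CPU RAM (--n-ctx is ignored here).
            return {"n_gpu_layers": -1, "use_mmap": False}
        return {}
    if backend == "transformers":
        if no_ram:
            # Load directly onto the first CUDA device, keeping the
            # intermediate CPU memory footprint low.
            return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
        return {}
    return {}

parser = argparse.ArgumentParser()
parser.add_argument("--no-ram", action="store_true",
                    help="force full-GPU model loading without CPU RAM spilling")
args = parser.parse_args(["--no-ram"])

if args.no_ram:
    # Startup banner mentioned in the commit (wording is illustrative).
    print("[startup] --no-ram: maximizing VRAM usage, bypassing CPU RAM")

print(build_load_kwargs("llama-cpp-python", args.no_ram))
```

The resulting dict would be passed through the model manager to each backend's constructor (e.g. `Llama(**kwargs)` or `from_pretrained(..., **kwargs)`).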
Directory contents:
cache
__init__.py
capabilities.py
grammar.py
manager.py
parser.py
templates.py
tool_call_grammar.gbnf
utils.py