Add --no-ram option to maximize VRAM usage · b782a092
Authored by Your Name
    - Add --no-ram CLI option to force model loading without CPU RAM spilling
    - Implement --no-ram behavior for:
      - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
      - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
      - Diffusers: force full GPU loading
      - sd.cpp: maximize GPU usage
    - Propagate flag through model manager
    - Add startup banner message
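The flag-propagation described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `build_parser`, `llama_cpp_kwargs`, and `transformers_kwargs` are hypothetical helper names, and the non-`--no-ram` defaults are assumptions; only the `--no-ram` settings (`n_gpu_layers=-1`, `use_mmap=False`, ignoring `--n-ctx`, `device_map='cuda:0'`, `low_cpu_mem_usage=True`) come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI wiring for the --no-ram flag.
    parser = argparse.ArgumentParser(prog="server")
    parser.add_argument(
        "--no-ram",
        action="store_true",
        help="force full-GPU model loading without CPU RAM spilling",
    )
    parser.add_argument("--n-ctx", type=int, default=2048)
    return parser


def llama_cpp_kwargs(args: argparse.Namespace) -> dict:
    # --no-ram path per the commit: offload all layers, disable mmap,
    # and ignore --n-ctx. The else-branch defaults are assumptions.
    if args.no_ram:
        return {"n_gpu_layers": -1, "use_mmap": False}
    return {"n_ctx": args.n_ctx}


def transformers_kwargs(args: argparse.Namespace) -> dict:
    # --no-ram path per the commit for HuggingFace transformers.
    if args.no_ram:
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    return {}


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-ram"])
    print(llama_cpp_kwargs(args))
    print(transformers_kwargs(args))
```

The kwargs dicts would then be passed through the model manager into each backend's loader (e.g. `Llama(**llama_cpp_kwargs(args))`), which matches the commit's "propagate flag through model manager" note.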
Files in this commit's tree:
- api
- backends
- models
- pydantic
- queue
- __init__.py
- cli.py
- main.py