coderai · a62cb69dd33aa42cec5918ccf2c8853cee96bb86 · nexlab / coderai

Add --vulkan-single-gpu flag to force Vulkan to use only one GPU · a62cb69d

Stefy Lanza (nextime / spora) authored Feb 28, 2026

When multiple Vulkan-compatible GPUs are present (e.g., NVIDIA + AMD),
llama.cpp automatically distributes layers across all GPUs for performance.
This can cause unwanted VRAM allocation on the NVIDIA GPU when the user
wants to use only the AMD GPU.

The new --vulkan-single-gpu flag uses tensor_split to place all model
layers on a single specified GPU device, so no layers (and no VRAM
allocation for them) end up on the other GPUs.

- Added --vulkan-single-gpu argument
- Added count_vulkan_devices() method to detect GPU count
- Modified load_model to build tensor_split array when single_gpu=True
- Updated README with documentation for the new flag
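
To illustrate the tensor_split idea described above, here is a minimal sketch, not the code from this commit. The helper name build_tensor_split and its parameters are hypothetical. It assumes the split is a list with one weight per detected device, where the selected device gets 1.0 and every other device gets 0.0.

```python
from typing import List


def build_tensor_split(device_index: int, device_count: int) -> List[float]:
    """Return a split that assigns the whole model to one device.

    Hypothetical helper for illustration only: device_count would come
    from something like count_vulkan_devices(), and device_index from
    the --vulkan-device argument.
    """
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    if not 0 <= device_index < device_count:
        raise ValueError(
            f"device_index {device_index} out of range for {device_count} device(s)"
        )
    # One weight per device: 0.0 everywhere except the chosen device.
    split = [0.0] * device_count
    split[device_index] = 1.0
    return split


if __name__ == "__main__":
    # Two GPUs, second one selected (as with --vulkan-device 1).
    print(build_tensor_split(1, 2))
```

With two devices and device 1 selected, the resulting list would be passed as the tensor_split value when loading the model. How load_model actually forwards it is not shown in this commit message, so that part is an assumption.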

Example usage:
  python coderai --model model.gguf --backend vulkan --vulkan-device 1 --vulkan-single-gpu
