Stefy Lanza (nextime / spora ) authored
When multiple Vulkan-compatible GPUs are present (e.g., NVIDIA + AMD), llama.cpp automatically distributes layers across all GPUs for performance. This can cause unwanted VRAM allocation on the NVIDIA GPU when the user wants to use only the AMD GPU.

The new --vulkan-single-gpu flag uses tensor_split to force all model layers onto a single specified GPU device, preventing distribution.

- Added --vulkan-single-gpu argument
- Added count_vulkan_devices() method to detect GPU count
- Modified load_model to build tensor_split array when single_gpu=True
- Updated README with documentation for the new flag

Example usage:
    python coderai --model model.gguf --backend vulkan --vulkan-device 1 --vulkan-single-gpu
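
For illustration, a tensor_split array that pins every layer to one device can be built roughly as follows. This is a minimal sketch, not the code from this commit: the helper name build_single_gpu_tensor_split and its parameters are hypothetical, and it assumes the array holds one proportion per detected device, with 1.0 for the selected device and 0.0 for the rest.

```python
from typing import List


def build_single_gpu_tensor_split(device_index: int, device_count: int) -> List[float]:
    """Return a split that assigns the whole model to one device.

    Hypothetical helper: the name and signature are illustrative only.
    Each entry is the share of the model placed on the device at that index.
    """
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    if not 0 <= device_index < device_count:
        raise ValueError(
            f"device_index {device_index} is out of range for {device_count} device(s)"
        )
    # 1.0 for the chosen device, 0.0 for every other device.
    return [1.0 if i == device_index else 0.0 for i in range(device_count)]


if __name__ == "__main__":
    # Two GPUs detected, device 1 selected (as with --vulkan-device 1).
    print(build_single_gpu_tensor_split(1, 2))  # [0.0, 1.0]
```

The resulting list would then be passed as the tensor_split argument when the model is loaded. Validating the index up front avoids silently producing an all-zero split when the requested device does not exist.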
a62cb69d