    Add --vulkan-single-gpu flag to force Vulkan to use only one GPU · a62cb69d
    Stefy Lanza (nextime / spora ) authored
    When multiple Vulkan-compatible GPUs are present (e.g., NVIDIA + AMD),
    llama.cpp automatically distributes layers across all GPUs for performance.
    This can cause unwanted VRAM allocation on the NVIDIA GPU when the user
    wants to use only the AMD GPU.
    
    The new --vulkan-single-gpu flag uses tensor_split to place all model
    layers on a single specified GPU device, so no layers are distributed
    to the other GPUs.
    
    - Added --vulkan-single-gpu argument
    - Added count_vulkan_devices() method to detect GPU count
    - Modified load_model to build tensor_split array when single_gpu=True
    - Updated README with documentation for the new flag
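
A minimal sketch of how the tensor_split array described above could be built. This is an illustration, not the code from this commit: `build_tensor_split` and `parse_args` are hypothetical helper names, the device count is hard-coded rather than taken from count_vulkan_devices(), and the sketch assumes the backend accepts a per-device list of split proportions in which a weight of 0.0 keeps layers off that device.

```python
import argparse
from typing import List, Optional


def build_tensor_split(device_index: int, device_count: int) -> List[float]:
    """Return per-device split weights that put everything on one device.

    Hypothetical helper: the selected device gets weight 1.0 and every
    other device gets 0.0.
    """
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    if not 0 <= device_index < device_count:
        raise ValueError(
            f"device_index {device_index} is out of range for "
            f"{device_count} device(s)"
        )
    return [1.0 if i == device_index else 0.0 for i in range(device_count)]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the two Vulkan flags used in the example usage."""
    parser = argparse.ArgumentParser(prog="coderai")
    parser.add_argument("--vulkan-device", type=int, default=0)
    parser.add_argument("--vulkan-single-gpu", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--vulkan-device", "1", "--vulkan-single-gpu"])
    device_count = 2  # assumed here; the commit detects this at runtime
    tensor_split = (
        build_tensor_split(args.vulkan_device, device_count)
        if args.vulkan_single_gpu
        else None  # None leaves the backend's default distribution in place
    )
    print(tensor_split)  # [0.0, 1.0]
```

Rejecting an out-of-range device index up front is a design choice of this sketch: it surfaces a mistyped --vulkan-device value before any model loading starts.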
    
    Example usage:
      python coderai --model model.gguf --backend vulkan --vulkan-device 1 --vulkan-single-gpu
coderai 51.1 KB