Commits · f985ab5ca6f0dcfc42f2c0afcd5e31f4bcb02073 · nexlab / coderai

28 Feb, 2026 · 13 commits

docs: update README to document Intel GPU support via Vulkan backend · f985ab5c

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Updated main description to include Intel GPUs
- Expanded features section to list Intel as a supported backend
- Updated prerequisites to explain Vulkan works with Intel iGPUs and Arc
- Clarified that build.sh vulkan works for both AMD and Intel
- Added Intel-specific notes and recommendations
- Updated GPU compatibility matrix with Intel hardware
- Added performance expectations for different GPU types

feat: enhanced request logging middleware to capture detailed 422 validation errors · c8abdaef

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Added detailed request body logging with truncation for large payloads
- Added JSON structure parsing to show message count and keys
- Added comprehensive error response capture for 422 errors
- Added validation error detail parsing (location, message, type)
- Added full traceback logging for exceptions during request processing
- This helps debug client compatibility issues with KiloCode
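
The body summary and 422 detail parsing listed above might look roughly like the following. This is a minimal sketch, not the project's middleware: the function names and the 2000-character truncation limit are assumptions, and the error entries are assumed to follow FastAPI's usual `loc` / `msg` / `type` shape.

```python
import json

MAX_LOGGED_BODY = 2000  # assumed truncation limit, not taken from the project


def summarize_body(raw: bytes, limit: int = MAX_LOGGED_BODY) -> dict:
    """Return a log-friendly summary of a request body (hypothetical helper)."""
    text = raw.decode("utf-8", errors="replace")
    summary = {
        "size": len(raw),
        # Truncate large payloads so one request cannot flood the log.
        "body": text if len(text) <= limit else text[:limit] + "...[truncated]",
    }
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        summary["json"] = False
        return summary
    summary["json"] = True
    if isinstance(payload, dict):
        summary["keys"] = sorted(payload)
        messages = payload.get("messages")
        if isinstance(messages, list):
            summary["message_count"] = len(messages)
    return summary


def format_validation_errors(detail: list) -> list:
    """Flatten 422 `detail` entries into one 'location: message (type)' line each."""
    lines = []
    for err in detail:
        loc = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{loc}: {err.get('msg', '')} ({err.get('type', '')})")
    return lines
```

Keeping these helpers free of framework imports means they can be called from any middleware hook and unit-tested without starting the server.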

Add request logging middleware to debug 422 errors · 46a2eabb

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Add extra field tolerance to API request models for better client compatibility · 56496d2f

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Added extra="allow" to ChatCompletionRequest and CompletionRequest
- Added common OpenAI fields: seed, logprobs, top_logprobs, response_format, user, best_of, echo
- This prevents 422 errors when clients send additional fields we don't use

Fixes compatibility issues with KiloCode and other OpenAI-compatible clients
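
To illustrate the difference between rejecting and tolerating unknown fields, here is a standard-library stand-in for the behaviour that `extra="allow"` gives a Pydantic model. It is not the project's code: `ChatRequestSketch` and `parse_request` are hypothetical names, and the strict branch corresponds to a model configured to forbid extras (Pydantic's own default is to ignore them silently).

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class ChatRequestSketch:
    """Hypothetical, trimmed-down request model."""
    model: str
    messages: List[Dict[str, Any]]
    temperature: float = 1.0
    extra: Dict[str, Any] = field(default_factory=dict)  # unknown fields land here


def parse_request(payload: Dict[str, Any], allow_extra: bool = True) -> ChatRequestSketch:
    known = {f.name for f in fields(ChatRequestSketch)} - {"extra"}
    unknown = {k: v for k, v in payload.items() if k not in known}
    if unknown and not allow_extra:
        # Strict mode: the kind of rejection an API reports as HTTP 422.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    declared = {k: v for k, v in payload.items() if k in known}
    # Tolerant mode: keep the extras instead of failing the request.
    return ChatRequestSketch(**declared, extra=unknown)
```

With `allow_extra=True`, a payload carrying `seed` or `user` parses cleanly and the extra values stay available for logging.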

Fix count_vulkan_devices to correctly count GPU devices and exclude CPU devices · 044036e2

Stefy Lanza (nextime / spora ) authored Feb 28, 2026
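
The commit does not show how devices are counted. One plausible approach, sketched here under the assumption that the device list comes from `vulkaninfo --summary` text, is to count only entries whose `deviceType` ends in `_GPU`, which skips software rasterizers reported as `PHYSICAL_DEVICE_TYPE_CPU`. The function name and the sample text are illustrative, not taken from the project.

```python
import re

# Matches lines such as: deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
DEVICE_TYPE_RE = re.compile(r"deviceType\s*=\s*(PHYSICAL_DEVICE_TYPE_\w+)")


def count_gpu_devices(vulkaninfo_summary: str) -> int:
    """Count Vulkan physical devices that are real GPUs (hypothetical helper)."""
    types = DEVICE_TYPE_RE.findall(vulkaninfo_summary)
    # DISCRETE_GPU, INTEGRATED_GPU and VIRTUAL_GPU count; CPU and OTHER do not.
    return sum(1 for t in types if t.endswith("_GPU"))


# Illustrative input only; real output has more fields per device.
SAMPLE = """
GPU0:
    deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName = Example discrete GPU
GPU1:
    deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
    deviceName = Example integrated GPU
GPU2:
    deviceType = PHYSICAL_DEVICE_TYPE_CPU
    deviceName = Example software rasterizer
"""

print(count_gpu_devices(SAMPLE))
```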

Add --vulkan-single-gpu flag to force Vulkan to use only one GPU · a62cb69d

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

When multiple Vulkan-compatible GPUs are present (e.g., NVIDIA + AMD),
llama.cpp automatically distributes layers across all GPUs for performance.
This can cause unwanted VRAM allocation on the NVIDIA GPU when the user
wants to use only the AMD GPU.

The new --vulkan-single-gpu flag uses tensor_split to force all model
layers onto a single specified GPU device, preventing distribution.

- Added --vulkan-single-gpu argument
- Added count_vulkan_devices() method to detect GPU count
- Modified load_model to build tensor_split array when single_gpu=True
- Updated README with documentation for the new flag

Example usage:
  python coderai --model model.gguf --backend vulkan --vulkan-device 1 --vulkan-single-gpu
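
The `tensor_split` array described above can be sketched as a list with all weight on the chosen device and zero everywhere else. The helper name is hypothetical; `tensor_split`, `main_gpu`, and `n_gpu_layers` are parameters of `llama_cpp.Llama`, but how the project wires them together is an assumption here.

```python
from typing import List


def single_gpu_tensor_split(device_index: int, device_count: int) -> List[float]:
    """Build a tensor_split that places every layer on one device (hypothetical helper)."""
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    if not 0 <= device_index < device_count:
        raise ValueError(
            f"device_index {device_index} out of range for {device_count} device(s)"
        )
    # One entry per device: 1.0 for the selected GPU, 0.0 for all others.
    return [1.0 if i == device_index else 0.0 for i in range(device_count)]


split = single_gpu_tensor_split(device_index=1, device_count=2)
print(split)

# Assumed usage with llama-cpp-python (not executed here):
#   from llama_cpp import Llama
#   llm = Llama(model_path="model.gguf", n_gpu_layers=-1,
#               main_gpu=1, tensor_split=split)
```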

Add documentation for using Vulkan with multiple GPUs (NVIDIA + AMD) · e0f0e99d

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Update llama-cpp-python installation to use --upgrade flag · 70dab7d1

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add --upgrade flag to pip install for llama-cpp-python in build.sh.
  This ensures the latest version is installed, since older versions
  may not support newer models such as Qwen3.5
- Add note in README about updating llama-cpp-python for newer models
- Make coderai script executable

Add Vulkan GPU device selection support · d4d67ebd

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add --vulkan-device argument to select specific GPU (for multi-GPU systems)
- Add --vulkan-list-devices to list available Vulkan GPUs
- Update VulkanBackend to use main_gpu parameter for device selection
- Add list_vulkan_devices() method to show available devices
- Update README with new command-line options and examples

Useful when you have both NVIDIA and AMD GPUs and want to ensure
Vulkan uses the AMD GPU specifically.
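
A rough sketch of how the new flags could map onto the `main_gpu` parameter follows. The flag names come from this commit; the defaults, the `llama_kwargs` helper, and the choice to omit `main_gpu` when no device is given are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="coderai")
    parser.add_argument("--model", required=True)
    parser.add_argument("--backend", choices=["auto", "nvidia", "vulkan"], default="auto")
    # Default of None (assumed) means "let llama.cpp pick the device".
    parser.add_argument("--vulkan-device", type=int, default=None,
                        help="index of the Vulkan GPU to use")
    parser.add_argument("--vulkan-list-devices", action="store_true",
                        help="list available Vulkan GPUs and exit")
    return parser


def llama_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed flags into keyword arguments for llama_cpp.Llama."""
    kwargs = {"model_path": args.model}
    if args.vulkan_device is not None:
        kwargs["main_gpu"] = args.vulkan_device
    return kwargs


args = build_parser().parse_args(
    ["--model", "model.gguf", "--backend", "vulkan", "--vulkan-device", "1"]
)
print(llama_kwargs(args))
```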

Make procname optional - commented out in requirements · 3b451669

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Comment out procname in requirements-nvidia.txt
- Comment out procname in requirements-vulkan.txt
- Add note about requiring libproc2-dev for procname
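
With procname no longer installed by default, code that uses it needs to tolerate its absence. A minimal sketch of such a guard is shown below; `set_process_name` is a hypothetical wrapper, and `procname.setprocname` is assumed to be the module's setter.

```python
try:
    import procname  # optional dependency; may be missing from the environment
except ImportError:
    procname = None


def set_process_name(name: str) -> bool:
    """Rename the process if procname is available; report whether it happened."""
    if procname is None:
        return False  # silently skip: the server works without a custom name
    procname.setprocname(name)
    return True
```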

Update Vulkan dependencies: add glslc package · 389851fe

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- build.sh: Update package list to include glslc, glslang-tools, glslang-dev
- README.md: Update installation instructions with correct package names
- Add better guidance for finding glslc in non-standard locations

Fix Vulkan build: add glslc/shader compiler check · bd5b87b5

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Update build.sh to check for glslc before attempting build
- Update README with correct package names (glslang-tools/glslang)
- Add troubleshooting for missing glslc error
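
The check itself lives in build.sh, which is not shown here. The same pre-flight idea, expressed in Python for illustration, is to look for `glslc` on `PATH` plus any extra directories before starting the build. `find_tool` and its `extra_dirs` parameter are hypothetical.

```python
import os
import shutil
from typing import Optional, Sequence


def find_tool(name: str, extra_dirs: Sequence[str] = ()) -> Optional[str]:
    """Return the full path of an executable, or None if it cannot be found."""
    # Search the normal PATH first, then any non-standard install locations.
    search_path = os.pathsep.join([os.environ.get("PATH", os.defpath), *extra_dirs])
    return shutil.which(name, path=search_path)


glslc = find_tool("glslc")
if glslc is None:
    print("glslc not found: install the shader compiler before building with Vulkan")
else:
    print(f"using glslc at {glslc}")
```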

Add Vulkan support for AMD GPUs alongside NVIDIA/CUDA · 02fb99fa

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add build.sh script with nvidia/vulkan arguments (default: nvidia)
- Create backend abstraction: ModelBackend base class
- Implement NvidiaBackend using HuggingFace Transformers
- Implement VulkanBackend using llama-cpp-python with GGUF models
- Add separate requirements files for nvidia and vulkan backends
- Add --backend argument with auto/nvidia/vulkan options
- Add Vulkan-specific options: --n-gpu-layers, --n-ctx
- Make procname import optional
- Update README with comprehensive Vulkan usage instructions
- Add Vulkan troubleshooting section
- Add GGUF model recommendations

The application now supports:
- NVIDIA GPUs via PyTorch/Transformers (HuggingFace models)
- AMD GPUs via llama-cpp-python/Vulkan (GGUF models)
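
The backend abstraction described above could be structured along these lines. Only the class names `ModelBackend`, `NvidiaBackend`, and `VulkanBackend` and the `load_model` method come from the commit messages; the `generate` method, the stub bodies, and the file-extension heuristic for `auto` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class ModelBackend(ABC):
    """Common interface so the API layer does not care which GPU stack is used."""

    @abstractmethod
    def load_model(self, model: str) -> None: ...

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 64) -> str: ...


class NvidiaBackend(ModelBackend):
    # Stub: the real backend wraps HuggingFace Transformers.
    def load_model(self, model: str) -> None:
        self.model = model

    def generate(self, prompt: str, max_tokens: int = 64) -> str:
        return f"[nvidia:{self.model}] {prompt}"


class VulkanBackend(ModelBackend):
    # Stub: the real backend wraps llama-cpp-python with GGUF models.
    def load_model(self, model: str) -> None:
        self.model = model

    def generate(self, prompt: str, max_tokens: int = 64) -> str:
        return f"[vulkan:{self.model}] {prompt}"


BACKENDS = {"nvidia": NvidiaBackend, "vulkan": VulkanBackend}


def create_backend(name: str, model: str) -> ModelBackend:
    if name == "auto":
        # Assumed heuristic: GGUF files go to Vulkan, everything else to NVIDIA.
        name = "vulkan" if model.lower().endswith(".gguf") else "nvidia"
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    backend = BACKENDS[name]()
    backend.load_model(model)
    return backend
```

Registering backends in a dictionary keeps the `--backend` choices and the available classes in one place, so adding a third backend would not touch the selection logic.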

27 Feb, 2026 · 2 commits

Fix NaN/inf probability tensor error during generation · ae1d0e38

Stefy Lanza (nextime / spora ) authored Feb 27, 2026

- Add InvalidLogitsProcessor to replace NaN and Inf values with finite numbers
- Add _validate_generation_params() to clamp temperature and top_p to valid ranges
- Add try-except blocks with fallback to greedy decoding on numerical errors
- Add error handling in streaming responses to prevent crashes
- Fix temperature=0 handling to use greedy decoding instead of sampling
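
The parameter clamping and NaN/Inf replacement listed above can be sketched in plain Python. This is not the project's code: the real processor operates on PyTorch tensors, and the function names, the temperature cap of 2.0, and the ±1e4 replacement bounds are all assumptions.

```python
import math
from typing import List


def validate_generation_params(temperature: float, top_p: float) -> dict:
    """Clamp sampling parameters; fall back to greedy decoding when temperature <= 0."""
    if math.isnan(temperature) or temperature <= 0.0:
        # Sampling at temperature 0 divides logits by zero, so decode greedily.
        return {"do_sample": False}
    temperature = min(temperature, 2.0)  # assumed upper bound
    if math.isnan(top_p) or not 0.0 < top_p <= 1.0:
        top_p = 1.0  # out-of-range nucleus size: disable top-p filtering
    return {"do_sample": True, "temperature": temperature, "top_p": top_p}


def sanitize_logits(logits: List[float], bound: float = 1e4) -> List[float]:
    """Replace NaN with 0.0 and +/-Inf with +/-bound so softmax stays finite."""
    cleaned = []
    for x in logits:
        if math.isnan(x):
            cleaned.append(0.0)
        elif math.isinf(x):
            cleaned.append(bound if x > 0 else -bound)
        else:
            cleaned.append(x)
    return cleaned
```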

Initial commit: Add CoderAI OpenAI-compatible API server · 087ba9e1

Stefy Lanza (nextime / spora ) authored Feb 27, 2026

- Add main server script with FastAPI and memory-aware model loading
- Add requirements.txt with dependencies and platform-specific PyTorch options
- Add comprehensive README.md with installation, usage, and troubleshooting
- Add LICENSE.md with GPLv3 license
