- 15 Mar, 2026 34 commits
-
-
Your Name authored
The user will experiment with Vulkan environment variables from the launching script. Keep only the CUDA_VISIBLE_DEVICES setting for now.
-
Your Name authored
For stable-diffusion-cpp-python:
- GGML_VK_VISIBLE_DEVICES=
- GGML_VULKAN_DEVICE=

For llama-cpp-python (additional):
- VK_ICD_FILENAMES=/dev/null
- VK_DRIVER_FILES=/dev/null
- VK_LOADER_DRIVERS_DISABLE=*
- VK_LOADER_LAYERS_DISABLE=~all~

All variables are restored on cleanup for subsequent Vulkan models.
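The set-then-restore behaviour described in this commit could look roughly like the sketch below. The helper name `vulkan_disabled` and the `environ` parameter are hypothetical and not taken from the repository; only the variable names and values come from the commit message.

```python
import os
from contextlib import contextmanager

# Variable names and values as listed in the commit message.
VULKAN_DISABLE_ENV = {
    "GGML_VK_VISIBLE_DEVICES": "",
    "GGML_VULKAN_DEVICE": "",
    "VK_ICD_FILENAMES": "/dev/null",
    "VK_DRIVER_FILES": "/dev/null",
    "VK_LOADER_DRIVERS_DISABLE": "*",
    "VK_LOADER_LAYERS_DISABLE": "~all~",
}


@contextmanager
def vulkan_disabled(overrides=VULKAN_DISABLE_ENV, environ=os.environ):
    """Hypothetical helper: apply the overrides, then restore the old values."""
    # Remember the previous value of each variable (None means "was unset").
    saved = {name: environ.get(name) for name in overrides}
    environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in saved.items():
            if old_value is None:
                environ.pop(name, None)  # variable did not exist before
            else:
                environ[name] = old_value
```

A model load would then be wrapped in `with vulkan_disabled(): ...`, so that a Vulkan model loaded afterwards sees the original environment again.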
-
Your Name authored
- Use VK_ICD_FILENAMES=/dev/null to disable Vulkan ICD and force CUDA
- This is the correct variable for llama.cpp to disable Vulkan
- Restore VK_ICD_FILENAMES on cleanup for subsequent Vulkan models
-
Your Name authored
- Set GGML_DISABLE_VULKAN=1 and GGML_VULKAN_DEVICE='' before loading model
- These must be set before llama_cpp import since it reads them at init
- Restore Vulkan settings on cleanup so subsequent Vulkan models work
- Addresses issue where GGUF models ran on CPU instead of CUDA with --backend nvidia
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Make cleanup() method handle StableDiffusion and other non-ModelManager objects
- Add try-except in on-demand swap to handle cleanup failures gracefully
- Check if cleanup method exists before calling
-
Your Name authored
- Add model_backend_types dict to track backend for each model
- Update set_default_model to accept backend_type parameter
- Modify get_model_for_request to swap models on-demand when in ondemand mode
- Unload current model from VRAM and load new model when request arrives for different model
- Respect --backend flag when loading models on-demand
- Only activates when no --loadall or --loadswap flag is specified
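A minimal sketch of the on-demand swap described above, assuming a single resident model at a time. The class `OnDemandModels`, its `register` method, and the `loader` callback are hypothetical; only `model_backend_types` and `get_model_for_request` are names from the commit message.

```python
class OnDemandModels:
    """Hypothetical container that keeps at most one model loaded."""

    def __init__(self, loader):
        self.loader = loader              # callable(name, backend_type) -> model
        self.model_backend_types = {}     # model name -> backend type
        self.current_name = None
        self.current_model = None

    def register(self, name, backend_type):
        self.model_backend_types[name] = backend_type

    def get_model_for_request(self, name):
        # Same model as last time: nothing to swap.
        if name == self.current_name:
            return self.current_model
        self._unload_current()
        backend_type = self.model_backend_types[name]
        self.current_model = self.loader(name, backend_type)
        self.current_name = name
        return self.current_model

    def _unload_current(self):
        model = self.current_model
        self.current_model = None
        self.current_name = None
        # Not every model object has cleanup(), so check before calling.
        cleanup = getattr(model, "cleanup", None)
        if callable(cleanup):
            try:
                cleanup()
            except Exception as error:
                # A failed cleanup should not block loading the next model.
                print(f"cleanup failed: {error}")
```

Dropping the reference before calling `cleanup()` means a failing cleanup cannot leave a half-unloaded model registered as current.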
-
Your Name authored
-
Your Name authored
- Add VK_ICD_FILENAMES=/dev/null to disable Vulkan during startup preload
- Previously only set at request time, causing crash during --loadall
- Added check in both startup preload locations (lines ~5254 and ~5811)
- Checks --backend and --image-backend to determine CUDA usage
-
Your Name authored
When --backend nvidia is used, set VK_ICD_FILENAMES=/dev/null to completely disable Vulkan and force CUDA-only mode for sd.cpp
-
Your Name authored
-
Your Name authored
-
Your Name authored
Print model capabilities (text, image-to-text, image, etc.) after successful model loading in both NvidiaBackend and VulkanBackend
-
Your Name authored
-
Your Name authored
- Add ModelCapabilities dataclass to represent model capabilities
- Add detect_model_capabilities() function to detect:
  - text_generation (LLM)
  - vision (image understanding)
  - image_generation (Stable Diffusion)
  - speech_to_text (whisper)
  - text_to_speech (TTS)
- Use capability detection for better error messages in image generation endpoint
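For illustration, the capability record could be sketched as below. The five field names come from the commit message; the `summary()` method and the name-based heuristics inside `detect_model_capabilities()` are assumptions made for this example, not the repository's actual detection logic.

```python
from dataclasses import dataclass


@dataclass
class ModelCapabilities:
    text_generation: bool = False
    vision: bool = False
    image_generation: bool = False
    speech_to_text: bool = False
    text_to_speech: bool = False

    def summary(self):
        # Hypothetical helper: list the enabled capabilities in field order.
        names = [name for name, enabled in vars(self).items() if enabled]
        return ", ".join(names) or "none"


def detect_model_capabilities(model_name):
    """Guess capabilities from the model name (illustrative heuristics only)."""
    name = model_name.lower()
    caps = ModelCapabilities()
    if "whisper" in name:
        caps.speech_to_text = True
    elif "tts" in name:
        caps.text_to_speech = True
    elif "stable-diffusion" in name or "sdxl" in name:
        caps.image_generation = True
    else:
        # Default to an LLM; mark vision for common vision-language name hints.
        caps.text_generation = True
        caps.vision = "llava" in name or "-vl" in name
    return caps
```

An image endpoint could then check `caps.image_generation` and return a short error when the flag is false.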
-
Your Name authored
- Show 'cuda (via llama-cpp-python)' when force_cuda is enabled
- Show original backend in GGUF detection message
-
Your Name authored
GGUF models are for text/LLM and cannot do image generation.
-
Your Name authored
-
Your Name authored
-
Your Name authored
diffusers is required for Stable Diffusion image generation
-
Your Name authored
The package is named 'stable_diffusion_cpp_python', not 'stable_diffusion_cpp'
-
Your Name authored
If --image-model is not specified, try to use the main --model as the image model fallback when requesting 'default' model.
-
Your Name authored
If loading a cached GGUF model fails with corruption indicators (invalid, corrupt, magic, header), delete the corrupted cache and re-download the model automatically.
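The recovery path described in this commit might be sketched as follows. The function names and the `load`/`download` callbacks are hypothetical; the four indicator words are the ones named in the commit message.

```python
from pathlib import Path

# Substrings that the commit message treats as signs of a corrupted cache file.
CORRUPTION_INDICATORS = ("invalid", "corrupt", "magic", "header")


def looks_corrupted(error):
    """Return True if the error text contains a corruption indicator."""
    message = str(error).lower()
    return any(word in message for word in CORRUPTION_INDICATORS)


def load_with_redownload(path, load, download):
    """Load a cached model; on a corruption error, delete, re-download, retry once."""
    try:
        return load(path)
    except Exception as error:
        if not looks_corrupted(error):
            raise  # unrelated failure: do not touch the cache
        Path(path).unlink(missing_ok=True)  # drop the corrupted cache file
        download(path)
        return load(path)
```

The retry happens only once, so a download that is itself broken surfaces as a normal error instead of looping.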
-
Your Name authored
The cache filename format was inconsistent:
- get_cached_model_path used: {hash}{ext}
- load_model download used: {hash}_{filename}

This caused cache lookups to always fail. Now both use {hash}_{filename} format to ensure cached models are properly found.
-
Your Name authored
Change from detailed installation instructions to simple message: 'Model does not support image generation'
-
Your Name authored
- If request specifies a model, use that
- If request doesn't specify a model (empty or 'image'), use default
- Legacy 'image:' prefix also falls back to default
- Error handling already exists for when no backend is available
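The selection rules above amount to a small function. The name `resolve_image_model` and its parameters are hypothetical and used only for this sketch.

```python
def resolve_image_model(requested, default_model):
    """Pick the image model for a request (illustrative, not the project's code)."""
    # Empty, missing, or the generic 'image' name: use the default model.
    if not requested or requested == "image":
        return default_model
    # Legacy 'image:' prefix also falls back to the default model.
    if requested.startswith("image:"):
        return default_model
    # Otherwise the request named a specific model.
    return requested
```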
-
Your Name authored
Change message from 'VulkanBackend will use CUDA backend' to 'GGUF model will use CUDA backend (forced by --backend nvidia)'
-
Your Name authored
When user explicitly passes --backend nvidia with a GGUF model, vulkan is now removed from the available backends list since llama-cpp-python will use CUDA instead of Vulkan.
-
Your Name authored
- When user specifies --backend nvidia with a GGUF model, show a note indicating that the vulkan backend will use CUDA
- This clarifies that Vulkan isn't being used in this scenario
-
Your Name authored
- Add 'all' as a valid backend option
- Change default from 'nvidia' to 'all'
- Add comprehensive 'all' backend section that installs:
  - Base requirements
  - PyTorch with CUDA (nvidia backend)
  - llama-cpp-python with CUDA and Vulkan support
  - stable-diffusion-cpp-python with OpenCL
  - Additional requirements
- Detect available hardware (CUDA, Vulkan, OpenCL) and enable accordingly
- Show summary of available backends after installation
-
Your Name authored
- Store original backend before switching to vulkan for GGUF files
- Pass original_backend to VulkanBackend constructor
- Add force_cuda flag that triggers CUDA environment setup
- Set CUDA_VISIBLE_DEVICES when force_cuda is True
- Update success/error messages to reflect actual backend used
- Add debug output for CUDA detection
-
Your Name authored
The local import of os inside the HTTPS block caused Python to treat os as a local variable throughout the main() function.
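The scoping rule behind this bug can be reproduced in a few lines. The function names below are made up for the demonstration: any `import os` inside a function body makes `os` a local name for the entire function, so an earlier use fails with `UnboundLocalError`.

```python
import os


def broken_main(use_https=False):
    # Because of the `import os` further down, Python treats `os` as a local
    # variable of this function, so this line runs before `os` is bound.
    pid = os.getpid()
    if use_https:
        import os  # local import: binds `os` inside this function only
    return pid


def fixed_main(use_https=False):
    # No local import: `os` resolves to the module-level import.
    return os.getpid()


try:
    broken_main()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

Note that `broken_main()` fails even when `use_https` is false, because the decision that `os` is local is made at compile time, not when the import statement runs.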
-
Your Name authored
- Add python-multipart to requirements.txt, requirements-nvidia.txt, requirements-vulkan.txt
- Add llama-cpp-python to requirements-nvidia.txt for GGUF support
- When using CUDA/nvidia backend with GGUF file, automatically use llama-cpp-python
-
- 14 Mar, 2026 6 commits
-
-
Your Name authored
- Add --vision-model for image/video-to-text models
- When --file-path is set, return URL by default, base64 only if explicitly requested
- Add --https flag with auto-certificate generation
- Add --privkey and --pubkey for custom certificates
-
Your Name authored
- Add --image-cfg-scale CLI option (default 1.0)
- Add get_cfg_scale() helper that auto-detects VRAM
- If Vulkan and VRAM < 16GB, use cfg_scale=1.0 automatically
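The cfg_scale rule could be expressed as the sketch below. The real helper is said to auto-detect VRAM; here the detected value is passed in as `vram_gb`, and the whole signature is an assumption made for the example.

```python
def get_cfg_scale(requested, backend, vram_gb):
    """Return the cfg_scale to use (illustrative signature, not the project's)."""
    # On Vulkan with less than 16 GB of VRAM, force cfg_scale to 1.0.
    if backend == "vulkan" and vram_gb is not None and vram_gb < 16:
        return 1.0
    # Otherwise honour the value from --image-cfg-scale.
    return requested
```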
-
Your Name authored
- Add --image-backend, --audio-backend, --tts-backend CLI args
- Add opencl to backend choices
- Add OpenCL build target in build.sh
-
Your Name authored
-
Your Name authored
-
Your Name authored
- When --debug is enabled, show full command line coderai was called with
- Add --nopreload flag to disable model preloading at startup
- When --nopreload is not specified, skip checking for preloaded sd.cpp models (forces load in worker thread to avoid Vulkan context issues)
- Fix image model preloading to respect --nopreload flag
-