Commits · c15e6ec62402058f2bb759d999ecab90e2a2c612 · nexlab / coderai

15 Mar, 2026 34 commits

Revert: Remove Vulkan disable env vars for llama-cpp-python · c15e6ec6

Your Name authored Mar 15, 2026

The user will experiment with Vulkan environment variables from the launch script.
Keep only the CUDA_VISIBLE_DEVICES setting for now.

c15e6ec6

Fix CUDA backend - use comprehensive Vulkan disable variables · e538d802

Your Name authored Mar 15, 2026

For stable-diffusion-cpp-python:
- GGML_VK_VISIBLE_DEVICES=
- GGML_VULKAN_DEVICE=

For llama-cpp-python (additional):
- VK_ICD_FILENAMES=/dev/null
- VK_DRIVER_FILES=/dev/null
- VK_LOADER_DRIVERS_DISABLE=*
- VK_LOADER_LAYERS_DISABLE=~all~

All variables are restored on cleanup for subsequent Vulkan models.

e538d802

Fix CUDA backend for GGUF models - use VK_ICD_FILENAMES to disable Vulkan · 11bada84

Your Name authored Mar 15, 2026

- Use VK_ICD_FILENAMES=/dev/null to disable Vulkan ICD and force CUDA
- This is the correct variable for llama.cpp to disable Vulkan
- Restore VK_ICD_FILENAMES on cleanup for subsequent Vulkan models

11bada84

Fix CUDA backend for GGUF models - force CUDA via environment variables · f77d34da

Your Name authored Mar 15, 2026

- Set GGML_DISABLE_VULKAN=1 and GGML_VULKAN_DEVICE='' before loading model
- These must be set before llama_cpp import since it reads them at init
- Restore Vulkan settings on cleanup so subsequent Vulkan models work
- Addresses issue where GGUF models ran on CPU instead of CUDA with --backend nvidia
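
The set-then-restore pattern described in this commit can be sketched as a context manager. This is only an illustration: the variable names come from the commit message, while `temporary_env`, `load_with_cuda`, and the `loader` callable are hypothetical names, and the real code restores the values at model cleanup rather than at the end of a `with` block.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(overrides):
    """Apply environment overrides, then restore the previous values on exit."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)  # variable was unset before
            else:
                os.environ[name] = old


# Variable names as listed in the commit message.
FORCE_CUDA_ENV = {"GGML_DISABLE_VULKAN": "1", "GGML_VULKAN_DEVICE": ""}


def load_with_cuda(loader):
    """Run `loader` with the overrides in place.

    `loader` is a hypothetical callable that imports llama_cpp and loads
    the model, so the import happens after the variables are set.
    """
    with temporary_env(FORCE_CUDA_ENV):
        return loader()
```

Keeping the import inside `loader` matters because, per the commit, the library reads these variables at initialization.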

f77d34da

Fix image model preloading - only preload when --loadall or --loadswap is set · c4af709f

Your Name authored Mar 15, 2026

c4af709f

Fix http_request bug in image generation and add http_request parameter to save_image_response · d24c7e18

Your Name authored Mar 15, 2026

d24c7e18

Fix cleanup handling for different model types · 16ce04c5

Your Name authored Mar 15, 2026

- Make cleanup() method handle StableDiffusion and other non-ModelManager objects
- Add try-except in on-demand swap to handle cleanup failures gracefully
- Check if cleanup method exists before calling
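
A minimal sketch of the defensive cleanup these bullets describe, assuming only that some model objects expose a `cleanup()` method and others do not. The helper name `safe_cleanup` is hypothetical and not taken from the repository.

```python
def safe_cleanup(model):
    """Call model.cleanup() if it exists; report failure instead of raising."""
    cleanup = getattr(model, "cleanup", None)
    if not callable(cleanup):
        return False  # object has no usable cleanup method
    try:
        cleanup()
    except Exception as exc:
        # A failed cleanup should not abort the model swap that triggered it.
        print(f"cleanup failed, continuing: {exc}")
        return False
    return True
```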

16ce04c5

Implement on-demand model swapping for multiple models · 362b8452

Your Name authored Mar 15, 2026

- Add model_backend_types dict to track backend for each model
- Update set_default_model to accept backend_type parameter
- Modify get_model_for_request to swap models on-demand when in ondemand mode
- Unload current model from VRAM and load new model when request arrives for different model
- Respect --backend flag when loading models on-demand
- Only activates when no --loadall or --loadswap flag is specified
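
The swap logic above might look roughly like the following. `model_backend_types` and `get_model_for_request` are named in the commit; the class `OnDemandModels`, the `register` method, and the `loader` callable are assumptions made for this sketch.

```python
class OnDemandModels:
    """Keep one model loaded at a time and swap when a different one is requested."""

    def __init__(self, loader, default_backend="auto"):
        self.loader = loader  # hypothetical callable: (name, backend) -> model
        self.default_backend = default_backend
        self.model_backend_types = {}  # model name -> backend type
        self.current_name = None
        self.current_model = None

    def register(self, name, backend_type=None):
        # Fall back to the global backend when no per-model backend is given.
        self.model_backend_types[name] = backend_type or self.default_backend

    def get_model_for_request(self, name):
        if name == self.current_name:
            return self.current_model
        if name not in self.model_backend_types:
            raise KeyError(f"unknown model: {name}")
        # Unload the current model before loading the next one.
        old, self.current_model, self.current_name = self.current_model, None, None
        cleanup = getattr(old, "cleanup", None)
        if callable(cleanup):
            cleanup()
        self.current_model = self.loader(name, self.model_backend_types[name])
        self.current_name = name
        return self.current_model
```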

362b8452

Revert VK_ICD_FILENAMES - user will set in startup script · ebfa6892

Your Name authored Mar 15, 2026

ebfa6892

Fix Vulkan crash when using --backend nvidia with preloaded image models · 3367d957

Your Name authored Mar 15, 2026

- Add VK_ICD_FILENAMES=/dev/null to disable Vulkan during startup preload
- Previously only set at request time, causing crash during --loadall
- Added check in both startup preload locations (lines ~5254 and ~5811)
- Checks --backend and --image-backend to determine CUDA usage

3367d957

Disable Vulkan via VK_ICD_FILENAMES when using CUDA for sd.cpp · a4f36456

Your Name authored Mar 15, 2026

When --backend nvidia is used, set VK_ICD_FILENAMES=/dev/null
to completely disable Vulkan and force CUDA-only mode for sd.cpp

a4f36456

Use CUDA only if --backend nvidia or --image-backend nvidia is specified · ad852d97

Your Name authored Mar 15, 2026

ad852d97

Add CUDA detection for sd.cpp image generation backend · 27527de3

Your Name authored Mar 15, 2026

27527de3

Show model capabilities when model is loaded · b8e81009

Your Name authored Mar 15, 2026

Print model capabilities (text, image-to-text, image, etc.)
after successful model loading in both NvidiaBackend and VulkanBackend

b8e81009

Rename 'vision' to 'image_to_text' for consistency · 64a9c845

Your Name authored Mar 15, 2026

64a9c845

Add model capability detection · 69fe4af0

Your Name authored Mar 15, 2026

- Add ModelCapabilities dataclass to represent model capabilities
- Add detect_model_capabilities() function to detect:
  - text_generation (LLM)
  - vision (image understanding)
  - image_generation (Stable Diffusion)
  - speech_to_text (whisper)
  - text_to_speech (TTS)
- Use capability detection for better error messages in image generation endpoint
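
As a rough illustration of the shape this could take: the dataclass name, function name, and field names follow the commit message, but the detection rule below (keyword matching on the model name) is purely an assumption for the sketch and may differ from what the repository does.

```python
from dataclasses import dataclass


@dataclass
class ModelCapabilities:
    text_generation: bool = False
    vision: bool = False
    image_generation: bool = False
    speech_to_text: bool = False
    text_to_speech: bool = False


def detect_model_capabilities(model_name: str) -> ModelCapabilities:
    """Guess capabilities from the model name (assumed heuristic, not the real one)."""
    name = model_name.lower()
    caps = ModelCapabilities()
    if "whisper" in name:
        caps.speech_to_text = True
    elif "tts" in name:
        caps.text_to_speech = True
    elif "stable-diffusion" in name or "sdxl" in name:
        caps.image_generation = True
    else:
        caps.text_generation = True
        caps.vision = "llava" in name  # example of a vision-capable LLM family
    return caps
```

An image endpoint could then check `caps.image_generation` and return a clear error when it is false.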

69fe4af0

Show actual backend being used when CUDA is forced for GGUF models · ad4ec2a5

Your Name authored Mar 15, 2026

- Show 'cuda (via llama-cpp-python)' when force_cuda is enabled
- Show original backend in GGUF detection message

ad4ec2a5

Don't use GGUF text models as fallback for image generation · c3a5417f

Your Name authored Mar 15, 2026

GGUF models are for text/LLM and cannot do image generation.

c3a5417f

Revert import name - package imports as stable_diffusion_cpp · 5a1b67fe

Your Name authored Mar 15, 2026

5a1b67fe

Add stable-diffusion-cpp-python to requirements for image generation · 273264bf

Your Name authored Mar 15, 2026

273264bf

Add diffusers and safetensors to requirements for image generation · 0f1c2385

Your Name authored Mar 15, 2026

diffusers is required for Stable Diffusion image generation

0f1c2385

Fix stable-diffusion-cpp-python import name · c5c74197

Your Name authored Mar 15, 2026

The package is named 'stable_diffusion_cpp_python', not 'stable_diffusion_cpp'

c5c74197

Fallback to main --model for image generation if --image-model not set · b70ae900

Your Name authored Mar 15, 2026

If --image-model is not specified, try to use the main --model as
the image model fallback when the request asks for the 'default' model.

b70ae900

Auto-retry download when cached model file is corrupted · 9c681ed1

Your Name authored Mar 15, 2026

If loading a cached GGUF model fails with corruption indicators
(invalid, corrupt, magic, header), delete the corrupted cache and
re-download the model automatically.
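
The retry behaviour can be sketched as follows. The four indicator strings are the ones listed in the commit; `load_with_redownload` and its `load` and `download` callables are hypothetical stand-ins for the real loading and download code.

```python
import os

# Indicator substrings listed in the commit message.
CORRUPTION_HINTS = ("invalid", "corrupt", "magic", "header")


def looks_corrupted(error):
    """Return True if the error message contains a corruption indicator."""
    message = str(error).lower()
    return any(hint in message for hint in CORRUPTION_HINTS)


def load_with_redownload(path, load, download):
    """Load a cached model; on a corruption error, delete it, re-download, retry once."""
    try:
        return load(path)
    except Exception as exc:
        if not looks_corrupted(exc):
            raise  # unrelated failure: do not touch the cache
        if os.path.exists(path):
            os.remove(path)
        download(path)
        return load(path)
```

Retrying only once avoids an endless download loop if the remote file itself is bad.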

9c681ed1

Fix cached model path lookup to match download format · 102a464b

Your Name authored Mar 15, 2026

The cache filename format was inconsistent:
- get_cached_model_path used: {hash}{ext}
- load_model download used: {hash}_{filename}

This caused cache lookups to always fail. Now both use {hash}_{filename}
format to ensure cached models are properly found.
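
A small sketch of why sharing one naming helper prevents this kind of mismatch. Only the `{hash}_{filename}` format and the name `get_cached_model_path` come from the commit; the choice of SHA-256, the 16-character truncation, and the `cache_filename` helper are assumptions.

```python
import hashlib
import os


def cache_filename(url):
    """Build the cache name as {hash}_{filename}; hash choice is an assumption."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    filename = os.path.basename(url.split("?", 1)[0])  # drop any query string
    return f"{digest}_{filename}"


def get_cached_model_path(cache_dir, url):
    """Return the cached path if the file exists, else None."""
    path = os.path.join(cache_dir, cache_filename(url))
    return path if os.path.exists(path) else None
```

If the download code also calls `cache_filename`, lookup and download cannot drift apart again.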

102a464b

Simplify error message for unsupported image generation models · c72d8384

Your Name authored Mar 15, 2026

Change from detailed installation instructions to simple message:
'Model does not support image generation'

c72d8384

Fix image model routing: use request model, fall back to default · c792d752

Your Name authored Mar 15, 2026

- If request specifies a model, use that
- If request doesn't specify a model (empty or 'image'), use default
- Legacy 'image:' prefix also falls back to default
- Error handling already exists for when no backend is available
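
The routing rules above reduce to a small function. The name `resolve_image_model` and its signature are hypothetical; the three cases mirror the bullets.

```python
def resolve_image_model(requested, default_model):
    """Pick the image model for a request, falling back to the default."""
    name = (requested or "").strip()
    # Empty, the generic 'image' name, and the legacy 'image:' prefix all
    # fall back to the default model.
    if name in ("", "image") or name.startswith("image:"):
        return default_model
    return name
```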

c792d752

Fix debug message to not reference VulkanBackend when using CUDA · fa5c634f

Your Name authored Mar 15, 2026

Change message from 'VulkanBackend will use CUDA backend' to
'GGUF model will use CUDA backend (forced by --backend nvidia)'

fa5c634f

Remove vulkan from available backends when --backend nvidia is used with GGUF models · 88479315

Your Name authored Mar 15, 2026

When user explicitly passes --backend nvidia with a GGUF model,
vulkan is now removed from the available backends list since
llama-cpp-python will use CUDA instead of Vulkan.

88479315

Add note when GGUF model with nvidia backend uses CUDA instead of Vulkan · 273ab8c8

Your Name authored Mar 15, 2026

- When user specifies --backend nvidia with a GGUF model, show a note
  indicating that the vulkan backend will use CUDA
- This clarifies that Vulkan isn't being used in this scenario

273ab8c8

Add 'all' backend option to build.sh for installing all backends at once · a6070221

Your Name authored Mar 15, 2026

- Add 'all' as a valid backend option
- Change default from 'nvidia' to 'all'
- Add comprehensive 'all' backend section that installs:
  - Base requirements
  - PyTorch with CUDA (nvidia backend)
  - llama-cpp-python with CUDA and Vulkan support
  - stable-diffusion-cpp-python with OpenCL
  - Additional requirements
- Detect available hardware (CUDA, Vulkan, OpenCL) and enable accordingly
- Show summary of available backends after installation
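
For illustration only, hardware detection of this kind is often approximated by checking whether each vendor's command-line tool is on the PATH. How build.sh actually detects hardware is not stated here; the tool names below (`nvidia-smi`, `vulkaninfo`, `clinfo`) are common utilities chosen as an assumption, and the sketch is in Python rather than shell.

```python
import shutil

# Backend name -> command-line tool whose presence suggests support (assumed mapping).
PROBES = {"cuda": "nvidia-smi", "vulkan": "vulkaninfo", "opencl": "clinfo"}


def detect_backends(which=shutil.which):
    """Return the backends whose probe tool is found on the PATH."""
    return [name for name, tool in PROBES.items() if which(tool)]


if __name__ == "__main__":
    print("Available backends:", ", ".join(detect_backends()) or "none")
```

A tool being installed is only a rough proxy for working hardware, so a real script would likely also run the tool and check its exit status.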

a6070221

Force CUDA backend in llama-cpp-python when NVIDIA backend is requested with GGUF models · 1bd92fe1

Your Name authored Mar 15, 2026

- Store original backend before switching to vulkan for GGUF files
- Pass original_backend to VulkanBackend constructor
- Add force_cuda flag that triggers CUDA environment setup
- Set CUDA_VISIBLE_DEVICES when force_cuda is True
- Update success/error messages to reflect actual backend used
- Add debug output for CUDA detection

1bd92fe1

Fix UnboundLocalError for os module in --list-cached-models · d8765ac3

Your Name authored Mar 15, 2026

The local import of os inside the HTTPS block caused Python to treat os as a local variable throughout the main() function.
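
The behaviour is easy to reproduce in isolation. In the sketch below, `broken` and `fixed` are illustrative names: an `import os` anywhere inside a function makes `os` a local name for the entire function body, so an earlier use fails even when the import branch never runs.

```python
import os


def broken(use_https):
    try:
        cwd = os.getcwd()  # `os` is local here because of the import below
    except UnboundLocalError as exc:
        return f"error: {type(exc).__name__}"
    if use_https:
        import os  # local import: binds `os` as a local for the whole function
    return cwd


def fixed(use_https):
    # No local import, so `os` resolves to the module-level import.
    return os.getcwd()
```

Calling `broken(False)` returns the error string even though the import line is never executed, which is why removing the local import fixes the problem.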

d8765ac3

Add python-multipart to requirements, GGUF support for CUDA backend · dd4dfff4

Your Name authored Mar 15, 2026

- Add python-multipart to requirements.txt, requirements-nvidia.txt, requirements-vulkan.txt
- Add llama-cpp-python to requirements-nvidia.txt for GGUF support
- When using CUDA/nvidia backend with GGUF file, automatically use llama-cpp-python

dd4dfff4

14 Mar, 2026 6 commits

Add --vision-model, fix --file-path to return URL by default, add HTTPS support · c152ee28

Your Name authored Mar 14, 2026

- Add --vision-model for image/video-to-text models
- When --file-path is set, return URL by default, base64 only if explicitly requested
- Add --https flag with auto-certificate generation
- Add --privkey and --pubkey for custom certificates

c152ee28

Add --image-cfg-scale and auto-detect VRAM for Vulkan · 29d2ed78

Your Name authored Mar 14, 2026

- Add --image-cfg-scale CLI option (default 1.0)
- Add get_cfg_scale() helper that auto-detects VRAM
- If Vulkan and VRAM < 16GB, use cfg_scale=1.0 automatically
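
The rule in the last bullet can be sketched as below. `get_cfg_scale` is named in the commit, but this signature is an assumption, and VRAM detection is left out: the caller passes the detected size in GB, or None if it is unknown.

```python
def get_cfg_scale(requested, backend, vram_gb):
    """Return the cfg_scale to use for image generation.

    On Vulkan with less than 16 GB of VRAM the value is forced to 1.0;
    otherwise the value from --image-cfg-scale is used unchanged.
    """
    if backend == "vulkan" and vram_gb is not None and vram_gb < 16:
        return 1.0
    return requested
```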

29d2ed78

Add per-model backend selection and OpenCL support · bf2a5318

Your Name authored Mar 14, 2026

- Add --image-backend, --audio-backend, --tts-backend CLI args
- Add opencl to backend choices
- Add OpenCL build target in build.sh

bf2a5318

Remove vision model alias - use image only · 4ce1f330

Your Name authored Mar 14, 2026

4ce1f330

Rename --vision-ctx and --vision-offload to --image-ctx and --image-offload · d5f7b07c

Your Name authored Mar 14, 2026

d5f7b07c

Add --debug command line output and --nopreload flag · 2cdd7538

Your Name authored Mar 14, 2026

- When --debug is enabled, show the full command line coderai was called with
- Add --nopreload flag to disable model preloading at startup
- When --nopreload is not specified, skip checking for preloaded sd.cpp models (forces load in worker thread to avoid Vulkan context issues)
- Fix image model preloading to respect --nopreload flag

2cdd7538