- 10 Mar, 2026 21 commits
-
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
When GGUF image model fails to load with llama.cpp, try loading with stable-diffusion-cpp-python (sd.cpp) as fallback.
-
Your Name authored
-
Your Name authored
-
Your Name authored
Tell user to update llama.cpp instead of falling back to diffusers.
-
Your Name authored
If llama.cpp fails to load a GGUF image model (e.g., unsupported architecture like lumina2), try loading via diffusers instead.
-
Your Name authored
GGUF files can have different version bytes after 'GGUF' (e.g., GGUF\x03 for version 3). Changed magic byte check from exact match to prefix check.
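A minimal sketch of the prefix check described above. The function name `looks_like_gguf` is hypothetical and not taken from the project; the only assumption is that a valid file starts with the four ASCII bytes `GGUF`, followed by version bytes that are ignored here.

```python
GGUF_MAGIC = b"GGUF"  # the four magic bytes; version bytes follow them


def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(8)
    # Prefix check instead of exact match, so trailing version bytes
    # such as b"\x03" do not cause a false negative.
    return header.startswith(GGUF_MAGIC)
```

The same check also rejects an HTML error page saved under a `.gguf` name, since such a file does not begin with the magic bytes.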
-
Your Name authored
Changed condition from 'only if no audio models' to 'always when image_models is configured'. This ensures the image model downloads at startup even when audio models are present.
-
Your Name authored
- Removed redundant 'import os' statements inside functions (lines 4522, 4926, 5005)
- Added back missing 'from llama_cpp import Llama' that was accidentally deleted
- Global 'import os' at line 12 is now the only one

This fixes the UnboundLocalError when running --list-cached-models or other CLI options.
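For context, a function-local `import os` can trigger this error in the following way. This is a generic Python illustration, not the project's code; `broken` and `fixed` are hypothetical names. An `import` statement inside a function binds a local name, so any earlier use of that name in the same function refers to the not-yet-assigned local rather than the module-level import.

```python
import os  # module-level import, as kept by the fix


def broken(path):
    # `os` is treated as a local variable for the WHOLE function because of
    # the import further down, so this line raises UnboundLocalError.
    exists = os.path.exists(path)
    import os  # redundant local import that shadows the global one
    return exists


def fixed(path):
    # With no local import, `os` resolves to the module-level import.
    return os.path.exists(path)
```

Calling `broken(".")` raises `UnboundLocalError`, while `fixed(".")` returns normally.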
-
Your Name authored
- --list-cached-models: List all cached models with sizes
- --remove-all-models: Remove all cached models
- --remove-model <modelid>: Remove specific model by name/hash (partial match)
-
Your Name authored
- Check if downloaded file is valid GGUF (magic bytes = 'GGUF')
- If not valid, show clear error that URL is wrong (returns HTML instead)
- Explain that URL must be direct download link ending in .gguf
-
Your Name authored
-
Your Name authored
- Enable verbose=True in llama.cpp to see actual error
- Print GGUF model file size for debugging
- Add try/except with traceback to see detailed errors
-
Your Name authored
- Check if model is URL before any processing
- Use original model name with query params for URL download
- Strip query params only for HuggingFace repo ID parsing
- Added more debug output to trace issues
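The ordering described above (URL check first, query stripping only for repo IDs) could look roughly like this. The function names and the `("url", ...)` / `("hf_repo", ...)` return shape are assumptions made for illustration only.

```python
from typing import Tuple


def is_url(model: str) -> bool:
    """Treat anything starting with an http(s) scheme as a direct URL."""
    return model.startswith(("http://", "https://"))


def resolve_source(model: str) -> Tuple[str, str]:
    """Classify a model argument (hypothetical helper)."""
    if is_url(model):
        # Keep the original string, including ?download=true and similar,
        # because the server may need the query to return the file.
        return ("url", model)
    # Only repo IDs have their query parameters stripped.
    return ("hf_repo", model.split("?", 1)[0])
```

Checking for a URL before any other processing avoids mangling a download link that only works with its query string intact.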
-
Your Name authored
- Strip query parameters from model name before processing
- Handle URLs with ?download=true or other query params
-
Your Name authored
- Detect if image model is GGUF (ends with .gguf or contains 'gguf')
- If GGUF, load using llama.cpp (same as text Vulkan models)
- If diffusers model, load using Stable Diffusion pipeline
- Fixed both locations where image model preloading happens
- Now supports both GGUF and diffusers image generation models
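The detection rule above can be sketched as a small dispatcher. The names `is_gguf_image_model` and `pick_image_backend` and the returned backend labels are hypothetical; the actual loading calls are left out on purpose.

```python
def is_gguf_image_model(model: str) -> bool:
    """Apply the rule from the commit: ends with .gguf or contains 'gguf'."""
    name = model.lower()
    return name.endswith(".gguf") or "gguf" in name


def pick_image_backend(model: str) -> str:
    """Return a label for the loader to use (illustrative only)."""
    return "llama.cpp" if is_gguf_image_model(model) else "diffusers"
```

Note that the substring test already covers the suffix test; both are kept here only to mirror the wording of the commit message.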
-
Your Name authored
- Fixed bug where image model wasn't actually being loaded when --loadall was specified
- The code only printed messages but never loaded the diffusers pipeline
- Now actually loads the Stable Diffusion pipeline using diffusers library
- Tries StableDiffusionXLPipeline first, falls back to generic DiffusionPipeline
- Moves to GPU if CUDA available, enables attention slicing for memory efficiency
- Also fixes second location where image model is the only configured model
- Debug command line output was already implemented
-
Your Name authored
- Fixed undefined variable bug where model_name wasn't defined in scope
- Fixed duplicate model loading when using --loadall/--loadswap with multiple models
- First model is now only loaded once (skipped in loop if already loaded)
- Loadall mode now properly preloads all models in VRAM respecting offload strategy
- Loadswap mode properly loads additional models to RAM
- Ondemand mode doesn't reload first model

Feature 1: --debug now shows full command line as first output
Feature 2: --loadall with multiple models now preloads all in VRAM
-
- 09 Mar, 2026 19 commits
-
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Move whisper-server check before audio_model check
- Now whisper-server will be used if available, regardless of audio_model setting
- Also update multi_model_manager.audio_models with cached path
-
Your Name authored
- WhisperServerManager.start() now returns actual model path (useful for URL -> cached path)
- Update audio_models[0] with cached path after downloading
- Store actual_model_path in current_model instead of original URL
-
Your Name authored
- Changed default port from 8081 to 8744 (less common)
- Check if port is available before using, auto-find available port if needed
- Download URL models before starting whisper-server (use model cache)
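One common way to implement the port check and auto-find step is to try binding a socket. This is a sketch under that assumption; `port_is_free`, `find_port`, and the `attempts` limit are hypothetical, and only the default of 8744 comes from the commit.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def find_port(preferred: int = 8744, host: str = "127.0.0.1", attempts: int = 50) -> int:
    """Return the preferred port if free, else the next free one above it."""
    for port in range(preferred, preferred + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in {preferred}..{preferred + attempts - 1}")
```

A check like this is inherently racy: another process can take the port between the check and the server start, so the caller should still handle a bind failure from the subprocess.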
-
Your Name authored
- Add WhisperServerManager class to manage whisper-server subprocess
- Add --whisper-server argument to specify whisper-server binary path
- Add --whisper-server-port argument for port configuration (default 8081)
- Modify audio transcription endpoint to proxy to whisper-server
- Add cleanup on shutdown to stop whisper-server
- Model can stay loaded in VRAM as long as the server runs
-
Your Name authored
Fix remaining occurrences of model_name (singular) being used instead of model_names (list) in main function.
-
Your Name authored
Fixed bug where model_name (singular) was used instead of model_names (list) in several places, causing UnboundLocalError when running without --model.
-
Your Name authored
- CLI (coder): Add --alias argument to create model aliases
- Config: Add model_aliases dict and resolve_model() method
- Server (coderai): Add server-side alias support with --model-alias
- Aliases are resolved in both client and server when making API calls
- Aliases appear in /v1/models endpoint
- Aliases are persisted in config file
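A minimal sketch of what alias resolution might look like. Only the names `model_aliases` and `resolve_model` come from the commit message; the `Config` dataclass shape and the pass-through behaviour for unknown names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Config:
    # Maps alias -> real model identifier (assumed shape).
    model_aliases: Dict[str, str] = field(default_factory=dict)

    def resolve_model(self, name: str) -> str:
        """Return the aliased model if one exists, otherwise the name unchanged."""
        return self.model_aliases.get(name, name)
```

Usage under these assumptions:

```python
cfg = Config(model_aliases={"fast": "org/small-model"})
cfg.resolve_model("fast")        # "org/small-model"
cfg.resolve_model("org/other")   # "org/other"
```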
-
Your Name authored
- Add QueueManager class to track waiting requests
- Send 'waiting for model...' frames with time counter at regular intervals
- Send 'Model starting' frame when model begins processing
- Add x_queue_info field to streaming response frames for queue status
- Track queue position and wait time for each client
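The position and wait-time tracking could be sketched as below. Only the class name `QueueManager` comes from the commit; the method names, the injectable clock, and the keys of the returned dict are assumptions, and the real `x_queue_info` payload may look different.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Optional


class QueueManager:
    """Track waiting requests in arrival order (illustrative sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._waiting: "OrderedDict[str, float]" = OrderedDict()

    def enqueue(self, request_id: str) -> None:
        # Remember when the request started waiting.
        self._waiting[request_id] = self._clock()

    def dequeue(self, request_id: str) -> None:
        # Called when the model starts processing this request.
        self._waiting.pop(request_id, None)

    def info(self, request_id: str) -> Optional[Dict[str, float]]:
        """Return 1-based queue position and seconds waited, or None."""
        if request_id not in self._waiting:
            return None
        position = list(self._waiting).index(request_id) + 1
        waited = self._clock() - self._waiting[request_id]
        return {"position": position, "waiting_seconds": waited}
```

A streaming endpoint could call `info()` at regular intervals and attach the result to each waiting frame until the request is dequeued.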
-
Your Name authored
- Add support for multiple --audio-model arguments (action='append')
- Add support for multiple --image-model arguments (action='append')
- Add 'audio' alias pointing to first audio model
- Add 'vision'/'image' aliases pointing to first image model
- Update MultiModelManager to store audio_models and image_models as lists
- Add audio_model and image_model properties for accessing first model
- Update get_model_for_request to handle aliases
- Update list_models to show all models and aliases
- Fix remaining references in main function to use list-based variables
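The repeated-flag handling and the first-model aliases can be illustrated with `argparse`. The flag names and `action='append'` come from the commit; `build_parser`, `build_aliases`, and the `dest` names are hypothetical.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--audio-model", action="append", default=[], dest="audio_models")
    parser.add_argument("--image-model", action="append", default=[], dest="image_models")
    return parser


def build_aliases(audio_models: List[str], image_models: List[str]) -> Dict[str, str]:
    """Point 'audio' and 'vision'/'image' at the first model of each kind."""
    aliases: Dict[str, str] = {}
    if audio_models:
        aliases["audio"] = audio_models[0]
    if image_models:
        aliases["vision"] = image_models[0]
        aliases["image"] = image_models[0]
    return aliases
```

With `--audio-model a --audio-model b --image-model i`, `audio_models` becomes `["a", "b"]` and the aliases resolve to `a` and `i`.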
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Changed --model to -m
- Changed --output to -otxt (output as text)
- Changed --device to -dev
- Changed --file to -f for input audio
-
Your Name authored
-
Your Name authored
The args variable was not accessible in the create_transcription function, causing a NameError when using --whisper-cpp CLI option. This fix adds global_args to store the parsed arguments for access in endpoint functions.
-
Your Name authored
-
Your Name authored
-
Your Name authored
-