- 10 Mar, 2026 3 commits
-
-
Your Name authored
  - Detect if image model is GGUF (ends with .gguf or contains 'gguf')
  - If GGUF, load using llama.cpp (same as text Vulkan models)
  - If diffusers model, load using Stable Diffusion pipeline
  - Fixed both locations where image model preloading happens
  - Now supports both GGUF and diffusers image generation models
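The detection rule described in this commit can be sketched as below. This is a minimal illustration, not the project's code: the function names `is_gguf_model` and `pick_image_backend` are hypothetical, and the case-insensitive comparison is an assumption.

```python
def is_gguf_model(model_path: str) -> bool:
    """Return True if the model path or name looks like a GGUF model.

    Mirrors the rule in the commit message: the name ends with '.gguf'
    or contains 'gguf'. Lowercasing first is an assumption.
    """
    lowered = model_path.lower()
    return lowered.endswith(".gguf") or "gguf" in lowered


def pick_image_backend(model_path: str) -> str:
    """Choose a loader label for an image model (hypothetical helper)."""
    return "llama.cpp" if is_gguf_model(model_path) else "diffusers"
```

For example, `pick_image_backend("sd-q4.gguf")` selects the llama.cpp path, while a diffusers repository name such as `"stabilityai/sdxl"` selects the diffusers path.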
-
Your Name authored
  - Fixed bug where image model wasn't actually being loaded when --loadall was specified
  - The code only printed messages but never loaded the diffusers pipeline
  - Now actually loads the Stable Diffusion pipeline using diffusers library
  - Tries StableDiffusionXLPipeline first, falls back to generic DiffusionPipeline
  - Moves to GPU if CUDA available, enables attention slicing for memory efficiency
  - Also fixes second location where image model is the only configured model
  - Debug command line output was already implemented
-
Your Name authored
  - Fixed undefined variable bug where model_name wasn't defined in scope
  - Fixed duplicate model loading when using --loadall/--loadswap with multiple models
  - First model is now only loaded once (skipped in loop if already loaded)
  - Loadall mode now properly preloads all models in VRAM respecting offload strategy
  - Loadswap mode properly loads additional models to RAM
  - Ondemand mode doesn't reload first model
  Feature 1: --debug now shows full command line as first output
  Feature 2: --loadall with multiple models now preloads all in VRAM
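The "skipped in loop if already loaded" fix can be pictured roughly as follows. This is a sketch under assumptions: `preload_models` and its `load_model` callback are hypothetical names, and the real code also has to respect the offload strategy, which is omitted here.

```python
from typing import Callable, Iterable, List, Optional


def preload_models(
    model_names: Iterable[str],
    load_model: Callable[[str], None],
    already_loaded: Optional[Iterable[str]] = None,
) -> List[str]:
    """Load each model once, skipping any that are already loaded.

    Returns the names of all loaded models in load order.
    """
    loaded = list(already_loaded or [])
    for name in model_names:
        if name in loaded:
            # The first model was loaded before the loop; do not load it again.
            continue
        load_model(name)
        loaded.append(name)
    return loaded
```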
-
- 09 Mar, 2026 30 commits
-
-
Your Name authored
-
Your Name authored
-
Your Name authored
  - Move whisper-server check before audio_model check
  - Now whisper-server will be used if available, regardless of audio_model setting
  - Also update multi_model_manager.audio_models with cached path
-
Your Name authored
  - WhisperServerManager.start() now returns actual model path (useful for URL -> cached path)
  - Update audio_models[0] with cached path after downloading
  - Store actual_model_path in current_model instead of original URL
-
Your Name authored
  - Changed default port from 8081 to 8744 (less common)
  - Check if port is available before using, auto-find available port if needed
  - Download URL models before starting whisper-server (use model cache)
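One common way to check a port and fall back to a nearby free one is to try binding a socket to it. The sketch below assumes that approach; `is_port_free`, `find_available_port`, the loopback host and the search limit of 50 ports are all illustrative choices, not taken from the commit.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_available_port(
    preferred: int = 8744, host: str = "127.0.0.1", max_tries: int = 50
) -> int:
    """Return the preferred port if free, else the next free port above it."""
    for candidate in range(preferred, preferred + max_tries):
        if is_port_free(candidate, host):
            return candidate
    raise RuntimeError(
        f"no free port in range {preferred}-{preferred + max_tries - 1}"
    )
```

A check like this is inherently racy: another process can take the port between the check and the moment whisper-server binds it, so the caller should still handle a startup failure.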
-
Your Name authored
  - Add WhisperServerManager class to manage whisper-server subprocess
  - Add --whisper-server argument to specify whisper-server binary path
  - Add --whisper-server-port argument for port configuration (default 8081)
  - Modify audio transcription endpoint to proxy to whisper-server
  - Add cleanup on shutdown to stop whisper-server
  - Model can stay loaded in VRAM as long as the server runs
-
Your Name authored
Fix remaining occurrences of model_name (singular) being used instead of model_names (list) in main function.
-
Your Name authored
Fixed bug where model_name (singular) was used instead of model_names (list) in several places, causing UnboundLocalError when running without --model.
-
Your Name authored
  - CLI (coder): Add --alias argument to create model aliases
  - Config: Add model_aliases dict and resolve_model() method
  - Server (coderai): Add server-side alias support with --model-alias
  - Aliases are resolved in both client and server when making API calls
  - Aliases appear in /v1/models endpoint
  - Aliases are persisted in config file
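A `model_aliases` dict with a `resolve_model()` method might look like the sketch below. Only those two names come from the commit message; the `Config` dataclass shape, the `add_alias` helper and the single-level lookup (no alias-of-alias chaining) are assumptions, and persistence to the config file is left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Config:
    """Minimal stand-in for the real config object (shape is assumed)."""

    model_aliases: Dict[str, str] = field(default_factory=dict)

    def add_alias(self, alias: str, model: str) -> None:
        """Register an alias for a model name (hypothetical helper)."""
        self.model_aliases[alias] = model

    def resolve_model(self, name: str) -> str:
        """Return the model an alias points to, or the name unchanged."""
        return self.model_aliases.get(name, name)
```

Because unknown names pass through unchanged, callers can resolve every requested model name without first checking whether it is an alias.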
-
Your Name authored
  - Add QueueManager class to track waiting requests
  - Send 'waiting for model...' frames with time counter at regular intervals
  - Send 'Model starting' frame when model begins processing
  - Add x_queue_info field to streaming response frames for queue status
  - Track queue position and wait time for each client
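Tracking queue position and wait time per client could be done along the lines of the sketch below. Only the class name `QueueManager` comes from the commit; the method names, the injectable clock and the keys of the returned dict (a possible payload for `x_queue_info`) are assumptions.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict


class QueueManager:
    """Track waiting requests in arrival order (illustrative sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # request_id -> time the request entered the queue
        self._waiting: "OrderedDict[str, float]" = OrderedDict()

    def enqueue(self, request_id: str) -> None:
        self._waiting[request_id] = self._clock()

    def dequeue(self, request_id: str) -> None:
        self._waiting.pop(request_id, None)

    def queue_info(self, request_id: str) -> Dict[str, float]:
        """Return the 1-based queue position and seconds waited so far."""
        enqueued_at = self._waiting[request_id]
        position = list(self._waiting).index(request_id) + 1
        return {
            "position": position,
            "wait_seconds": self._clock() - enqueued_at,
        }
```

A streaming endpoint could call `queue_info()` at each interval and attach the result to a "waiting for model..." frame.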
-
Your Name authored
  - Add support for multiple --audio-model arguments (action='append')
  - Add support for multiple --image-model arguments (action='append')
  - Add 'audio' alias pointing to first audio model
  - Add 'vision'/'image' aliases pointing to first image model
  - Update MultiModelManager to store audio_models and image_models as lists
  - Add audio_model and image_model properties for accessing first model
  - Update get_model_for_request to handle aliases
  - Update list_models to show all models and aliases
  - Fix remaining references in main function to use list-based variables
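The repeated-argument handling and the first-model aliases can be illustrated with `argparse`. The flag names and `action='append'` come from the commit; the `dest` names and the `build_parser` and `build_aliases` helpers are assumptions made for this sketch.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into a list;
    # the attribute stays None when the flag is never given.
    parser.add_argument("--audio-model", action="append", dest="audio_models")
    parser.add_argument("--image-model", action="append", dest="image_models")
    return parser


def build_aliases(
    audio_models: Optional[List[str]], image_models: Optional[List[str]]
) -> Dict[str, str]:
    """Point 'audio' and 'vision'/'image' at the first model of each kind."""
    aliases: Dict[str, str] = {}
    if audio_models:
        aliases["audio"] = audio_models[0]
    if image_models:
        aliases["vision"] = image_models[0]
        aliases["image"] = image_models[0]
    return aliases
```

Passing `--audio-model a --audio-model b` yields `audio_models == ["a", "b"]`, and the `audio` alias then resolves to `a`.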
-
Your Name authored
-
Your Name authored
-
Your Name authored
  - Changed --model to -m
  - Changed --output to -otxt (output as text)
  - Changed --device to -dev
  - Changed --file to -f for input audio
-
Your Name authored
-
Your Name authored
The args variable was not accessible in the create_transcription function, causing a NameError when using --whisper-cpp CLI option. This fix adds global_args to store the parsed arguments for access in endpoint functions.
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
  - Modified build.sh to build whispercpp with Vulkan support
  - Added --audio-vulkan-device argument to specify GPU device for Whisper
  - Added Vulkan detection and logging for Whisper transcription
  - Set GGML_VULKAN_DEVICE environment variable for GPU selection
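Selecting the GPU through `GGML_VULKAN_DEVICE` amounts to setting that variable in the environment before the Whisper backend starts. The sketch below builds such an environment for a subprocess; the helper name `whisper_env` and its parameters are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def whisper_env(
    vulkan_device: Optional[int], base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with GGML_VULKAN_DEVICE set if requested."""
    env = dict(os.environ if base_env is None else base_env)
    if vulkan_device is not None:
        # Environment variable values must be strings.
        env["GGML_VULKAN_DEVICE"] = str(vulkan_device)
    return env
```

The returned dict can be passed as `env=` to `subprocess.Popen` or `subprocess.run`, which leaves the parent process environment untouched.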
-
Your Name authored
-
Your Name authored
-
Your Name authored
faster-whisper doesn't support the GGUF format (it's the llama.cpp format). GGUF files are now detected by extension and sent directly to whispercpp.
-
Your Name authored
  - Add faster_whisper_failed flag to properly track failures
  - When faster-whisper throws non-ImportError (e.g., GGUF not supported), now falls back to whispercpp instead of failing
  - Applies to both pre-loading and transcription endpoint
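The fallback behaviour can be sketched as follows. Only the `faster_whisper_failed` flag name comes from the commit; the function `transcribe_with_fallback` and its two backend callables are hypothetical stand-ins for the real faster-whisper and whispercpp calls.

```python
from typing import Callable, Optional, Tuple


def transcribe_with_fallback(
    audio_path: str,
    run_faster_whisper: Callable[[str], str],
    run_whispercpp: Callable[[str], str],
) -> Tuple[str, str]:
    """Try faster-whisper first; fall back to whispercpp on any failure.

    Returns (backend_name, transcript).
    """
    faster_whisper_failed = False
    result: Optional[str] = None
    try:
        result = run_faster_whisper(audio_path)
    except ImportError:
        # faster-whisper (or PyTorch) is not installed.
        faster_whisper_failed = True
    except Exception:
        # Any other failure, e.g. a model format it cannot load.
        faster_whisper_failed = True

    if faster_whisper_failed:
        return "whispercpp", run_whispercpp(audio_path)
    return "faster-whisper", result
```

Before this fix, only the `ImportError` branch led to the fallback, so a runtime error from faster-whisper ended the request instead of trying whispercpp.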
-
Your Name authored
  - Add specific detection for 'invalid ELF' / 'Mach-O' architecture mismatch errors
  - Improve error messages to mention both options:
    - Install PyTorch + faster-whisper
    - Use built-in whispercpp model (tiny/base/small/medium/large)
  - Fix critical bug: now raises HTTPException instead of returning None
-
Your Name authored
  - Recognize built-in model names: tiny, base, small, medium, large-v1, large
  - Allow pre-loading these models directly without file path
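Recognizing the built-in names is a simple membership test. In the sketch below the list of names is taken from the commit message, while the constant and function names and the exact-match comparison are assumptions.

```python
# Built-in whispercpp model names listed in the commit message.
BUILTIN_WHISPER_MODELS = ("tiny", "base", "small", "medium", "large-v1", "large")


def is_builtin_whisper_model(name: str) -> bool:
    """Return True if the name is a built-in model rather than a file path."""
    return name in BUILTIN_WHISPER_MODELS
```

A pre-loading routine could use this to skip the file-existence check when the configured audio model is, for example, `base`.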
-
Your Name authored
  - Add better error detection for 'not a valid preconverted model' errors
  - Provide clear guidance to users about whispercpp limitations
  - Suggest installing faster-whisper with PyTorch or using built-in model names
  - Update both transcription endpoint and pre-loading code
-
Your Name authored
  - Update transcription endpoint to try faster-whisper first, then whispercpp
  - Update pre-loading code to support both backends
  - Add whispercpp to all requirements files (vulkan, nvidia, default)
  - Remove broken llama.cpp fallback (llama.cpp cannot transcribe Whisper)
-
Your Name authored
-
Your Name authored
-
- 08 Mar, 2026 7 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-