- 09 Mar, 2026 15 commits
-
-
Your Name authored
The args variable was not accessible in the create_transcription function, causing a NameError when using --whisper-cpp CLI option. This fix adds global_args to store the parsed arguments for access in endpoint functions.
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Modified build.sh to build whispercpp with Vulkan support
- Added --audio-vulkan-device argument to specify GPU device for Whisper
- Added Vulkan detection and logging for Whisper transcription
- Set GGML_VULKAN_DEVICE environment variable for GPU selection
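The device selection above could look roughly like the following sketch. It assumes the device is given as an integer index and that the variable is exported before the Whisper backend is initialised; the helper name `configure_vulkan_device` and the `environ` parameter are hypothetical and exist only to keep the example testable.

```python
import argparse
import os


def configure_vulkan_device(argv=None, environ=None):
    """Copy --audio-vulkan-device into GGML_VULKAN_DEVICE (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # An integer device index is an assumption; the commit only names the flag.
    parser.add_argument("--audio-vulkan-device", type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.audio_vulkan_device is not None:
        # Environment values must be strings.
        environ["GGML_VULKAN_DEVICE"] = str(args.audio_vulkan_device)
    return environ.get("GGML_VULKAN_DEVICE")
```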
-
Your Name authored
-
Your Name authored
-
Your Name authored
faster-whisper doesn't support the GGUF format (it is the llama.cpp format). GGUF files are now detected by extension and routed directly to whispercpp.
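Extension-based detection can be sketched as below. The function names `is_gguf` and `pick_backend` are illustrative, not taken from the repository, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path


def is_gguf(model_path):
    # Compare the file extension only; the file contents are not inspected.
    return Path(model_path).suffix.lower() == ".gguf"


def pick_backend(model_path):
    # GGUF files skip faster-whisper entirely and go to whispercpp.
    return "whispercpp" if is_gguf(model_path) else "faster-whisper"
```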
-
Your Name authored
- Add faster_whisper_failed flag to properly track failures
- When faster-whisper throws non-ImportError (e.g., GGUF not supported), now falls back to whispercpp instead of failing
- Applies to both pre-loading and transcription endpoint
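A rough sketch of the fallback flow, assuming each backend can be represented as a callable that takes an audio path. `transcribe_with_fallback`, `primary`, and `fallback` are hypothetical names; only the `faster_whisper_failed` flag comes from the commit.

```python
def transcribe_with_fallback(audio_path, primary, fallback):
    """Try the primary backend and fall back on any failure (sketch)."""
    faster_whisper_failed = False
    result = None
    try:
        result = primary(audio_path)
    except Exception as exc:
        # Catches ImportError (backend not installed) as well as runtime
        # errors such as an unsupported model format.
        faster_whisper_failed = True
        print(f"faster-whisper failed ({exc!r}); falling back to whispercpp")
    if faster_whisper_failed:
        result = fallback(audio_path)
    return result
```

Tracking the failure in a flag, rather than calling the fallback inside the `except` block, keeps an error raised by the fallback from being chained to the original exception.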
-
Your Name authored
- Add specific detection for 'invalid ELF' / 'Mach-O' architecture mismatch errors
- Improve error messages to mention both options:
  - Install PyTorch + faster-whisper
  - Use built-in whispercpp model (tiny/base/small/medium/large)
- Fix critical bug: now raises HTTPException instead of returning None
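The error classification could be sketched as a substring check on the exception text. This is an assumption about how the detection works; the helper names and the exact wording of the message are hypothetical. In the server, the resulting message would be passed to the raised HTTPException rather than returned.

```python
# Substrings named in the commit message, lower-cased for matching.
ARCH_MISMATCH_MARKERS = ("invalid elf", "mach-o")


def is_arch_mismatch(error):
    text = str(error).lower()
    return any(marker in text for marker in ARCH_MISMATCH_MARKERS)


def describe_backend_error(error):
    if is_arch_mismatch(error):
        reason = "the native library was built for a different architecture"
    else:
        reason = str(error)
    return (
        f"Whisper backend failed: {reason}. "
        "Options: install PyTorch + faster-whisper, or use a built-in "
        "whispercpp model (tiny/base/small/medium/large)."
    )
```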
-
Your Name authored
- Recognize built-in model names: tiny, base, small, medium, large-v1, large
- Allow pre-loading these models directly without file path
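One way to distinguish built-in names from file paths is a plain set lookup, sketched below. The name list is taken from the commit message; `resolve_audio_model` and its return shape are hypothetical.

```python
# Names listed in the commit message.
BUILTIN_MODELS = {"tiny", "base", "small", "medium", "large-v1", "large"}


def resolve_audio_model(value):
    """Return ("builtin", name) for a known model name, else ("path", value)."""
    if value in BUILTIN_MODELS:
        return ("builtin", value)
    return ("path", value)
```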
-
Your Name authored
- Add better error detection for 'not a valid preconverted model' errors
- Provide clear guidance to users about whispercpp limitations
- Suggest installing faster-whisper with PyTorch or using built-in model names
- Update both transcription endpoint and pre-loading code
-
Your Name authored
- Update transcription endpoint to try faster-whisper first, then whispercpp
- Update pre-loading code to support both backends
- Add whispercpp to all requirements files (vulkan, nvidia, default)
- Remove broken llama.cpp fallback (llama.cpp cannot transcribe Whisper)
-
Your Name authored
-
Your Name authored
-
- 08 Mar, 2026 25 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
When audio model is in GGUF format, use llama.cpp instead of faster-whisper for pre-loading. This allows using Vulkan backend for audio transcription.
-
Stefy Lanza (nextime / spora ) authored
When only one model type is specified (e.g., only --audio-model with no --model), automatically pre-load it even in on-demand mode. This ensures the model is downloaded and ready for use.
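The rule described here can be sketched as a count of the model options that were actually supplied. The function name and parameters are hypothetical and stand in for the parsed `--model`, `--audio-model`, and `--image-model` values.

```python
def auto_preload_target(model=None, audio_model=None, image_model=None):
    """Return the single specified model type, or None if zero or several."""
    candidates = (
        ("model", model),
        ("audio_model", audio_model),
        ("image_model", image_model),
    )
    specified = [name for name, value in candidates if value]
    # With exactly one model type there is nothing to swap between,
    # so pre-loading it costs nothing extra even in on-demand mode.
    if len(specified) == 1:
        return specified[0]
    return None
```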
-
Stefy Lanza (nextime / spora ) authored
- Add --loadall flag to pre-load all models at startup
- Add --loadswap flag to keep models in RAM, swap active to VRAM
- Fix bug where load_mode was used before being defined in audio model section
- Remove duplicate load_mode determination code
- Improve error message for no main model specified to include TTS
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Add --tts-model option for Kokoro TTS models
- Add /v1/audio/speech endpoint (OpenAI-compatible)
- Add model caching to prevent redundant downloads
- Replace MD5 with SHA-256 for cache keys
- Move hashlib and pathlib imports to module level
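A minimal sketch of a SHA-256 cache key, assuming the key is derived from the model URL and that the original file extension is kept. Both points are assumptions, and `cache_path_for` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def cache_path_for(url, cache_dir):
    # Hash the full URL so different sources never collide on a file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Drop any query string before reading the extension.
    suffix = Path(url.split("?", 1)[0]).suffix
    return Path(cache_dir) / f"{digest}{suffix}"
```

A download step can then check `cache_path_for(url, cache_dir).exists()` before fetching the file again.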
-
Stefy Lanza (nextime / spora ) authored
- --model is now optional if using audio or image models only
- Shows helpful error message with examples if no model specified
- Prints available models at startup
-
Stefy Lanza (nextime / spora ) authored
- Accept full HTTPS URLs for --model (Vulkan/GGUF models)
- Accept full HTTPS URLs for --audio-model (faster-whisper models)
- Downloads file to temp directory before loading
- Shows download progress percentage
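The download step could be sketched with the standard library as follows. Using `urllib.request`, taking the total size from the `Content-Length` header, and the helper names are all assumptions; the repository may use a different HTTP client.

```python
import tempfile
import urllib.request


def copy_with_progress(src, dst, total, report, chunk_size=1 << 16):
    """Copy src to dst in chunks, reporting integer percentages."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        if total:
            # Skip reporting when the server did not send a size.
            report(done * 100 // total)
    return done


def download_to_temp(url, report=print):
    """Download url into a temporary file and return the file's path."""
    with urllib.request.urlopen(url) as response:
        total = int(response.headers.get("Content-Length") or 0)
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            copy_with_progress(response, tmp, total, report)
            return tmp.name
```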
-
Stefy Lanza (nextime / spora ) authored
- Add --debug CLI argument to enable debug mode
- When enabled, dumps full request body (no truncation)
- When enabled, dumps full generated text (no truncation)
- When enabled, dumps extracted tool calls in JSON format
- Useful for troubleshooting tool call issues
-
Stefy Lanza (nextime / spora ) authored
- Replace class-based Config with model_config = ConfigDict() in all Pydantic models
- Fix Jinja2 crash by ensuring all messages have content key that is never None
- Enhanced message cleaning in generate_chat and generate_chat_stream to create copies and ensure content is always a string
- Add final safety check in chat_completions endpoint for content handling
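The message cleaning described above might look like this sketch, assuming messages are plain dictionaries. `clean_messages` is a hypothetical name, and converting non-string content with `str()` is an assumption.

```python
def clean_messages(messages):
    """Return copies of messages whose "content" is always a string."""
    cleaned = []
    for message in messages:
        copy = dict(message)  # shallow copy; the caller's dict is untouched
        content = copy.get("content")
        if content is None:
            # Covers both a missing key and an explicit None, e.g. an
            # assistant message that only carries tool_calls.
            copy["content"] = ""
        elif not isinstance(content, str):
            copy["content"] = str(content)
        cleaned.append(copy)
    return cleaned
```

With every message guaranteed to have a string `content`, a chat template that reads `message.content` no longer hits a missing attribute.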
-
Stefy Lanza (nextime / spora ) authored
- Add explicit check for missing content key in message dictionaries
- Use more aggressive regex patterns in strip_tool_calls_from_content
- Handle tool call tags in various formats (JSON, XML, tool names)
- Add checks in format_messages, _manual_format_messages, and chat_completions endpoint
- Fixes: 'dict object' has no attribute 'content' error in Jinja2 templates
-
Stefy Lanza (nextime / spora ) authored
- Added safety check in generate_chat_stream to replace None content with empty string
- Added same check in generate_chat for consistency
- This prevents 'dict object has no attribute content' error when processing messages with tool_calls that have no text content
-
Stefy Lanza (nextime / spora ) authored
- Add --audio-model and --image-model CLI arguments
- Add --loadall, --audio-ctx, --audio-offload, --vision-ctx, --vision-offload args
- Implement MultiModelManager class for dynamic model switching
- Add POST /v1/audio/transcriptions endpoint (OpenAI-compatible)
- Add POST /v1/images/generations endpoint (OpenAI-compatible)
- Update endpoints to use multi_model_manager for model selection
- Audio uses faster-whisper for local transcription
- Images use Stable Diffusion via diffusers
-
Stefy Lanza (nextime / spora ) authored
- Fix: Handle None content in messages to prevent Jinja2 'dict object has no attribute content' error
  - Added safety check in chat_completions function
  - Fixed _manual_format_messages to explicitly check for None
  - Fixed format_messages in VulkanBackend to ensure content is never None
- Fix: Always filter tool call format from output
  - Changed filter to run unconditionally (not just when tools are present)
  - Added extra regex patterns for JSON format tool calls like <tool>{...}</tool>
- Also fixed: Minor typos in comments (cket ->cket)
-
Stefy Lanza (nextime / spora ) authored
- Add seen_signatures set to extract_tool_calls() to prevent duplicates
- Add strip_tool_calls_from_content() method to remove <tool>...</tool> tags
- Filter tool format from each chunk in real-time during streaming
- Simplify post-stream tool call handling since content is already cleaned
- Also handle non-streaming responses for tool call content cleanup
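The deduplication and stripping steps can be sketched as below. The sketch assumes each `<tool>...</tool>` tag wraps a JSON object and uses the canonical JSON text as the signature. The payload shape and the single regex are assumptions, and the functions are written at module level here although the commit describes methods.

```python
import json
import re

# Non-greedy match so adjacent tags are treated as separate calls.
TOOL_TAG = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def extract_tool_calls(text):
    seen_signatures = set()
    calls = []
    for raw in TOOL_TAG.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not valid JSON
        # Sorting keys makes the signature independent of key order.
        signature = json.dumps(call, sort_keys=True)
        if signature in seen_signatures:
            continue
        seen_signatures.add(signature)
        calls.append(call)
    return calls


def strip_tool_calls_from_content(text):
    return TOOL_TAG.sub("", text).strip()
```

During streaming, a tag can be split across chunks, so a per-chunk filter would also need to buffer partial tags; this sketch only handles complete text.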
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-