- 09 Mar, 2026 8 commits
-
-
Your Name authored
faster-whisper does not support the GGUF format (the ggml/llama.cpp model format). GGUF files are now detected by extension and routed directly to whispercpp.
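A minimal sketch of extension-based routing, assuming the backend is chosen from the model path alone. The function name `pick_audio_backend` and the returned backend labels are hypothetical and not taken from the repository.

```python
from pathlib import Path


def pick_audio_backend(model: str) -> str:
    """Return the backend label to use for a given audio model path or name.

    Hypothetical helper: GGUF files go to whispercpp, everything else is
    left to faster-whisper.
    """
    # Compare the suffix case-insensitively so "MODEL.GGUF" is also caught.
    if Path(model).suffix.lower() == ".gguf":
        return "whispercpp"
    return "faster-whisper"
```

Checking the extension before any load attempt avoids paying for a faster-whisper failure that is already predictable from the file name.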
-
Your Name authored
- Add faster_whisper_failed flag to properly track failures
- When faster-whisper throws a non-ImportError exception (e.g., GGUF not supported), fall back to whispercpp instead of failing
- Applies to both pre-loading and the transcription endpoint
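The fallback described above might look roughly like the following sketch. The two loader callables are stand-ins (assumptions) for whatever the server uses to construct each backend; only the `faster_whisper_failed` flag name comes from the commit message.

```python
from typing import Any, Callable, Tuple


def load_audio_model(
    load_faster_whisper: Callable[[], Any],
    load_whispercpp: Callable[[], Any],
) -> Tuple[str, Any]:
    """Try faster-whisper first and fall back to whispercpp on any failure."""
    faster_whisper_failed = False
    model = None
    try:
        model = load_faster_whisper()
    except ImportError:
        # faster-whisper (or one of its dependencies) is not installed.
        faster_whisper_failed = True
    except Exception:
        # Any other failure, e.g. an unsupported model format such as GGUF.
        faster_whisper_failed = True

    if faster_whisper_failed:
        return "whispercpp", load_whispercpp()
    return "faster-whisper", model
```

Tracking the outcome in a flag, rather than falling back inside the `except ImportError` branch only, is what lets non-import failures reach whispercpp as well.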
-
Your Name authored
- Add specific detection for 'invalid ELF' / 'Mach-O' architecture mismatch errors
- Improve error messages to mention both options:
  - Install PyTorch + faster-whisper
  - Use built-in whispercpp model (tiny/base/small/medium/large)
- Fix critical bug: now raises HTTPException instead of returning None
-
Your Name authored
- Recognize built-in model names: tiny, base, small, medium, large-v1, large
- Allow pre-loading these models directly without a file path
-
Your Name authored
- Add better error detection for 'not a valid preconverted model' errors
- Provide clear guidance to users about whispercpp limitations
- Suggest installing faster-whisper with PyTorch or using built-in model names
- Update both transcription endpoint and pre-loading code
-
Your Name authored
- Update transcription endpoint to try faster-whisper first, then whispercpp
- Update pre-loading code to support both backends
- Add whispercpp to all requirements files (vulkan, nvidia, default)
- Remove broken llama.cpp fallback (llama.cpp cannot transcribe Whisper)
-
Your Name authored
-
Your Name authored
-
- 08 Mar, 2026 27 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
When the audio model is in GGUF format, use llama.cpp instead of faster-whisper for pre-loading. This allows the Vulkan backend to be used for audio transcription.
-
Stefy Lanza (nextime / spora ) authored
When only one model type is specified (e.g., only --audio-model with no --model), automatically pre-load it even in on-demand mode. This ensures the model is downloaded and ready for use.
-
Stefy Lanza (nextime / spora ) authored
- Add --loadall flag to pre-load all models at startup
- Add --loadswap flag to keep models in RAM, swap active to VRAM
- Fix bug where load_mode was used before being defined in audio model section
- Remove duplicate load_mode determination code
- Improve error message for no main model specified to include TTS
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Add --tts-model option for Kokoro TTS models
- Add /v1/audio/speech endpoint (OpenAI-compatible)
- Add model caching to prevent redundant downloads
- Replace MD5 with SHA-256 for cache keys
- Move hashlib and pathlib imports to module level
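As an illustration of SHA-256 cache keys, the sketch below derives a cache file path from a model URL. The function name `cache_path_for`, the choice to hash the URL string, and the decision to keep the original file extension are all assumptions; the commit only states that SHA-256 replaced MD5 for cache keys.

```python
import hashlib
from pathlib import Path


def cache_path_for(url: str, cache_dir: Path) -> Path:
    """Map a download URL to a stable file path inside cache_dir."""
    # Hash the URL so the same URL always maps to the same cache entry.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Keep the original extension (ignoring any query string) so loaders
    # that inspect the suffix still work on the cached file.
    suffix = Path(url.split("?", 1)[0]).suffix
    return cache_dir / f"{digest}{suffix}"
```

A caller would check `cache_path_for(url, cache_dir).exists()` before downloading, which is how a cache of this shape prevents redundant downloads.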
-
Stefy Lanza (nextime / spora ) authored
- --model is now optional if using audio or image models only
- Shows helpful error message with examples if no model specified
- Prints available models at startup
-
Stefy Lanza (nextime / spora ) authored
- Accept full HTTPS URLs for --model (Vulkan/GGUF models)
- Accept full HTTPS URLs for --audio-model (faster-whisper models)
- Downloads file to temp directory before loading
- Shows download progress percentage
-
Stefy Lanza (nextime / spora ) authored
- Add --debug CLI argument to enable debug mode
- When enabled, dumps full request body (no truncation)
- When enabled, dumps full generated text (no truncation)
- When enabled, dumps extracted tool calls in JSON format
- Useful for troubleshooting tool call issues
-
Stefy Lanza (nextime / spora ) authored
- Replace class-based Config with model_config = ConfigDict() in all Pydantic models
- Fix Jinja2 crash by ensuring all messages have content key that is never None
- Enhanced message cleaning in generate_chat and generate_chat_stream to create copies and ensure content is always a string
- Add final safety check in chat_completions endpoint for content handling
-
Stefy Lanza (nextime / spora ) authored
- Add explicit check for missing content key in message dictionaries
- Use more aggressive regex patterns in strip_tool_calls_from_content
- Handle tool call tags in various formats (JSON, XML, tool names)
- Add checks in format_messages, _manual_format_messages, and chat_completions endpoint
- Fixes: 'dict object' has no attribute 'content' error in Jinja2 templates
-
Stefy Lanza (nextime / spora ) authored
- Added safety check in generate_chat_stream to replace None content with empty string
- Added same check in generate_chat for consistency
- This prevents 'dict object has no attribute content' error when processing messages with tool_calls that have no text content
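A minimal sketch of this kind of safety check, assuming messages are plain dictionaries in the OpenAI chat format. The helper name `clean_messages` is hypothetical; it also covers a missing content key, which a neighbouring commit addresses.

```python
from typing import Any, Dict, List


def clean_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the messages whose content is never missing or None."""
    cleaned = []
    for message in messages:
        copy = dict(message)  # shallow copy so the caller's data is untouched
        if copy.get("content") is None:
            # Covers both a missing key and an explicit None, which is common
            # for assistant messages that carry only tool_calls.
            copy["content"] = ""
        cleaned.append(copy)
    return cleaned
```

Normalizing before the chat template runs means the template can read the content field without guarding against a missing or null value.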
-
Stefy Lanza (nextime / spora ) authored
- Add --audio-model and --image-model CLI arguments
- Add --loadall, --audio-ctx, --audio-offload, --vision-ctx, --vision-offload args
- Implement MultiModelManager class for dynamic model switching
- Add POST /v1/audio/transcriptions endpoint (OpenAI-compatible)
- Add POST /v1/images/generations endpoint (OpenAI-compatible)
- Update endpoints to use multi_model_manager for model selection
- Audio uses faster-whisper for local transcription
- Images use Stable Diffusion via diffusers
-
Stefy Lanza (nextime / spora ) authored
- Fix: Handle None content in messages to prevent Jinja2 'dict object has no attribute content' error
  - Added safety check in chat_completions function
  - Fixed _manual_format_messages to explicitly check for None
  - Fixed format_messages in VulkanBackend to ensure content is never None
- Fix: Always filter tool call format from output
  - Changed filter to run unconditionally (not just when tools are present)
  - Added extra regex patterns for JSON format tool calls like <tool>{...}</tool>
- Also fixed: Minor typos in comments (cket ->cket)
-
Stefy Lanza (nextime / spora ) authored
- Add seen_signatures set to extract_tool_calls() to prevent duplicates
- Add strip_tool_calls_from_content() method to remove <tool>...</tool> tags
- Filter tool format from each chunk in real-time during streaming
- Simplify post-stream tool call handling since content is already cleaned
- Also handle non-streaming responses for tool call content cleanup
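The deduplication and tag stripping could be sketched as below. The function and variable names follow the commit message, but the bodies are illustrative: the assumption that each `<tool>` tag wraps a JSON object, and the use of a sorted-key JSON dump as the signature, are not confirmed by the source.

```python
import json
import re
from typing import Any, Dict, List

# Assumed tag format: <tool>{...json...}</tool>
TOOL_TAG = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Collect tool calls from the text, skipping exact duplicates."""
    seen_signatures = set()
    calls = []
    for match in TOOL_TAG.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # ignore malformed tool blocks
        if not isinstance(payload, dict):
            continue
        # Sorting keys makes the signature independent of key order.
        signature = json.dumps(payload, sort_keys=True)
        if signature in seen_signatures:
            continue
        seen_signatures.add(signature)
        calls.append(payload)
    return calls


def strip_tool_calls_from_content(text: str) -> str:
    """Remove every <tool>...</tool> block from the visible content."""
    return TOOL_TAG.sub("", text).strip()
```

When filtering chunk by chunk during streaming, a tag can be split across two chunks, so a real implementation would likely need to buffer partial tags; this sketch only handles complete text.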
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 07 Mar, 2026 3 commits
-
-
Stefy Lanza (nextime / spora ) authored
Detect the chat template from the model and use the appropriate formatting; avoid Jinja errors by falling back to manual formatting when template detection fails
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 05 Mar, 2026 2 commits
-
-
Stefy Lanza (nextime / spora ) authored
Modify _try_load_model() to catch TypeError when quantization arguments are not supported by the model class. When this happens, the method now:
1. Warns the user about unsupported quantization
2. Retries loading the model without quantization arguments
3. Returns the model successfully if loading works
This fixes issues with models like Qwen3.5 that don't support bitsandbytes quantization.
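The three steps above can be sketched as follows. This is not the project's `_try_load_model()`: the `loader` callable stands in for a model class's loading function, and the list of quantization keyword names is an assumption about which arguments get passed.

```python
import warnings
from typing import Any, Callable

# Assumed names of quantization-related keyword arguments.
QUANT_KEYS = ("quantization_config", "load_in_4bit", "load_in_8bit")


def try_load_model(loader: Callable[..., Any], model_name: str, **kwargs: Any) -> Any:
    """Load a model, retrying without quantization arguments on TypeError."""
    try:
        return loader(model_name, **kwargs)
    except TypeError as exc:
        if not any(key in kwargs for key in QUANT_KEYS):
            raise  # the TypeError is unrelated to quantization
        # Step 1: warn the user that quantization is being dropped.
        warnings.warn(f"Quantization not supported for {model_name}: {exc}")
        # Step 2: retry with the quantization arguments removed.
        stripped = {k: v for k, v in kwargs.items() if k not in QUANT_KEYS}
        # Step 3: return the model if the retry succeeds.
        return loader(model_name, **stripped)
```

Re-raising when no quantization arguments were passed keeps unrelated TypeErrors visible instead of masking them behind a pointless retry.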
-
Stefy Lanza (nextime / spora ) authored
- Wrap generate() with try-except to catch CUDA OOM errors
- On OOM: clear CUDA cache, retry with half tokens, return graceful error if still failing
- Wrap generate_stream() thread with error handling using shared variable
- Yield error messages to client instead of crashing the process
- Allows server to continue running after generation OOM
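A framework-independent sketch of the retry-on-OOM pattern for the non-streaming path. The exception type and cache-clearing hook are parameters here so the example runs without a GPU; in a PyTorch server they would presumably be `torch.cuda.OutOfMemoryError` and `torch.cuda.empty_cache`, but that wiring is an assumption, as are the function name and the error text.

```python
from typing import Callable, Type


def generate_with_oom_retry(
    generate: Callable[[str, int], str],
    prompt: str,
    max_tokens: int,
    oom_error: Type[BaseException] = MemoryError,
    clear_cache: Callable[[], None] = lambda: None,
) -> str:
    """Run generate(); on OOM, clear the cache and retry once with half the tokens."""
    try:
        return generate(prompt, max_tokens)
    except oom_error:
        clear_cache()
        try:
            # Retry with half the token budget, but never fewer than one token.
            return generate(prompt, max(1, max_tokens // 2))
        except oom_error:
            clear_cache()
            # Return an error message instead of letting the process crash.
            return "Error: out of memory during generation; try fewer tokens."
```

Returning an error string keeps the server alive after a failed generation, which matches the stated goal of letting it continue running.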
-