Commits · a388b95e7671697e2c200c55729f17845b205a17 · nexlab / coderai

09 Mar, 2026 15 commits

Fix: Make args accessible in FastAPI transcription endpoint · a388b95e

Your Name authored Mar 09, 2026

The args variable was not accessible in the create_transcription function,
causing a NameError when using the --whisper-cpp CLI option. This fix adds
global_args to store the parsed arguments so that endpoint functions can access them.

Add --whisper-cpp option to use whisper.cpp CLI directly · 4eaa850f
Your Name authored Mar 09, 2026

Add debug output for whispercpp import errors · 4c24c7b9
Your Name authored Mar 09, 2026

Fix UnboundLocalError for model_path in startup code · 966fad45
Your Name authored Mar 09, 2026

Add Whisper GPU support via Vulkan backend · 803f2bb8

Your Name authored Mar 09, 2026

- Modified build.sh to build whispercpp with Vulkan support
- Added --audio-vulkan-device argument to specify GPU device for Whisper
- Added Vulkan detection and logging for Whisper transcription
- Set GGML_VULKAN_DEVICE environment variable for GPU selection
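
The device-selection step could look roughly like the sketch below. The flag and the GGML_VULKAN_DEVICE variable name come from the message above; the integer type, the helper name configure_vulkan_device, and the injectable environ parameter are assumptions. An environment variable like this generally has to be set before the native library initializes.

```python
import argparse
import os


def configure_vulkan_device(argv=None, environ=None):
    """Copy --audio-vulkan-device into GGML_VULKAN_DEVICE (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # An integer device index is assumed; the commit does not state the type.
    parser.add_argument("--audio-vulkan-device", type=int, default=None)
    args = parser.parse_args(argv)
    if args.audio_vulkan_device is not None:
        # Environment values must be strings.
        environ["GGML_VULKAN_DEVICE"] = str(args.audio_vulkan_device)
    return args.audio_vulkan_device
```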

Force CPU mode for faster-whisper (CUDA not compatible with Vulkan) · d23c2148
Your Name authored Mar 09, 2026

Add warning when faster-whisper runs on CPU (no CUDA) · 1dafc558
Your Name authored Mar 09, 2026

Fix: Skip faster-whisper for GGUF files · c8f70fe4

Your Name authored Mar 09, 2026

faster-whisper does not support the GGUF format (the format used by llama.cpp).
GGUF files are now detected by extension and routed directly to whispercpp.
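
Extension-based detection can be sketched as follows. The function names are hypothetical, and the check is case-insensitive by assumption.

```python
from pathlib import Path


def is_gguf_file(model_path):
    # Compare only the final suffix, ignoring case.
    return Path(str(model_path)).suffix.lower() == ".gguf"


def choose_backend(model_path):
    # GGUF files skip faster-whisper entirely.
    return "whispercpp" if is_gguf_file(model_path) else "faster-whisper"
```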

Fix: Fall back to whispercpp when faster-whisper fails to load · 11a0fd46

Your Name authored Mar 09, 2026

- Add faster_whisper_failed flag to properly track failures
- When faster-whisper raises an exception other than ImportError (e.g., GGUF not supported),
  fall back to whispercpp instead of failing
- Applies to both pre-loading and transcription endpoint
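
One way to express the fallback described above, as a sketch only: the two loaders are passed in as callables (an assumption, so the example runs without either library), and the faster_whisper_failed flag records any failure, not just a missing import.

```python
def load_audio_model(model_name, load_faster_whisper, load_whispercpp):
    """Try faster-whisper first; fall back to whispercpp on any failure."""
    faster_whisper_failed = False
    model = None
    try:
        model = load_faster_whisper(model_name)
    except ImportError:
        # faster-whisper is not installed.
        faster_whisper_failed = True
    except Exception:
        # Installed, but unable to load this model (e.g. a GGUF file).
        faster_whisper_failed = True

    if not faster_whisper_failed:
        return "faster-whisper", model
    return "whispercpp", load_whispercpp(model_name)
```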

Fix error handling for audio transcription when libraries unavailable · fee8a9dd

Your Name authored Mar 09, 2026

- Add specific detection for 'invalid ELF' / 'Mach-O' architecture mismatch errors
- Improve error messages to mention both options:
  - Install PyTorch + faster-whisper
  - Use built-in whispercpp model (tiny/base/small/medium/large)
- Fix critical bug: now raises HTTPException instead of returning None
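
The error path above might be sketched like this. TranscriptionUnavailable is a stand-in for FastAPI's HTTPException so the example has no dependencies; the 503 status code and the message wording are assumptions.

```python
class TranscriptionUnavailable(Exception):
    """Stand-in for fastapi.HTTPException (assumption for this sketch)."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Substrings named in the commit message, matched case-insensitively.
ARCH_MISMATCH_MARKERS = ("invalid elf", "mach-o")


def is_arch_mismatch(error):
    text = str(error).lower()
    return any(marker in text for marker in ARCH_MISMATCH_MARKERS)


def raise_backend_unavailable(error):
    # Always raises; returning None here was the bug the commit describes.
    if is_arch_mismatch(error):
        reason = "a native library was built for a different architecture"
    else:
        reason = "no audio backend could be loaded"
    raise TranscriptionUnavailable(
        status_code=503,
        detail=(
            f"Audio transcription unavailable: {reason} ({error}). "
            "Install PyTorch + faster-whisper, or use a built-in whispercpp "
            "model (tiny/base/small/medium/large)."
        ),
    )
```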

Fix pre-loading to recognize built-in whispercpp model names · 2186b190

Your Name authored Mar 09, 2026

- Recognize built-in model names: tiny, base, small, medium, large-v1, large
- Allow pre-loading these models directly without file path
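
A small sketch of the name check, using the model names listed above. The helper name and the ("builtin" / "path") return labels are hypothetical.

```python
# Names taken from the commit message.
BUILTIN_WHISPERCPP_MODELS = frozenset(
    {"tiny", "base", "small", "medium", "large-v1", "large"}
)


def resolve_preload_target(audio_model):
    # Built-in names need no file on disk; anything else is treated as a path.
    if audio_model in BUILTIN_WHISPERCPP_MODELS:
        return ("builtin", audio_model)
    return ("path", audio_model)
```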

Improve whispercpp error handling for HuggingFace GGUF files · f5142c1b

Your Name authored Mar 09, 2026

- Add better error detection for 'not a valid preconverted model' errors
- Provide clear guidance to users about whispercpp limitations
- Suggest installing faster-whisper with PyTorch or using built-in model names
- Update both transcription endpoint and pre-loading code

Add whispercpp support for audio transcription without PyTorch · 44941ac6

Your Name authored Mar 09, 2026

- Update transcription endpoint to try faster-whisper first, then whispercpp
- Update pre-loading code to support both backends
- Add whispercpp to all requirements files (vulkan, nvidia, default)
- Remove broken llama.cpp fallback (llama.cpp cannot run Whisper transcription)

Add faster-whisper to requirements for audio transcription · 6ef7a2dd
Your Name authored Mar 09, 2026

Add test files to .gitignore · 606747de
Your Name authored Mar 09, 2026

08 Mar, 2026 25 commits

Suppress unraisable LlamaModel.__del__ errors using sys.unraisablehook · f28c6185
Stefy Lanza (nextime / spora ) authored Mar 08, 2026
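
The hook-based suppression named in this commit title could be sketched as below. sys.unraisablehook is standard since Python 3.8; the helper names and the substring match on "LlamaModel.__del__" are assumptions about how the error is identified.

```python
import sys


def is_llama_del_error(unraisable):
    # The failing callable is usually exposed as `object`; `err_msg` is
    # checked too in case the interpreter reports it there instead.
    name = getattr(unraisable.object, "__qualname__", "") or ""
    message = unraisable.err_msg or ""
    return "LlamaModel.__del__" in name or "LlamaModel.__del__" in message


def install_quiet_hook():
    """Install the filter and return the previous hook so it can be restored."""
    previous_hook = sys.unraisablehook

    def hook(unraisable):
        if is_llama_del_error(unraisable):
            return  # swallow only this one known-noisy finalizer error
        previous_hook(unraisable)

    sys.unraisablehook = hook
    return previous_hook
```

Delegating everything else to the previous hook keeps unrelated finalizer errors visible.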

Use bare except to suppress llama.cpp __del__ errors · 6bd4dc91
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Suppress llama.cpp __del__ errors during pre-load · f9739fe3
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Remove traceback print for optional audio pre-load · ba8e4792
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Add clearer message when audio model loads on-demand · e554baef
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Try faster-whisper first for audio pre-load, fall back to GGUF · bae50d66
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Use download_model helper for audio pre-load with progress · 4f6d64d4
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Add download_model helper with progress: size, total, speed · b622fe9e
Stefy Lanza (nextime / spora ) authored Mar 08, 2026
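
A sketch of how a progress line with size, total, and speed might be formatted. The function name, the MiB units, and the output layout are assumptions; the commit does not show the actual format.

```python
def format_progress(downloaded, total, elapsed_seconds):
    """Render bytes downloaded, total size (if known), and average speed."""
    mib = 1024 * 1024
    # Guard against division by zero on the very first callback.
    speed = downloaded / elapsed_seconds if elapsed_seconds > 0 else 0.0
    if total:
        percent = 100.0 * downloaded / total
        return (
            f"{downloaded / mib:.1f}/{total / mib:.1f} MiB "
            f"({percent:.0f}%) at {speed / mib:.1f} MiB/s"
        )
    # Servers do not always report a total size.
    return f"{downloaded / mib:.1f} MiB at {speed / mib:.1f} MiB/s"
```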

Add better error handling for GGUF audio model loading · 23fe4347
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Add GGUF audio model support with llama.cpp (Vulkan) · 3daca858

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

When the audio model is in GGUF format, use llama.cpp instead of faster-whisper
for pre-loading. This allows using the Vulkan backend for audio transcription.

Auto-pre-load single model when only one model type is configured · 833a4ff3

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

When only one model type is specified (e.g., only --audio-model with no
--model), automatically pre-load it even in on-demand mode. This ensures
the model is downloaded and ready for use.
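
The rule above can be sketched as a small decision function. The mode strings ("ondemand", "loadall", "loadswap") and the dict-based configuration are assumptions for illustration.

```python
def models_to_preload(configured, load_mode):
    """Return the model types to pre-load at startup.

    `configured` maps a model type (e.g. "audio") to its spec, or None
    when that type was not given on the command line.
    """
    active = [kind for kind, spec in configured.items() if spec]
    if load_mode in ("loadall", "loadswap"):
        return active
    # On-demand mode: pre-load only when exactly one type is configured.
    return active if len(active) == 1 else []
```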

Add model pre-loading support (--loadall, --loadswap) and fix duplicate code bug · 6310e8b1

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Add --loadall flag to pre-load all models at startup
- Add --loadswap flag to keep models in RAM, swap active to VRAM
- Fix bug where load_mode was used before being defined in audio model section
- Remove duplicate load_mode determination code
- Improve the "no main model specified" error message to mention TTS
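
A sketch of determining load_mode once, before any section reads it, which is the ordering the bug fix above is about. The flag names come from the commit; the mode strings and the precedence of --loadswap over --loadall are assumptions.

```python
import argparse


def determine_load_mode(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--loadall", action="store_true")
    parser.add_argument("--loadswap", action="store_true")
    args = parser.parse_args(argv)
    # Computed in exactly one place so later code never sees it undefined.
    if args.loadswap:
        return "loadswap"
    if args.loadall:
        return "loadall"
    return "ondemand"
```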

Add audio model pre-loading at startup when --loadall is used · 7651468e
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Add TTS support with kokoro-python and model caching improvements · ebd4acbb

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Add --tts-model option for Kokoro TTS models
- Add /v1/audio/speech endpoint (OpenAI-compatible)
- Add model caching to prevent redundant downloads
- Replace MD5 with SHA-256 for cache keys
- Move hashlib and pathlib imports to module level
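
A sketch of a SHA-256 cache key with hashlib and pathlib imported at module level, as the message describes. The helper name, hashing the source string itself, and preserving the file suffix are assumptions.

```python
import hashlib
from pathlib import Path


def cache_path_for(source, cache_dir):
    """Map a model source (URL or identifier) to a stable cache file path."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    # Keep the original extension, ignoring any URL query string.
    suffix = Path(source.split("?", 1)[0]).suffix
    return Path(cache_dir) / f"{digest}{suffix}"
```

The same source always maps to the same path, so a second request can reuse the existing file instead of downloading it again.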

Make --model optional when --audio-model or --image-model are specified · 10dc9f5c

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- --model is now optional if using audio or image models only
- Shows helpful error message with examples if no model specified
- Prints available models at startup
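
The validation above might look like this with argparse. The three flags come from the commit; the error wording is an assumption, and parser.error exits with status 2.

```python
import argparse


def parse_model_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=None)
    parser.add_argument("--audio-model", default=None)
    parser.add_argument("--image-model", default=None)
    args = parser.parse_args(argv)
    if not (args.model or args.audio_model or args.image_model):
        # Prints usage plus this message to stderr, then raises SystemExit(2).
        parser.error(
            "no model specified; pass at least one of --model, "
            "--audio-model or --image-model (e.g. --audio-model base)"
        )
    return args
```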

Support full URLs for model paths · 3ae1869a

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Accept full HTTPS URLs for --model (Vulkan/GGUF models)
- Accept full HTTPS URLs for --audio-model (faster-whisper models)
- Downloads file to temp directory before loading
- Shows download progress percentage
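
Two pieces of the behaviour above, sketched without any network access: deciding whether a model spec is an HTTPS URL, and computing the progress percentage. Both function names are hypothetical.

```python
from urllib.parse import urlparse


def is_https_url(spec):
    # A local path such as "models/m.gguf" has no scheme or host.
    parsed = urlparse(spec)
    return parsed.scheme == "https" and bool(parsed.netloc)


def percent_done(downloaded, total):
    # Total size may be unknown when the server sends no Content-Length.
    if not total or total <= 0:
        return None
    return min(100, int(100 * downloaded / total))
```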

Add --debug flag to dump full requests and replies · c12c55d6

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Add --debug CLI argument to enable debug mode
- When enabled, dumps full request body (no truncation)
- When enabled, dumps full generated text (no truncation)
- When enabled, dumps extracted tool calls in JSON format
- Useful for troubleshooting tool call issues
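
A sketch of the truncate-unless-debug behaviour. The 200-character limit, the marker text, and both helper names are assumptions; only the "no truncation in debug mode" rule and the JSON dump of tool calls come from the message.

```python
import json


def render_for_log(text, debug, limit=200):
    """Return the full text in debug mode, a truncated preview otherwise."""
    if debug or len(text) <= limit:
        return text
    return text[:limit] + f"... [{len(text) - limit} more chars]"


def render_tool_calls(tool_calls):
    # Pretty-printed JSON keeps nested arguments readable in a log.
    return json.dumps(tool_calls, indent=2, sort_keys=True)
```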

Fix Pydantic deprecation warnings and Jinja2 crash · 910238ba

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Replace class-based Config with model_config = ConfigDict() in all Pydantic models
- Fix Jinja2 crash by ensuring all messages have content key that is never None
- Enhanced message cleaning in generate_chat and generate_chat_stream to create copies and ensure content is always a string
- Add final safety check in chat_completions endpoint for content handling
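
The message-cleaning step can be sketched as below: each message is copied, and content is forced to be a string so a chat template never sees a missing key or None. The function name is hypothetical, and converting non-string content with str() is a simplification.

```python
def clean_messages(messages):
    """Return copies of the messages whose "content" is always a string."""
    cleaned = []
    for message in messages:
        copy = dict(message)  # shallow copy; the caller's dict is untouched
        content = copy.get("content")
        if content is None:
            # Covers both a missing key and an explicit None
            # (e.g. assistant messages that carry only tool_calls).
            copy["content"] = ""
        elif not isinstance(content, str):
            copy["content"] = str(content)
        cleaned.append(copy)
    return cleaned
```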

Fix Jinja2 crash: ensure content key always exists in messages · f8618ce8

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Add explicit check for missing content key in message dictionaries
- Use more aggressive regex patterns in strip_tool_calls_from_content
- Handle tool call tags in various formats (JSON, XML, tool names)
- Add checks in format_messages, _manual_format_messages, and chat_completions endpoint
- Fixes: 'dict object' has no attribute 'content' error in Jinja2 templates

Fix Jinja2 error: ensure no message has None content in VulkanBackend · 4296b440

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Added safety check in generate_chat_stream to replace None content with empty string
- Added same check in generate_chat for consistency
- This prevents the 'dict object has no attribute content' error when
  processing messages with tool_calls that have no text content

feat: Add multi-model support for audio transcription and image generation · 1cdfe825

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Add --audio-model and --image-model CLI arguments
- Add --loadall, --audio-ctx, --audio-offload, --vision-ctx, --vision-offload args
- Implement MultiModelManager class for dynamic model switching
- Add POST /v1/audio/transcriptions endpoint (OpenAI-compatible)
- Add POST /v1/images/generations endpoint (OpenAI-compatible)
- Update endpoints to use multi_model_manager for model selection
- Audio uses faster-whisper for local transcription
- Images use Stable Diffusion via diffusers
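
A rough sketch of what a manager for dynamic model switching could look like. Only the class name comes from the commit; the register/get methods, lazy loading through callables, and the active attribute are assumptions, not the project's actual interface.

```python
class MultiModelManager:
    """Keep one loader per model type and load each model on first use."""

    def __init__(self):
        self._loaders = {}
        self._models = {}
        self.active = None

    def register(self, kind, loader):
        # `loader` is a zero-argument callable returning the loaded model.
        self._loaders[kind] = loader

    def get(self, kind):
        if kind not in self._loaders:
            raise KeyError(f"no {kind!r} model configured")
        if kind not in self._models:
            self._models[kind] = self._loaders[kind]()
        self.active = kind
        return self._models[kind]
```

An endpoint would then call something like manager.get("audio") and receive a model that is loaded at most once.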

Fix Jinja2 crash and tool call filtering · eb6b8d85

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Fix: Handle None content in messages to prevent Jinja2 'dict object has no attribute content' error
  - Added safety check in chat_completions function
  - Fixed _manual_format_messages to explicitly check for None
  - Fixed format_messages in VulkanBackend to ensure content is never None

- Fix: Always filter tool call format from output
  - Changed filter to run unconditionally (not just when tools are present)
  - Added extra regex patterns for JSON format tool calls like <tool>{...}</tool>

- Also fixed: Minor typos in comments (cket ->cket)

Fix tool parsing: deduplicate tool calls, strip raw format from streaming content · 886ea8f4

Stefy Lanza (nextime / spora ) authored Mar 08, 2026

- Add seen_signatures set to extract_tool_calls() to prevent duplicates
- Add strip_tool_calls_from_content() method to remove <tool>...</tool> tags
- Filter tool format from each chunk in real-time during streaming
- Simplify post-stream tool call handling since content is already cleaned
- Also handle non-streaming responses for tool call content cleanup
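
The deduplication and stripping described above could be sketched as follows. The function names and seen_signatures come from the message; the single `<tool>...</tool>` pattern with a JSON body and the signature built from sorted JSON are assumptions. Note that a per-chunk filter like this would not catch a tag split across two streaming chunks.

```python
import json
import re

# Non-greedy match so adjacent tool blocks are handled separately.
TOOL_TAG = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def extract_tool_calls(text):
    calls = []
    seen_signatures = set()
    for match in TOOL_TAG.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(payload, dict):
            continue
        # Sorting keys makes the signature independent of key order.
        signature = json.dumps(payload, sort_keys=True)
        if signature in seen_signatures:
            continue
        seen_signatures.add(signature)
        calls.append(payload)
    return calls


def strip_tool_calls_from_content(text):
    return TOOL_TAG.sub("", text).strip()
```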

Add CUDA build option for llama-cpp-python · 821e40dd
Stefy Lanza (nextime / spora ) authored Mar 08, 2026

Create separate venv for each backend: venv_nvidia, venv_vulkan, venv_vulkan_nvidia · 58f4382d
Stefy Lanza (nextime / spora ) authored Mar 08, 2026
