Commits · 1ca724e8674480e5b610606998f16f5e30c6a45f · nexlab / coderai

09 Mar, 2026 24 commits

Fix more model_name vs model_names bugs · 1ca724e8

Your Name authored Mar 09, 2026

Fix remaining occurrences of model_name (singular) being used instead
of model_names (list) in the main function.

Fix UnboundLocalError: model_name vs model_names · e0530b65

Your Name authored Mar 09, 2026

Fixed bug where model_name (singular) was used instead of model_names (list)
in several places, causing UnboundLocalError when running without --model.
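
A minimal sketch of how this class of bug can arise, assuming the singular name was only bound inside a branch that runs when --model is given. The function names are hypothetical and not taken from the repository.

```python
def describe_models_buggy(cli_model=None):
    """Hypothetical reproduction: model_name is bound only when --model is given."""
    model_names = []
    if cli_model is not None:
        model_name = cli_model
        model_names.append(model_name)
    # Raises UnboundLocalError when cli_model is None, because model_name
    # is a local variable that was never assigned on this path.
    return f"loading {model_name}"


def describe_models_fixed(cli_model=None):
    """Use the list everywhere, so the no --model case is well defined."""
    model_names = [cli_model] if cli_model is not None else []
    return f"loading {model_names}"
```

Calling `describe_models_buggy()` without an argument raises the error; the fixed variant returns a string for both cases.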

Add --alias support for model names · a3c476ec

Your Name authored Mar 09, 2026

- CLI (coder): Add --alias argument to create model aliases
- Config: Add model_aliases dict and resolve_model() method
- Server (coderai): Add server-side alias support with --model-alias
- Aliases are resolved in both client and server when making API calls
- Aliases appear in /v1/models endpoint
- Aliases are persisted in config file
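
A rough sketch of what a config object with a model_aliases dict and a resolve_model() method could look like. Only those two names come from the commit message; the JSON layout and the helper methods are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Config:
    # Maps alias -> real model name.
    model_aliases: dict = field(default_factory=dict)

    def resolve_model(self, name: str) -> str:
        """Return the aliased model name, or the name unchanged if no alias exists."""
        return self.model_aliases.get(name, name)

    def to_json(self) -> str:
        # Assumed persistence format; the real config file layout is not shown.
        return json.dumps({"model_aliases": self.model_aliases})

    @classmethod
    def from_json(cls, text: str) -> "Config":
        return cls(model_aliases=json.loads(text).get("model_aliases", {}))
```

Resolving an unknown name returns it unchanged, so callers can pass either an alias or a real model name.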

Implement queue notification system for streaming responses · aafd41eb

Your Name authored Mar 09, 2026

- Add QueueManager class to track waiting requests
- Send 'waiting for model...' frames with time counter at regular intervals
- Send 'Model starting' frame when model begins processing
- Add x_queue_info field to streaming response frames for queue status
- Track queue position and wait time for each client
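
The queue tracking described above might be sketched as follows. QueueManager and x_queue_info are named in the commit; the method names, the dict keys, and the injectable clock are assumptions.

```python
import time
from collections import OrderedDict


class QueueManager:
    """Tracks waiting requests in arrival order (illustrative sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._waiting = OrderedDict()  # request_id -> time it was enqueued

    def enqueue(self, request_id):
        self._waiting[request_id] = self._clock()

    def start(self, request_id):
        """Remove a request from the queue once the model begins processing it."""
        self._waiting.pop(request_id, None)

    def info(self, request_id):
        """Return a dict that could populate an x_queue_info field, or None."""
        if request_id not in self._waiting:
            return None
        position = list(self._waiting).index(request_id) + 1
        waited = self._clock() - self._waiting[request_id]
        return {"position": position, "waited_seconds": round(waited, 1)}
```

A streaming handler could call `info()` at regular intervals and emit a waiting frame until `start()` has been called for that request.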

Implement multiple audio/image model support with aliases · 65caf41f

Your Name authored Mar 09, 2026

- Add support for multiple --audio-model arguments (action='append')
- Add support for multiple --image-model arguments (action='append')
- Add 'audio' alias pointing to first audio model
- Add 'vision'/'image' aliases pointing to first image model
- Update MultiModelManager to store audio_models and image_models as lists
- Add audio_model and image_model properties for accessing first model
- Update get_model_for_request to handle aliases
- Update list_models to show all models and aliases
- Fix remaining references in main function to use list-based variables
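
As an illustration of the repeated-argument handling, here is a small sketch using argparse with action='append'. The flag names and alias names follow the commit message; the dest names and the build_aliases helper are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Each flag may be given several times; values accumulate into a list.
    parser.add_argument("--audio-model", action="append", dest="audio_models")
    parser.add_argument("--image-model", action="append", dest="image_models")
    return parser


def build_aliases(audio_models, image_models):
    """Point the generic aliases at the first model of each kind."""
    aliases = {}
    if audio_models:
        aliases["audio"] = audio_models[0]
    if image_models:
        aliases["vision"] = image_models[0]
        aliases["image"] = image_models[0]
    return aliases
```

With action='append', an omitted flag leaves the attribute as None, which is why the helper checks for truthiness before indexing.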

Fix: Always use configured audio model regardless of request model parameter · c2bd5ffa
Your Name authored Mar 09, 2026

Fix: Download model if not cached when using whisper.cpp CLI · 1cb7f4b3
Your Name authored Mar 09, 2026

Fix: Use correct whisper.cpp CLI arguments · 4af2538e

Your Name authored Mar 09, 2026

- Changed --model to -m
- Changed --output to -otxt (output as text)
- Changed --device to -dev
- Changed --file to -f for input audio
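
A sketch of assembling the command line with the short flags listed above. The flags are taken from this commit message; the function name and the binary name in the usage note are assumptions, and the command is only built here, not executed.

```python
def build_whisper_command(binary, model_path, audio_path, device=None):
    """Build an argument list for the whisper.cpp CLI using the flags from this commit."""
    cmd = [binary, "-m", model_path, "-f", audio_path, "-otxt"]
    if device is not None:
        # Device selection is optional; 0 is a valid device index.
        cmd += ["-dev", str(device)]
    return cmd
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, for example with a binary named `whisper-cli` (name assumed).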

Debug: Show whisper.cpp CLI command in debug mode · d343d706
Your Name authored Mar 09, 2026

Fix: Make args accessible in FastAPI transcription endpoint · a388b95e

Your Name authored Mar 09, 2026

The args variable was not accessible in the create_transcription function,
causing a NameError when using --whisper-cpp CLI option. This fix adds
global_args to store the parsed arguments for access in endpoint functions.
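
The pattern described here can be sketched without FastAPI. global_args and create_transcription are named in the commit; treating --whisper-cpp as a boolean flag and the return values are assumptions.

```python
import argparse

# Module-level holder for parsed arguments, set once at startup.
global_args = None


def create_transcription():
    """Stand-in for the endpoint function; reads the module-level arguments."""
    if global_args is None:
        raise RuntimeError("arguments have not been parsed yet")
    return "whisper.cpp CLI" if global_args.whisper_cpp else "library backend"


def main(argv):
    global global_args
    parser = argparse.ArgumentParser()
    parser.add_argument("--whisper-cpp", action="store_true")
    # Store the result where endpoint functions can reach it.
    global_args = parser.parse_args(argv)
```

Because endpoint functions are called by the framework rather than by main, they cannot see main's local variables, which is why a module-level name is used.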

Add --whisper-cpp option to use whisper.cpp CLI directly · 4eaa850f
Your Name authored Mar 09, 2026

Add debug output for whispercpp import errors · 4c24c7b9
Your Name authored Mar 09, 2026

Fix UnboundLocalError for model_path in startup code · 966fad45
Your Name authored Mar 09, 2026

Add Whisper GPU support via Vulkan backend · 803f2bb8

Your Name authored Mar 09, 2026

- Modified build.sh to build whispercpp with Vulkan support
- Added --audio-vulkan-device argument to specify GPU device for Whisper
- Added Vulkan detection and logging for Whisper transcription
- Set GGML_VULKAN_DEVICE environment variable for GPU selection
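
One way the device selection could be wired up, assuming the value of --audio-vulkan-device is exported through GGML_VULKAN_DEVICE as the commit states. The helper name and the copy-the-environment approach are assumptions.

```python
import os


def vulkan_env(device_index, base_env=None):
    """Return a copy of the environment with GGML_VULKAN_DEVICE set if requested."""
    env = dict(os.environ if base_env is None else base_env)
    if device_index is not None:
        env["GGML_VULKAN_DEVICE"] = str(device_index)
    return env
```

Returning a copy keeps the parent process environment untouched; the dict can be handed to a subprocess via its `env` argument.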

Force CPU mode for faster-whisper (CUDA not compatible with Vulkan) · d23c2148
Your Name authored Mar 09, 2026

Add warning when faster-whisper runs on CPU (no CUDA) · 1dafc558
Your Name authored Mar 09, 2026

Fix: Skip faster-whisper for GGUF files · c8f70fe4

Your Name authored Mar 09, 2026

faster-whisper does not support the GGUF format (GGUF is the llama.cpp format).
GGUF files are now detected by extension and passed directly to whispercpp.
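
Extension-based detection could be as simple as the following sketch; the function names are hypothetical.

```python
from pathlib import Path


def is_gguf(model_path: str) -> bool:
    """Detect GGUF files by extension, case-insensitively."""
    return Path(model_path).suffix.lower() == ".gguf"


def choose_backend(model_path: str) -> str:
    """Skip faster-whisper entirely for GGUF files."""
    return "whispercpp" if is_gguf(model_path) else "faster-whisper"
```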

Fix: Fall back to whispercpp when faster-whisper fails to load · 11a0fd46

Your Name authored Mar 09, 2026

- Add faster_whisper_failed flag to properly track failures
- When faster-whisper raises an exception other than ImportError (e.g., GGUF
  not supported), fall back to whispercpp instead of failing
- Applies to both pre-loading and transcription endpoint
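
A minimal sketch of the fallback flow, assuming the two backends are loaded through callables. faster_whisper_failed is the flag named in the commit; everything else is illustrative.

```python
def load_audio_backend(load_faster_whisper, load_whispercpp):
    """Try faster-whisper first; fall back to whispercpp on any failure."""
    faster_whisper_failed = False
    model = None
    try:
        model = load_faster_whisper()
    except ImportError:
        faster_whisper_failed = True  # library is not installed
    except Exception:
        faster_whisper_failed = True  # e.g. model format not supported
    if faster_whisper_failed:
        return "whispercpp", load_whispercpp()
    return "faster-whisper", model
```

Both the pre-loading code and the transcription endpoint could share a helper like this so that the two paths behave the same way.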

Fix error handling for audio transcription when libraries unavailable · fee8a9dd

Your Name authored Mar 09, 2026

- Add specific detection for 'invalid ELF' / 'Mach-O' architecture mismatch errors
- Improve error messages to mention both options:
  - Install PyTorch + faster-whisper
  - Use built-in whispercpp model (tiny/base/small/medium/large)
- Fix critical bug: now raises HTTPException instead of returning None
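
The error classification might look roughly like this. The matched substrings and the two suggested options come from the commit message; the function name and exact wording are assumptions, and the real code raises HTTPException rather than returning a string.

```python
def describe_load_error(error: Exception) -> str:
    """Turn a library load failure into a message that lists both options."""
    text = str(error).lower()
    if "invalid elf" in text or "mach-o" in text:
        reason = "library was built for a different CPU architecture"
    else:
        reason = "audio library could not be loaded"
    return (
        f"{reason}. Install PyTorch + faster-whisper, or use a built-in "
        "whispercpp model (tiny/base/small/medium/large)."
    )
```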

Fix pre-loading to recognize built-in whispercpp model names · 2186b190

Your Name authored Mar 09, 2026

- Recognize built-in model names: tiny, base, small, medium, large-v1, large
- Allow pre-loading these models directly without file path
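
Recognizing the built-in names before treating a value as a file path could be sketched as follows. The list of names is from the commit message; the function and the returned tags are hypothetical.

```python
BUILTIN_WHISPERCPP_MODELS = ("tiny", "base", "small", "medium", "large-v1", "large")


def classify_audio_model(value: str):
    """Return ("builtin", name) for known model names, else ("path", value)."""
    if value in BUILTIN_WHISPERCPP_MODELS:
        return ("builtin", value)
    return ("path", value)
```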

Improve whispercpp error handling for HuggingFace GGUF files · f5142c1b

Your Name authored Mar 09, 2026

- Add better error detection for 'not a valid preconverted model' errors
- Provide clear guidance to users about whispercpp limitations
- Suggest installing faster-whisper with PyTorch or using built-in model names
- Update both transcription endpoint and pre-loading code

Add whispercpp support for audio transcription without PyTorch · 44941ac6

Your Name authored Mar 09, 2026

- Update transcription endpoint to try faster-whisper first, then whispercpp
- Update pre-loading code to support both backends
- Add whispercpp to all requirements files (vulkan, nvidia, default)
- Remove broken llama.cpp fallback (llama.cpp cannot run Whisper transcription)

Add faster-whisper to requirements for audio transcription · 6ef7a2dd
Your Name authored Mar 09, 2026

Add test files to .gitignore · 606747de
Your Name authored Mar 09, 2026

08 Mar, 2026 16 commits
- Suppress unraisable LlamaModel.__del__ errors using sys.unraisablehook · f28c6185
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
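
A sketch of filtering unraisable exceptions with sys.unraisablehook, assuming the hook is matched on the qualified name LlamaModel.__del__. The helper names are hypothetical, and this is not necessarily how the commit implements it.

```python
import sys


def make_hook(fallback, suppressed=("LlamaModel.__del__",)):
    """Build a hook that drops errors from the listed callables and forwards the rest."""
    def hook(unraisable):
        # For errors raised in __del__, unraisable.object is the __del__ function.
        name = getattr(unraisable.object, "__qualname__", "")
        if name in suppressed:
            return
        fallback(unraisable)
    return hook


def install_hook():
    """Wrap whatever hook is currently installed."""
    sys.unraisablehook = make_hook(sys.unraisablehook)
```

Forwarding everything else to the previous hook keeps unrelated destructor errors visible.
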
- Use bare except to suppress llama.cpp __del__ errors · 6bd4dc91
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Suppress llama.cpp __del__ errors during pre-load · f9739fe3
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Remove traceback print for optional audio pre-load · ba8e4792
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Add clearer message when audio model loads on-demand · e554baef
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Try faster-whisper first for audio pre-load, fall back to GGUF · bae50d66
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Use download_model helper for audio pre-load with progress · 4f6d64d4
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Add download_model helper with progress: size, total, speed · b622fe9e
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
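
The progress line (size, total, speed) might be formatted along these lines. Only the three reported quantities come from the commit title; the function name, units, and layout are assumptions.

```python
def format_progress(downloaded: int, total: int, elapsed_seconds: float) -> str:
    """Format downloaded size, total size, and average speed for display."""
    mb = 1024 * 1024
    speed = downloaded / elapsed_seconds / mb if elapsed_seconds > 0 else 0.0
    if total:
        percent = 100 * downloaded / total
        return (
            f"{downloaded / mb:.1f}/{total / mb:.1f} MB "
            f"({percent:.0f}%) at {speed:.1f} MB/s"
        )
    # The total is unknown when the server sends no Content-Length header.
    return f"{downloaded / mb:.1f} MB at {speed:.1f} MB/s"
```
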
- Add better error handling for GGUF audio model loading · 23fe4347
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Add GGUF audio model support with llama.cpp (Vulkan) · 3daca858
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
```
When the audio model is in GGUF format, use llama.cpp instead of faster-whisper
for pre-loading. This allows using Vulkan backend for audio transcription.
```
- Auto-pre-load single model when only one model type is configured · 833a4ff3
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
```
When only one model type is specified (e.g., only --audio-model with no
--model), automatically pre-load it even in on-demand mode. This ensures
the model is downloaded and ready for use.
```
- Add model pre-loading support (--loadall, --loadswap) and fix duplicate code bug · 6310e8b1
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
```
- Add --loadall flag to pre-load all models at startup
- Add --loadswap flag to keep models in RAM, swap active to VRAM
- Fix bug where load_mode was used before being defined in audio model section
- Remove duplicate load_mode determination code
- Improve the error message shown when no main model is specified so that it mentions TTS
```
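
Determining the load mode once, before any model section uses it, could be sketched as below. The flag names come from the commit; the mode strings and the precedence of --loadswap over --loadall are assumptions.

```python
def determine_load_mode(loadall: bool, loadswap: bool) -> str:
    """Compute the load mode in one place so later sections can rely on it."""
    if loadswap:
        return "loadswap"   # keep models in RAM, swap the active one to VRAM
    if loadall:
        return "loadall"    # pre-load every model at startup
    return "on-demand"      # load each model when first requested
```
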
- Add audio model pre-loading at startup when --loadall is used · 7651468e
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
  
- Add TTS support with kokoro-python and model caching improvements · ebd4acbb
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
```
- Add --tts-model option for Kokoro TTS models
- Add /v1/audio/speech endpoint (OpenAI-compatible)
- Add model caching to prevent redundant downloads
- Replace MD5 with SHA-256 for cache keys
- Move hashlib and pathlib imports to module level
```
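
A small sketch of deriving a cache key with SHA-256, with hashlib and pathlib imported at module level as the commit describes. The function name and the choice to hash the source URL are assumptions.

```python
import hashlib
from pathlib import Path


def cache_path(source: str, cache_dir: str) -> Path:
    """Map a model source (e.g. a URL) to a stable file name in the cache."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    # Keep the original extension so loaders can still detect the format.
    suffix = Path(source.split("?", 1)[0]).suffix
    return Path(cache_dir) / f"{digest}{suffix}"
```

If the returned path already exists, the download can be skipped.
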
- Make --model optional when --audio-model or --image-model are specified · 10dc9f5c
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
```
- --model is now optional if using audio or image models only
- Shows helpful error message with examples if no model specified
- Prints available models at startup
```
- Support full URLs for model paths · 3ae1869a
  Stefy Lanza (nextime / spora ) authored Mar 08, 2026
```
- Accept full HTTPS URLs for --model (Vulkan/GGUF models)
- Accept full HTTPS URLs for --audio-model (faster-whisper models)
- Downloads file to temp directory before loading
- Shows download progress percentage
```
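
Telling a full HTTPS URL apart from a local path or model name could be done as in this sketch; the function name is hypothetical.

```python
from urllib.parse import urlparse


def is_model_url(value: str) -> bool:
    """Return True only for full HTTPS URLs with a host component."""
    parsed = urlparse(value)
    return parsed.scheme == "https" and bool(parsed.netloc)
```
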