Commits · e848dd478814cd79916e43b367057331c0eceeec · nexlab / coderai

10 Mar, 2026 3 commits

Add GGUF image model support in --loadall mode · e848dd47

Your Name authored Mar 10, 2026

- Detect if image model is GGUF (ends with .gguf or contains 'gguf')
- If GGUF, load using llama.cpp (same as text Vulkan models)
- If diffusers model, load using Stable Diffusion pipeline
- Fixed both locations where image model preloading happens
- Now supports both GGUF and diffusers image generation models
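
The detection rule in the bullets above can be sketched roughly as follows. The helper names `is_gguf_model` and `choose_image_backend` are hypothetical, and the case-insensitive match is an assumption; the commit only states the "ends with .gguf or contains 'gguf'" rule.

```python
def is_gguf_model(model_ref: str) -> bool:
    """Return True if the model reference looks like a GGUF model.

    Mirrors the rule from the commit message: the reference ends with
    '.gguf' or contains 'gguf' anywhere. Lower-casing first is an
    assumption, not something the commit states.
    """
    lowered = model_ref.lower()
    # The endswith() check is implied by the substring check; it is kept
    # only to mirror the wording of the commit message.
    return lowered.endswith(".gguf") or "gguf" in lowered


def choose_image_backend(model_ref: str) -> str:
    """Pick a backend label: 'llama.cpp' for GGUF, 'diffusers' otherwise."""
    return "llama.cpp" if is_gguf_model(model_ref) else "diffusers"
```

Note that the substring rule also matches repository names that merely mention GGUF, which may or may not be what is wanted.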

Fix image model preloading with --loadall flag · 2308d5b0

Your Name authored Mar 10, 2026

- Fixed bug where image model wasn't actually being loaded when --loadall was specified
- The code only printed messages but never loaded the diffusers pipeline
- Now actually loads the Stable Diffusion pipeline using diffusers library
- Tries StableDiffusionXLPipeline first, falls back to generic DiffusionPipeline
- Moves to GPU if CUDA available, enables attention slicing for memory efficiency
- Also fixes second location where image model is the only configured model

- Debug command line output was already implemented
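
A minimal sketch of the loading sequence described in this commit, assuming the `diffusers` and `torch` packages are installed. The function name `load_image_pipeline` is hypothetical, and the actual code in the repository may differ (for example in dtype or error handling).

```python
def load_image_pipeline(model_id: str):
    """Load a diffusers image pipeline, preferring the SDXL class."""
    # Imported lazily so this module still imports when diffusers/torch
    # are not installed.
    import torch
    from diffusers import DiffusionPipeline, StableDiffusionXLPipeline

    try:
        pipe = StableDiffusionXLPipeline.from_pretrained(model_id)
    except Exception:
        # Not an SDXL checkpoint (or SDXL loading failed): let the
        # generic class pick the pipeline type from the model config.
        pipe = DiffusionPipeline.from_pretrained(model_id)

    if torch.cuda.is_available():
        pipe = pipe.to("cuda")
    # Trades some speed for lower peak memory during attention.
    pipe.enable_attention_slicing()
    return pipe
```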

Fix --loadall model preloading and --debug command line output · 9193536a

Your Name authored Mar 10, 2026

- Fixed undefined variable bug where model_name wasn't defined in scope
- Fixed duplicate model loading when using --loadall/--loadswap with multiple models
- First model is now only loaded once (skipped in loop if already loaded)
- Loadall mode now properly preloads all models in VRAM respecting offload strategy
- Loadswap mode properly loads additional models to RAM
- Ondemand mode doesn't reload first model

Feature 1: --debug now shows full command line as first output
Feature 2: --loadall with multiple models now preloads all in VRAM
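
The "load the first model only once" fix can be illustrated with a small sketch. `preload_models` and its `load_fn` callback are hypothetical names used for illustration only; the real loop also has to respect the offload strategy, which is omitted here.

```python
def preload_models(model_names, load_fn, already_loaded=()):
    """Load every model in model_names exactly once.

    Models listed in already_loaded (for example the first model, loaded
    earlier during startup) are skipped inside the loop.
    """
    loaded = set(already_loaded)
    for name in model_names:
        if name in loaded:
            continue  # skip anything that is already resident
        load_fn(name)
        loaded.add(name)
    return loaded
```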

09 Mar, 2026 30 commits

Add --convert flag to whisper-server command for audio format conversion · b51d08b1
Your Name authored Mar 09, 2026

Add debug logging for whisper-server transcription requests · 3ffd8f3e
Your Name authored Mar 09, 2026

Fix whisper-server: check for whisper-server FIRST in transcription endpoint · 37df61f9

Your Name authored Mar 09, 2026

- Move whisper-server check before audio_model check
- Now whisper-server will be used if available, regardless of audio_model setting
- Also update multi_model_manager.audio_models with cached path

Fix whisper-server: use cached model path for audio_models · bbf12dd4

Your Name authored Mar 09, 2026

- WhisperServerManager.start() now returns actual model path (useful for URL -> cached path)
- Update audio_models[0] with cached path after downloading
- Store actual_model_path in current_model instead of original URL

Fix whisper-server: use different port, check availability, handle URL models · 0e67a9a2

Your Name authored Mar 09, 2026

- Changed default port from 8081 to 8744 (less common)
- Check if port is available before using, auto-find available port if needed
- Download URL models before starting whisper-server (use model cache)
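
The port check and automatic fallback might look something like the sketch below. `is_port_free` and `pick_port` are hypothetical helpers; only the default of 8744 comes from the commit. There is an inherent race between checking a port and the subprocess binding it, so this is a best-effort check.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def pick_port(preferred: int = 8744, host: str = "127.0.0.1") -> int:
    """Use the preferred port if it is free, else ask the OS for one."""
    if is_port_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]
```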

Add --whisper-server support for audio transcription · 005dfd46

Your Name authored Mar 09, 2026

- Add WhisperServerManager class to manage whisper-server subprocess
- Add --whisper-server argument to specify whisper-server binary path
- Add --whisper-server-port argument for port configuration (default 8081)
- Modify audio transcription endpoint to proxy to whisper-server
- Add cleanup on shutdown to stop whisper-server
- Model can stay loaded in VRAM as long as the server runs

Fix more model_name vs model_names bugs · 1ca724e8

Your Name authored Mar 09, 2026

Fix remaining occurrences of model_name (singular) being used instead
of model_names (list) in main function.

Fix UnboundLocalError: model_name vs model_names · e0530b65

Your Name authored Mar 09, 2026

Fixed bug where model_name (singular) was used instead of model_names (list)
in several places, causing UnboundLocalError when running without --model.
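
For readers unfamiliar with this class of bug, the sketch below reproduces the general pattern: a singular local name that is only assigned on some code paths. The function names are hypothetical and the exact code path in the repository is not shown in the commit.

```python
def describe_buggy(model_names):
    """Reads model_name even when it was never assigned."""
    if model_names:
        model_name = model_names[0]
    # With an empty list (e.g. no --model given) the name below is an
    # unassigned local, so Python raises UnboundLocalError.
    return f"primary model: {model_name}"


def describe_fixed(model_names):
    """Derives the primary model from the list on every code path."""
    primary = model_names[0] if model_names else None
    return f"primary model: {primary}"
```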

Add --alias support for model names · a3c476ec

Your Name authored Mar 09, 2026

- CLI (coder): Add --alias argument to create model aliases
- Config: Add model_aliases dict and resolve_model() method
- Server (coderai): Add server-side alias support with --model-alias
- Aliases are resolved in both client and server when making API calls
- Aliases appear in /v1/models endpoint
- Aliases are persisted in config file
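
A minimal sketch of alias resolution, using the `model_aliases` and `resolve_model` names from the commit message. The `add_alias` method and the pass-through behaviour for unknown names are assumptions; persistence to the config file is left out.

```python
class Config:
    """Holds a mapping from alias to real model name."""

    def __init__(self):
        self.model_aliases = {}

    def add_alias(self, alias: str, target: str) -> None:
        # Hypothetical helper; the commit only names the dict itself.
        self.model_aliases[alias] = target

    def resolve_model(self, name: str) -> str:
        """Return the aliased model name, or the name itself if no alias exists."""
        return self.model_aliases.get(name, name)
```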

Implement queue notification system for streaming responses · aafd41eb

Your Name authored Mar 09, 2026

- Add QueueManager class to track waiting requests
- Send 'waiting for model...' frames with time counter at regular intervals
- Send 'Model starting' frame when model begins processing
- Add x_queue_info field to streaming response frames for queue status
- Track queue position and wait time for each client
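
The tracking part of such a queue manager could be sketched as below. Only the `QueueManager` class name comes from the commit; the method names and the `position` / `wait_seconds` keys are assumptions, and the result of `queue_info` is the kind of data that might populate `x_queue_info` in a streamed frame.

```python
import time


class QueueManager:
    """Tracks waiting requests in arrival order."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # request_id -> enqueue timestamp; dicts preserve insertion order.
        self._waiting = {}

    def enqueue(self, request_id):
        self._waiting[request_id] = self._clock()

    def dequeue(self, request_id):
        self._waiting.pop(request_id, None)

    def queue_info(self, request_id):
        """Return position (1-based) and wait time, or None if not queued."""
        if request_id not in self._waiting:
            return None
        position = list(self._waiting).index(request_id) + 1
        waited = self._clock() - self._waiting[request_id]
        return {"position": position, "wait_seconds": waited}
```

Passing the clock in as a parameter keeps the wait-time logic easy to test without sleeping.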

Implement multiple audio/image model support with aliases · 65caf41f

Your Name authored Mar 09, 2026

- Add support for multiple --audio-model arguments (action='append')
- Add support for multiple --image-model arguments (action='append')
- Add 'audio' alias pointing to first audio model
- Add 'vision'/'image' aliases pointing to first image model
- Update MultiModelManager to store audio_models and image_models as lists
- Add audio_model and image_model properties for accessing first model
- Update get_model_for_request to handle aliases
- Update list_models to show all models and aliases
- Fix remaining references in main function to use list-based variables
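
The repeatable arguments and first-model aliases described above can be sketched with `argparse`. The `dest` names and the `build_aliases` helper are assumptions made for illustration; the flag names, `action='append'`, and the alias names come from the commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--audio-model", action="append", default=[],
                        dest="audio_models")
    parser.add_argument("--image-model", action="append", default=[],
                        dest="image_models")
    return parser


def build_aliases(audio_models, image_models):
    """Point the generic aliases at the first model of each kind."""
    aliases = {}
    if audio_models:
        aliases["audio"] = audio_models[0]
    if image_models:
        aliases["vision"] = image_models[0]
        aliases["image"] = image_models[0]
    return aliases
```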

Fix: Always use configured audio model regardless of request model parameter · c2bd5ffa
Your Name authored Mar 09, 2026

Fix: Download model if not cached when using whisper.cpp CLI · 1cb7f4b3
Your Name authored Mar 09, 2026

Fix: Use correct whisper.cpp CLI arguments · 4af2538e

Your Name authored Mar 09, 2026

- Changed --model to -m
- Changed --output to -otxt (output as text)
- Changed --device to -dev
- Changed --file to -f for input audio
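
With the corrected flags, assembling the command line might look like this sketch. `build_whisper_command` is a hypothetical helper, and treating the device as optional is an assumption; the flags themselves are the ones listed in the commit.

```python
def build_whisper_command(binary, model_path, audio_path, device=None):
    """Build the argument list for a whisper.cpp CLI invocation."""
    cmd = [
        binary,
        "-m", model_path,   # model file
        "-f", audio_path,   # input audio
        "-otxt",            # write the transcript as plain text
    ]
    if device is not None:
        cmd += ["-dev", str(device)]
    return cmd
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems with file names.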

Debug: Show whisper.cpp CLI command in debug mode · d343d706
Your Name authored Mar 09, 2026

Fix: Make args accessible in FastAPI transcription endpoint · a388b95e

Your Name authored Mar 09, 2026

The args variable was not accessible in the create_transcription function,
causing a NameError when using --whisper-cpp CLI option. This fix adds
global_args to store the parsed arguments for access in endpoint functions.
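
The pattern described here, shown without FastAPI for brevity, is roughly the following. `global_args` and `create_transcription` are named in the commit; the assumption that `--whisper-cpp` takes a path value, and the return strings, are illustrative only.

```python
import argparse

# Set once in main(); read by endpoint functions that cannot see main()'s
# local variables.
global_args = None


def main(argv):
    global global_args
    parser = argparse.ArgumentParser()
    # Assumption: the option takes the path to the whisper.cpp binary.
    parser.add_argument("--whisper-cpp", default=None)
    global_args = parser.parse_args(argv)
    return global_args


def create_transcription():
    """Stand-in for the endpoint: chooses a backend from global_args."""
    if global_args is None or global_args.whisper_cpp is None:
        return "default backend"
    return f"whisper.cpp CLI at {global_args.whisper_cpp}"
```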

Add --whisper-cpp option to use whisper.cpp CLI directly · 4eaa850f
Your Name authored Mar 09, 2026

Add debug output for whispercpp import errors · 4c24c7b9
Your Name authored Mar 09, 2026

Fix UnboundLocalError for model_path in startup code · 966fad45
Your Name authored Mar 09, 2026

Add Whisper GPU support via Vulkan backend · 803f2bb8

Your Name authored Mar 09, 2026

- Modified build.sh to build whispercpp with Vulkan support
- Added --audio-vulkan-device argument to specify GPU device for Whisper
- Added Vulkan detection and logging for Whisper transcription
- Set GGML_VULKAN_DEVICE environment variable for GPU selection

Force CPU mode for faster-whisper (CUDA not compatible with Vulkan) · d23c2148
Your Name authored Mar 09, 2026

Add warning when faster-whisper runs on CPU (no CUDA) · 1dafc558
Your Name authored Mar 09, 2026

Fix: Skip faster-whisper for GGUF files · c8f70fe4

Your Name authored Mar 09, 2026

faster-whisper doesn't support GGUF format (it's llama.cpp format).
Now detects GGUF files by extension and goes directly to whispercpp.

Fix: Fall back to whispercpp when faster-whisper fails to load · 11a0fd46

Your Name authored Mar 09, 2026

- Add faster_whisper_failed flag to properly track failures
- When faster-whisper throws non-ImportError (e.g., GGUF not supported),
  now falls back to whispercpp instead of failing
- Applies to both pre-loading and transcription endpoint
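
The fallback logic can be sketched with the two backends passed in as callables, so the example does not depend on either library. `load_transcriber` and its parameters are hypothetical; only the `faster_whisper_failed` flag name comes from the commit.

```python
def load_transcriber(model_ref, load_faster_whisper, load_whispercpp):
    """Try faster-whisper first; fall back to whispercpp on any failure.

    Returns a (backend_name, model) tuple.
    """
    faster_whisper_failed = False
    model = None
    try:
        model = load_faster_whisper(model_ref)
    except Exception:
        # Covers ImportError as well as load-time errors such as an
        # unsupported model format.
        faster_whisper_failed = True

    if not faster_whisper_failed:
        return "faster-whisper", model
    return "whispercpp", load_whispercpp(model_ref)
```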

Fix error handling for audio transcription when libraries unavailable · fee8a9dd

Your Name authored Mar 09, 2026

- Add specific detection for 'invalid ELF' / 'Mach-O' architecture mismatch errors
- Improve error messages to mention both options:
  - Install PyTorch + faster-whisper
  - Use built-in whispercpp model (tiny/base/small/medium/large)
- Fix critical bug: now raises HTTPException instead of returning None

Fix pre-loading to recognize built-in whispercpp model names · 2186b190

Your Name authored Mar 09, 2026

- Recognize built-in model names: tiny, base, small, medium, large-v1, large
- Allow pre-loading these models directly without file path

Improve whispercpp error handling for HuggingFace GGUF files · f5142c1b

Your Name authored Mar 09, 2026

- Add better error detection for 'not a valid preconverted model' errors
- Provide clear guidance to users about whispercpp limitations
- Suggest installing faster-whisper with PyTorch or using built-in model names
- Update both transcription endpoint and pre-loading code

Add whispercpp support for audio transcription without PyTorch · 44941ac6

Your Name authored Mar 09, 2026

- Update transcription endpoint to try faster-whisper first, then whispercpp
- Update pre-loading code to support both backends
- Add whispercpp to all requirements files (vulkan, nvidia, default)
- Remove broken llama.cpp fallback (llama.cpp cannot transcribe Whisper)

Add faster-whisper to requirements for audio transcription · 6ef7a2dd
Your Name authored Mar 09, 2026

Add test files to .gitignore · 606747de
Your Name authored Mar 09, 2026

08 Mar, 2026 7 commits

Suppress unraisable LlamaModel.__del__ errors using sys.unraisablehook · f28c6185
Stefy Lanza (nextime / spora) authored Mar 08, 2026

Use bare except to suppress llama.cpp __del__ errors · 6bd4dc91
Stefy Lanza (nextime / spora) authored Mar 08, 2026

Suppress llama.cpp __del__ errors during pre-load · f9739fe3
Stefy Lanza (nextime / spora) authored Mar 08, 2026

Remove traceback print for optional audio pre-load · ba8e4792
Stefy Lanza (nextime / spora) authored Mar 08, 2026

Add clearer message when audio model loads on-demand · e554baef
Stefy Lanza (nextime / spora) authored Mar 08, 2026

Try faster-whisper first for audio pre-load, fall back to GGUF · bae50d66
Stefy Lanza (nextime / spora) authored Mar 08, 2026

Use download_model helper for audio pre-load with progress · 4f6d64d4
Stefy Lanza (nextime / spora) authored Mar 08, 2026
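
For the `sys.unraisablehook` commit (f28c6185) listed above, one possible shape of such a hook is sketched below. `make_quiet_hook` is a hypothetical name, and matching on the text `LlamaModel.__del__` in the object's repr is an assumption about how the filter could work, not the repository's actual code.

```python
import sys


def make_quiet_hook(fallback):
    """Build an unraisable hook that drops LlamaModel.__del__ errors.

    Everything else is forwarded to the fallback hook.
    """
    def hook(unraisable):
        # For errors raised in __del__, unraisable.object is the __del__
        # callable, whose repr includes its qualified name.
        if "LlamaModel.__del__" in repr(unraisable.object):
            return
        fallback(unraisable)
    return hook


# Install the filter while keeping the previous hook as the fallback.
sys.unraisablehook = make_quiet_hook(sys.unraisablehook)
```

Unlike a bare `except`, this leaves unrelated unraisable errors visible.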