Commits · 0609c4cfae7a967be275a98099ce1e39557fd76d · nexlab / coderai

10 Mar, 2026 10 commits

Fix: Remove redundant import os statements causing UnboundLocalError · 0609c4cf

Your Name authored Mar 10, 2026

- Removed redundant 'import os' statements inside functions (lines 4522, 4926, 5005)
- Added back missing 'from llama_cpp import Llama' that was accidentally deleted
- Global 'import os' at line 12 is now the only one

This fixes the UnboundLocalError when running --list-cached-models or other CLI options.
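
The failure mode behind this fix can be reproduced in a few lines. This is a minimal sketch, not the project's code: the function names are hypothetical. It relies only on the Python rule that an `import os` anywhere inside a function makes `os` a local name for the whole function body, so any earlier use of `os` in that function fails.

```python
import os  # module-level import, normally visible inside every function


def list_cached_models_broken() -> str:
    # Hypothetical function: 'os' is treated as a local variable here because
    # of the import further down, so this line raises UnboundLocalError.
    home = os.path.expanduser("~")
    import os  # redundant function-level import that shadows the global one
    return home


def list_cached_models_fixed() -> str:
    # With the local import removed, 'os' resolves to the module-level import.
    return os.path.expanduser("~")
```

Calling `list_cached_models_broken()` raises `UnboundLocalError`, while `list_cached_models_fixed()` returns the home directory, which is why keeping a single global import resolves the error.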

Add cache management CLI options · 496c4e53

Your Name authored Mar 10, 2026

- --list-cached-models: List all cached models with sizes
- --remove-all-models: Remove all cached models
- --remove-model <modelid>: Remove specific model by name/hash (partial match)
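
As an illustration of how these three options might be wired up, here is a sketch using `argparse`. The flag names come from the commit message; the program name, the helper `match_cached`, and the case-insensitive substring matching are assumptions made for the example, not the project's confirmed behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # 'coderai' as the program name is an assumption for this sketch.
    parser = argparse.ArgumentParser(prog="coderai")
    parser.add_argument("--list-cached-models", action="store_true",
                        help="List all cached models with sizes")
    parser.add_argument("--remove-all-models", action="store_true",
                        help="Remove all cached models")
    parser.add_argument("--remove-model", metavar="MODELID",
                        help="Remove a model by name/hash (partial match)")
    return parser


def match_cached(cached_names: list[str], query: str) -> list[str]:
    # Hypothetical helper: partial match implemented as a case-insensitive
    # substring test against each cached entry name.
    needle = query.lower()
    return [name for name in cached_names if needle in name.lower()]
```

For example, `match_cached(["Llama-3-8B.gguf", "sdxl-abc123"], "llama")` selects only the first entry.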

Add GGUF magic bytes validation · 015c6908

Your Name authored Mar 10, 2026

- Check that the downloaded file is a valid GGUF file (first four bytes = 'GGUF')
- If not, show a clear error that the URL is wrong (e.g. the server returned an HTML page instead of the model)
- Explain that the URL must be a direct download link ending in .gguf
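
The check described above can be sketched as follows. GGUF files begin with the four ASCII bytes `GGUF`; the function names and the wording of the error message are assumptions for illustration.

```python
GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def is_valid_gguf(path: str) -> bool:
    # Read only the header; an HTML error page or an empty file fails here.
    with open(path, "rb") as fh:
        return fh.read(4) == GGUF_MAGIC


def check_download(path: str) -> None:
    # Hypothetical wrapper that turns a failed check into a readable error.
    if not is_valid_gguf(path):
        raise ValueError(
            f"{path} is not a GGUF file. The URL probably returned an HTML "
            "page; use a direct download link ending in .gguf."
        )
```

Checking the magic bytes right after the download catches a wrong URL early, before the file is handed to the model loader.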

Add file verification and magic bytes check for GGUF models · 611bfd8f

Your Name authored Mar 10, 2026

Add verbose error handling for GGUF image model loading · 141329bc

Your Name authored Mar 10, 2026

- Enable verbose=True in llama.cpp to see actual error
- Print GGUF model file size for debugging
- Add try/except with traceback to see detailed errors
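
A rough sketch of this diagnostic pattern is shown below. To keep it runnable without a model file or `llama_cpp` installed, the loader is passed in as a parameter; in practice it could be `llama_cpp.Llama`, which accepts `model_path` and `verbose` keyword arguments. The function name and the choice to return `None` on failure are assumptions.

```python
import os
import traceback


def load_with_diagnostics(model_path: str, loader):
    # Print the file size first: a file of a few kilobytes usually means the
    # download was an error page rather than a model.
    size_mib = os.path.getsize(model_path) / (1024 ** 2)
    print(f"GGUF model file size: {size_mib:.1f} MiB")
    try:
        # verbose=True lets llama.cpp report the underlying load error.
        return loader(model_path=model_path, verbose=True)
    except Exception:
        # Show the full traceback instead of a one-line failure message.
        traceback.print_exc()
        return None  # assumption: the caller treats None as "not loaded"
```

With the real library this might be called as `load_with_diagnostics(path, llama_cpp.Llama)`.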

Improve GGUF image model loading - better URL handling · 9af89755

Your Name authored Mar 10, 2026

- Check if model is URL before any processing
- Use original model name with query params for URL download
- Strip query params only for HuggingFace repo ID parsing
- Added more debug output to trace issues
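
The ordering described in this commit (check for a URL first, keep the query string for the download, strip it only when parsing a repo ID) might look like the sketch below. The function name and the shape of the returned dictionary are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def resolve_model(model: str) -> dict:
    # Step 1: decide whether this is a URL before touching the string.
    if model.startswith(("http://", "https://")):
        return {
            "kind": "url",
            # Keep the original string, query parameters included, for download.
            "download": model,
            # The local filename comes from the path only, without the query.
            "filename": posixpath.basename(urlsplit(model).path),
        }
    # Step 2: only repo-style names have their query parameters stripped.
    return {"kind": "repo", "repo_id": model.split("?", 1)[0]}
```

For instance, `resolve_model("https://example.com/models/foo.gguf?download=true")` keeps the full URL under `"download"` and yields `"foo.gguf"` as the filename, while `resolve_model("org/model?download=true")` yields the repo ID `"org/model"`.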

Fix GGUF image model loading - strip query parameters · 6bc9af36

Your Name authored Mar 10, 2026

- Strip query parameters from model name before processing
- Handle URLs with ?download=true or other query params

Add GGUF image model support in --loadall mode · e848dd47

Your Name authored Mar 10, 2026

- Detect if image model is GGUF (ends with .gguf or contains 'gguf')
- If GGUF, load using llama.cpp (same as text Vulkan models)
- If diffusers model, load using Stable Diffusion pipeline
- Fixed both locations where image model preloading happens
- Now supports both GGUF and diffusers image generation models
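
The detection rule above can be sketched as a small dispatch function. The function name, the returned labels, and the lowercasing of the model name are assumptions; the rule itself (ends with `.gguf` or contains `gguf`) is taken from the commit message.

```python
def image_backend(model: str) -> str:
    # Lowercasing is an assumption so that names like "Foo-GGUF" also match.
    name = model.lower()
    if name.endswith(".gguf") or "gguf" in name:
        return "llama.cpp"  # load the same way as text GGUF models
    return "diffusers"      # load through a Stable Diffusion pipeline
```

Note that the substring test already covers the `.gguf` suffix case; both conditions are kept here to mirror the description.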

Fix image model preloading with --loadall flag · 2308d5b0

Your Name authored Mar 10, 2026

- Fixed bug where image model wasn't actually being loaded when --loadall was specified
- The code only printed messages but never loaded the diffusers pipeline
- Now actually loads the Stable Diffusion pipeline using diffusers library
- Tries StableDiffusionXLPipeline first, falls back to generic DiffusionPipeline
- Moves to GPU if CUDA available, enables attention slicing for memory efficiency
- Also fixes second location where image model is the only configured model

- Debug command line output was already implemented
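
The loading sequence described in this commit (try one pipeline class, fall back to another, move to the GPU, enable attention slicing) can be sketched as below. To keep the example runnable without `diffusers` or `torch`, the pipeline classes and the CUDA flag are passed in as parameters; the function name and signature are hypothetical.

```python
def load_image_pipeline(model_id: str, pipeline_classes, cuda_available: bool):
    # Try each pipeline class in order; the first successful load wins.
    last_error = None
    pipe = None
    for cls in pipeline_classes:
        try:
            pipe = cls.from_pretrained(model_id)
            break
        except Exception as exc:
            last_error = exc
    if pipe is None:
        raise RuntimeError(f"could not load image model {model_id}") from last_error
    if cuda_available:
        pipe = pipe.to("cuda")        # move the weights to the GPU
    pipe.enable_attention_slicing()   # trade some speed for lower memory use
    return pipe
```

With the real libraries installed, this would be called with `[StableDiffusionXLPipeline, DiffusionPipeline]` from `diffusers` and `torch.cuda.is_available()`.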

Fix --loadall model preloading and --debug command line output · 9193536a

Your Name authored Mar 10, 2026

- Fixed undefined variable bug where model_name wasn't defined in scope
- Fixed duplicate model loading when using --loadall/--loadswap with multiple models
- First model is now only loaded once (skipped in loop if already loaded)
- Loadall mode now properly preloads all models into VRAM, respecting the offload strategy
- Loadswap mode properly loads additional models to RAM
- Ondemand mode doesn't reload first model

Feature 1: --debug now shows full command line as first output
Feature 2: --loadall with multiple models now preloads all in VRAM
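
Two of the behaviors above, echoing the command line under `--debug` and skipping a model that is already loaded, can be sketched as follows. Both function names are hypothetical, and `load` stands in for whatever the project uses to load one model.

```python
import shlex


def format_command_line(argv: list[str]) -> str:
    # Quote arguments so the echoed line can be pasted back into a shell.
    return shlex.join(argv)


def preload_models(model_names, load, already_loaded=()):
    # Load every configured model exactly once, skipping any that were
    # loaded earlier (for example the first model at startup).
    loaded = list(already_loaded)
    for name in model_names:
        if name in loaded:
            continue
        load(name)
        loaded.append(name)
    return loaded
```

In debug mode, the first output would then be something like `print(format_command_line(sys.argv))`.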

09 Mar, 2026 30 commits

Add --convert flag to whisper-server command for audio format conversion · b51d08b1

Your Name authored Mar 09, 2026

Add debug logging for whisper-server transcription requests · 3ffd8f3e

Your Name authored Mar 09, 2026

Fix whisper-server: check for whisper-server FIRST in transcription endpoint · 37df61f9