- 19 Mar, 2026 (36 commits)
-
-
Your Name authored
- Added proxy methods to MultiModelManager class for cache module functions
- These methods are called by the images.py sd.cpp fallback path
- Fixes AttributeError: 'MultiModelManager' object has no attribute 'get_cached_model_path'
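A minimal delegation sketch of one such proxy method. Only the class name MultiModelManager and the method name get_cached_model_path come from the commit message; the cache module's import path and signature are assumptions:

```python
from codai import cache  # assumed location of the cache module

class MultiModelManager:
    def get_cached_model_path(self, repo_id, filename=None):
        # Proxy straight through, so images.py can call the method on the
        # manager instead of importing the cache module itself.
        return cache.get_cached_model_path(repo_id, filename)
```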
-
Your Name authored
- Enhanced the HF model resolution logic in the images.py sd.cpp fallback path
- Now checks for ANY cached file from the repo first (not just GGUF files)
- Falls back to checking for cached GGUF files specifically
- Last resort: downloads the first file in the repo
- Better error handling and logging throughout the resolution process
- This should resolve models that are already cached even if the exact GGUF filename isn't known
-
Your Name authored
- Enhanced model resolution for the sd.cpp fallback path
- Added multiple fallback strategies:
  1. Try HuggingFace GGUF resolution (existing)
  2. Fallback to direct file path check
  3. Fallback to cached model lookup
  4. Last resort: attempt download as URL
- Better error logging and handling
- Ensures model loading attempts all possible resolution paths before failing
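A rough sketch of that fallback chain. The strategy order is from the commit; the three helper functions are hypothetical stand-ins for the real ones:

```python
import os
import logging
from typing import Optional

log = logging.getLogger(__name__)

# Hypothetical stand-ins for the real resolution helpers.
def resolve_hf_gguf(ref: str) -> str: raise FileNotFoundError(ref)
def lookup_cached_model(ref: str) -> Optional[str]: return None
def download_model(ref: str) -> str: raise FileNotFoundError(ref)

def resolve_sdcpp_model(model_ref: str) -> str:
    """Try each resolution strategy in order; fail only if all four do."""
    # 1. HuggingFace GGUF resolution (the existing behaviour)
    try:
        return resolve_hf_gguf(model_ref)
    except Exception as exc:
        log.warning("HF GGUF resolution failed for %s: %s", model_ref, exc)
    # 2. Direct file path check
    if os.path.isfile(model_ref):
        return model_ref
    # 3. Cached model lookup
    cached = lookup_cached_model(model_ref)
    if cached:
        return cached
    # 4. Last resort: treat the reference as a URL and try to download it
    return download_model(model_ref)
```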
-
Your Name authored
- Added model resolution and unload logic to /v1/audio/transcriptions
- Added model resolution and unload logic to /v1/audio/speech (TTS)
- Now ALL endpoints (text, image, audio, TTS) properly handle model switching
- In ondemand mode, ANY model type switch triggers unload first (e.g., text->audio, TTS->image, etc.)
-
Your Name authored
- Added resolve_model_name() to MultiModelManager to properly resolve model aliases
- Added get_currently_loaded_model_name() to track what's actually in VRAM
- Updated /v1/chat/completions, /v1/completions, and /v1/images/generations
- Now correctly compares resolved canonical names before deciding to unload
- Handles all aliases (default, image, audio, tts) and custom aliases
- Works across ALL model types: text->text2, image->image2, text->image, etc.
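A sketch of the unload decision using the two helpers named in the commit; the surrounding glue function is an assumption:

```python
def ensure_requested_model(manager, requested_name: str) -> None:
    """Hypothetical glue: unload only when canonical names differ."""
    requested = manager.resolve_model_name(requested_name)   # alias -> canonical
    loaded = manager.get_currently_loaded_model_name()       # what's in VRAM now
    # Comparing resolved names means 'default' vs its real name never triggers
    # a spurious unload, while any true switch (text->image, ...) does.
    if loaded is not None and loaded != requested:
        manager.unload_all_models()
```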
-
Your Name authored
- Added unload_all_models() to MultiModelManager that handles ALL model types: ModelManager, diffusers pipelines, sd.cpp StableDiffusion, and any other objects
- Text endpoints now properly unload image models before loading text models
- Image endpoints now properly unload text models before loading image models
- The rule: in ondemand mode, if the model in VRAM differs from the requested model (regardless of type), fully unload before loading the new one
- Includes gc.collect(), torch.cuda.empty_cache(), and 1s settle delay
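A plausible shape for unload_all_models(), based only on the steps the commit lists; the internal registry attribute and cleanup-method names are assumptions:

```python
import gc
import time
import torch

class MultiModelManager:
    def unload_all_models(self) -> None:
        """Drop every kind of loaded model, then reclaim memory."""
        # Release whatever is tracked: a ModelManager, a diffusers pipeline,
        # an sd.cpp StableDiffusion object, or anything else.
        for name in list(self._loaded):            # assumed internal registry
            obj = self._loaded.pop(name)
            cleanup = getattr(obj, "cleanup", None) or getattr(obj, "unload", None)
            if callable(cleanup):
                cleanup()
            del obj
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        time.sleep(1)  # 1s settle delay, per the commit message
```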
-
Your Name authored
- In ondemand mode (no --load-all or --loadswap specified), when a new model is requested, the current model in VRAM is now fully unloaded before loading the new one. This ensures clean model switching.
- Added cleanup logic to both /v1/chat/completions and /v1/completions endpoints
- Added same logic to image generation endpoints (diffusers and sd.cpp paths)
- Cleanup includes: model cleanup, gc.collect(), torch.cuda.empty_cache()
-
Your Name authored
Root cause: The refactored code was hardcoding torch.float16 for CUDA, ignoring the --image-precision bf16 CLI argument. The Z-Image-Turbo model requires bfloat16 precision; using float16 causes NaN values in the image processor, resulting in all-black images.

Also restored the original model loading logic with:
- GGUF model detection (skip diffusers for GGUF)
- OOM retry with progressive memory optimization
- use_safetensors=True
- Sequential CPU offload support
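The fix presumably maps the CLI value to a torch dtype instead of hardcoding float16. A sketch; only the bf16 value and the --image-precision flag appear in the commit, the other values and the function itself are assumptions:

```python
import torch

# Map the --image-precision CLI value to a torch dtype instead of
# hardcoding torch.float16 on CUDA. Z-Image-Turbo needs bfloat16;
# float16 produces NaNs in the image processor and all-black output.
_PRECISION_MAP = {
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
    "fp32": torch.float32,
}

def pick_image_dtype(image_precision: str) -> torch.dtype:
    return _PRECISION_MAP.get(image_precision, torch.float16)
```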
-
Your Name authored
- Changed default image size from 512x512 back to 1024x1024 to match original coderai
- Changed NaN handling from 0.5 to 0.0 to match original coderai
-
Your Name authored
- Added set_global_args call for the images module in main.py
- Each API module has its own global_args, so it needs to be set separately
- Added debug logging to trace global_args in images.py
-
Your Name authored
- Fixed file path not being set in app.py for the /v1/files endpoint
- Fixed Host header parsing to correctly extract the hostname without the port
- Added debug logging to trace URL construction and file serving
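Stripping the port from a Host header value might look like this; a sketch, not the actual code:

```python
def hostname_from_host_header(host_header: str) -> str:
    """Return just the hostname from a Host header value.

    "example.com:8080" -> "example.com"
    "[::1]:8080"       -> "[::1]"  (IPv6 literals keep their brackets)
    """
    host = host_header.strip()
    if host.startswith("["):            # IPv6 literal, e.g. [::1]:8080
        return host.split("]")[0] + "]"
    return host.rsplit(":", 1)[0] if ":" in host else host
```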
-
Your Name authored
Fix: use DiffusionPipeline for custom model support (ZImagePipeline); the original code already used it
-
Your Name authored
- Default load mode is now 'loadall' (preload) instead of 'ondemand'
- Only use ondemand when --nopreload is explicitly specified
- Model will now be loaded at startup by default
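A sketch of the flag logic; the --nopreload flag and the two mode names are from the commit, the argparse plumbing is assumed:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--nopreload", action="store_true",
                    help="Defer model loading until the first request")
args = parser.parse_args()

# Default is now 'loadall' (load at startup); 'ondemand' only when asked for.
load_mode = "ondemand" if args.nopreload else "loadall"
```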
-
Your Name authored
- get_model_for_request now triggers model loading if not already loaded
- Added _load_default_model() method to load the default model on demand
- Added _load_model_by_name() method to load any model on demand
- Fixes 503 'Model not loaded' error when requesting the 'default' model
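A sketch of the on-demand loading path; the three method names are from the commit, the registry attribute and bodies are assumptions:

```python
class MultiModelManager:
    def get_model_for_request(self, name: str):
        """Return the model for `name`, loading it first if necessary."""
        resolved = self.resolve_model_name(name)
        if resolved not in self._loaded:          # assumed internal registry
            if name == "default":
                self._load_default_model()
            else:
                self._load_model_by_name(resolved)
        return self._loaded[resolved]
```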
-
Your Name authored
- Show full request body without truncation
- Include HTTP method, URL, and headers
- Pretty-print JSON bodies
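One way this could look; a hypothetical logger, not the actual code:

```python
import json

def log_full_request(method: str, url: str, headers: dict, body: bytes) -> None:
    print(f"{method} {url}")
    for key, value in headers.items():
        print(f"  {key}: {value}")
    try:
        # Pretty-print JSON bodies; anything else is dumped verbatim.
        print(json.dumps(json.loads(body), indent=2))
    except ValueError:
        print(body.decode("utf-8", errors="replace"))  # full body, no truncation
```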
-
Your Name authored
- text.py had a local global_debug variable that shadowed the one in the state module
- Changed text.py to import get_global_debug from the state module
- Changed set_global_debug() in text.py to call the state module's function
- Changed all 'if global_debug:' to 'if get_global_debug():' in text.py
- log.py was already using get_global_debug() correctly
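The underlying Python gotcha: a module-level copy of a flag is snapshotted once and never sees later changes made elsewhere, whereas an accessor function reads the live value. A sketch of the fix pattern (the maybe_log caller is hypothetical):

```python
# Before (the bug): text.py kept its own module-level copy.
#
#     global_debug = False      # shadows the flag in codai/api/state.py
#     if global_debug: ...      # always False, even after the flag flips
#
# After: every read and write goes through the state module.
from codai.api import state
from codai.api.state import get_global_debug

def set_global_debug(value: bool) -> None:
    state.set_global_debug(value)   # delegate; no stale local copy

def maybe_log(message: str) -> None:   # hypothetical caller
    if get_global_debug():             # live lookup on every call
        print(message)
```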
-
Your Name authored
- Create codai/api/state.py for shared global state functions
- images.py now imports get_load_mode from state instead of app
- app.py re-exports functions from state for backward compatibility
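codai/api/state.py could be as small as this; a sketch, with only get_load_mode and the debug accessors named anywhere in the commits:

```python
# codai/api/state.py: one home for process-wide flags, so every API module
# reads the same values instead of keeping shadow copies.
_load_mode: str = "loadall"
_global_debug: bool = False

def set_load_mode(mode: str) -> None:
    global _load_mode
    _load_mode = mode

def get_load_mode() -> str:
    return _load_mode

def set_global_debug(value: bool) -> None:
    global _global_debug
    _global_debug = value

def get_global_debug() -> bool:
    return _global_debug
```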
-
Your Name authored
- Move parse_args to codai.cli
- Move main() to codai.main
- Simplify coderai to be a thin wrapper importing from the codai package
- Create codai.api module with organized endpoints:
  - codai/api/app.py: FastAPI app, /v1/models, /v1/files, get_load_mode
  - codai/api/text.py: /v1/chat/completions, legacy /v1/completions
  - codai/api/images.py: /v1/images/generations
  - codai/api/transcriptions.py: /v1/audio/transcriptions
  - codai/api/tts.py: /v1/audio/speech
- coderai is now a backward-compatible entry point only
-
- 18 Mar, 2026 (4 commits)
-
-
Your Name authored
- Fixed AttributeError where Tool.get() was called on a Pydantic model
- Added isinstance() checks to handle both dict and Pydantic Tool formats
- This fixes the error when using --force-reasoning with tools
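The usual shape of this fix; a sketch, with the Tool model's fields assumed:

```python
from pydantic import BaseModel

class Tool(BaseModel):        # assumed shape of the Pydantic Tool model
    name: str

def tool_name(tool) -> str:
    # Pydantic models expose attributes, not .get(); dicts are the opposite.
    if isinstance(tool, dict):
        return tool.get("name", "")
    return tool.name
```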
-
Your Name authored
- Added repeat_penalty, presence_penalty, frequency_penalty params to generate() and generate_stream()
- Changed from **kwargs to explicit parameters to match base class abstract methods

This fixes the TypeError when calling VulkanBackend.generate_stream() with extra params.
-
Your Name authored
- Added missing parameters to generate() and generate_stream() methods
- Updated _generate_normal() and _generate_stream_normal() to use these params
- Also updated base.py abstract method signatures to match

This fixes the TypeError when using repeat_penalty with the NVIDIA backend.
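The pattern both backend fixes follow: the abstract base declares the full signature, and every backend accepts the same explicit parameters rather than a mismatched subset. A sketch; the default values and the elided body are assumptions:

```python
from abc import ABC, abstractmethod
from typing import Iterator

class BaseBackend(ABC):
    @abstractmethod
    def generate_stream(self, prompt: str,
                        repeat_penalty: float = 1.0,
                        presence_penalty: float = 0.0,
                        frequency_penalty: float = 0.0) -> Iterator[str]:
        ...

class VulkanBackend(BaseBackend):
    # Declaring the penalties explicitly means a call that passes
    # repeat_penalty=... matches the signature and no longer raises TypeError.
    def generate_stream(self, prompt: str,
                        repeat_penalty: float = 1.0,
                        presence_penalty: float = 0.0,
                        frequency_penalty: float = 0.0) -> Iterator[str]:
        yield from ()   # real generation elided
```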
-
Your Name authored
New patterns added to repair_broken_tool_calls():
1. Pattern 4: <tool><function>NAME</function><parameters>XML</parameters></tool>
   - Converts XML parameters to JSON format
   - Fills missing required params (e.g., path for list_files)
2. Pattern 0a: <tool><NAME><params></NAME></tool> (with closing tool name tag)
   - Handles format with closing tag for tool name
3. Expanded guard to detect known tool names used as wrapper tags
   - Now detects <fetch_instructions>, <list_files>, etc.
4. Fixed closure bug in Pattern -2 (wrong wrapper tags)
   - Used default argument to capture loop variable correctly
5. Post-processing: fill missing required parameters
   - list_files gets path='.' if missing
   - search_files gets path='.' if missing

All 6 test cases pass:
- <tool><function>list_files</function><parameters>...</parameters></tool> -> OK
- <fetch_instructions><task>read_file</task>...</fetch_instructions> -> OK
- <tool_call><list_files></list_files></tool_call> -> OK
- <tool><list_files><path>.</path></list_files></tool> -> OK
- Valid JSON passthrough -> OK
- Missing required params auto-filled -> OK
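To illustrate Pattern 4 alone, a toy converter from the XML form to a JSON tool call; this is a simplification under assumed output shape, not what repair_broken_tool_calls() actually does:

```python
import json
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Required-parameter defaults applied by the post-processing step.
REQUIRED_DEFAULTS = {"list_files": {"path": "."}, "search_files": {"path": "."}}

def repair_pattern4(text: str) -> Optional[str]:
    """<tool><function>NAME</function><parameters>XML</parameters></tool>
    -> '{"name": NAME, "arguments": {...}}' (toy version of Pattern 4)."""
    m = re.search(r"<tool>\s*<function>(\w+)</function>\s*"
                  r"<parameters>(.*?)</parameters>\s*</tool>", text, re.S)
    if not m:
        return None
    name, params_xml = m.group(1), m.group(2)
    args = {}
    try:
        root = ET.fromstring(f"<p>{params_xml}</p>")   # wrap loose XML tags
        args = {child.tag: (child.text or "") for child in root}
    except ET.ParseError:
        pass
    for key, value in REQUIRED_DEFAULTS.get(name, {}).items():
        args.setdefault(key, value)                    # fill missing params
    return json.dumps({"name": name, "arguments": args})
```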
-