- 19 Mar, 2026 39 commits
-
-
Your Name authored
- Updated get_all_cache_dirs() to properly find HuggingFace hub directory
- Now checks for ~/.cache/huggingface/hub/ instead of just ~/.cache/huggingface/
- This fixes --list-cached-models not showing HuggingFace cached models
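A minimal sketch of the lookup this commit describes, assuming the standard hub layout where each repo is stored as a `models--<org>--<name>` directory. The `home` parameter and the `list_cached_hf_models` helper are hypothetical, added only so the example is self-contained; the real get_all_cache_dirs() may check more locations.

```python
from pathlib import Path


def get_all_cache_dirs(home=None):
    """Return the cache directories that exist (illustrative, not the project's code)."""
    home = Path(home) if home is not None else Path.home()
    candidates = [
        # The hub/ subdirectory holds the per-repo folders, not huggingface/ itself.
        home / ".cache" / "huggingface" / "hub",
    ]
    return [path for path in candidates if path.is_dir()]


def list_cached_hf_models(hub_dir):
    """Turn 'models--org--name' folder names back into 'org/name' repo ids."""
    names = []
    for entry in sorted(Path(hub_dir).iterdir()):
        if entry.is_dir() and entry.name.startswith("models--"):
            parts = entry.name.split("--", 2)  # ['models', org, name] or ['models', name]
            names.append("/".join(parts[1:]))
    return names
```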
-
Your Name authored
- Removed the GGUF-only restriction on sd.cpp fallback
- Some HF models may be GGUF even without 'gguf' in the name
- Let sd.cpp attempt loading and fail gracefully if incompatible
- This allows sd.cpp to work as a proper fallback for any model type
-
Your Name authored
- Added check to only attempt sd.cpp fallback for GGUF models
- Tongyi-MAI/Z-Image-Turbo is a diffusers model, not GGUF, so sd.cpp should be skipped
- sd.cpp only supports GGUF models, diffusers models use the diffusers pipeline
- This prevents unnecessary sd.cpp resolution attempts for incompatible model types
-
Your Name authored
- Added proxy methods to MultiModelManager class for cache module functions
- These methods are called by images.py sd.cpp fallback path
- Fixes AttributeError: 'MultiModelManager' object has no attribute 'get_cached_model_path'
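The proxy-method fix can be pictured roughly as follows. This is a sketch only: the module-level `get_cached_model_path` function and its dictionary-backed cache are stand-ins invented for the example, and the real cache module signature is not shown in the commit.

```python
# Stand-in for a function living in the cache module (hypothetical signature).
_FAKE_CACHE = {"org/model": "/tmp/cache/org--model/model.gguf"}


def get_cached_model_path(model_id):
    """Return the cached path for model_id, or None if it is not cached."""
    return _FAKE_CACHE.get(model_id)


class MultiModelManager:
    """Only the proxy is shown; the real class has many more responsibilities."""

    def get_cached_model_path(self, model_id):
        # Delegate to the module-level function so callers that hold a
        # manager instance no longer hit AttributeError.
        return get_cached_model_path(model_id)
```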
-
Your Name authored
- Enhanced the HF model resolution logic in images.py sd.cpp fallback path
- Now checks for ANY cached file from the repo first (not just GGUF files)
- Falls back to checking for cached GGUF files specifically
- Last resort: downloads the first file in the repo as fallback
- Better error handling and logging throughout the resolution process
- This should resolve models that are already cached even if the exact GGUF filename isn't known
-
Your Name authored
- Enhanced model resolution for sd.cpp fallback path
- Added multiple fallback strategies:
  1. Try HuggingFace GGUF resolution (existing)
  2. Fallback to direct file path check
  3. Fallback to cached model lookup
  4. Last resort: attempt download as URL
- Better error logging and handling
- Ensures model loading attempts all possible resolution paths before failing
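The ordered fallback strategies listed above could be structured along these lines. This is a sketch under assumptions: `resolve_model_path` and the strategy callables are hypothetical names, and each real strategy (HF GGUF resolution, file path check, cache lookup, URL download) would replace the placeholders passed in.

```python
import logging

logger = logging.getLogger("model_resolution")


def resolve_model_path(model_id, strategies):
    """Try each (name, callable) strategy in order; return the first path found.

    A strategy returns a path string on success, or None / raises on failure.
    """
    for name, strategy in strategies:
        try:
            path = strategy(model_id)
        except Exception as exc:  # log and move on to the next strategy
            logger.warning("strategy %s failed for %s: %s", name, model_id, exc)
            continue
        if path:
            logger.info("resolved %s via %s", model_id, name)
            return path
    logger.error("all resolution strategies failed for %s", model_id)
    return None
```

Keeping the strategies in a list makes the order explicit and lets a failing step be logged without aborting the remaining attempts.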
-
Your Name authored
- Added model resolution and unload logic to /v1/audio/transcriptions
- Added model resolution and unload logic to /v1/audio/speech (TTS)
- Now ALL endpoints (text, image, audio, TTS) properly handle model switching
- In ondemand mode, ANY model type switch triggers unload first (e.g., text->audio, TTS->image, etc.)
-
Your Name authored
- Added resolve_model_name() to MultiModelManager to properly resolve model aliases
- Added get_currently_loaded_model_name() to track what's actually in VRAM
- Updated /v1/chat/completions, /v1/completions, and /v1/images/generations
- Now correctly compares resolved canonical names before deciding to unload
- Handles all aliases (default, image, audio, tts) and custom aliases
- Works across ALL model types: text->text2, image->image2, text->image, etc.
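The comparison of resolved canonical names might look like the sketch below. The `AliasResolver` class, the alias table, and the `needs_unload` helper are assumptions made for illustration; only the method name resolve_model_name() comes from the commit.

```python
class AliasResolver:
    """Maps aliases such as 'default' or 'image' to canonical model names."""

    def __init__(self, aliases):
        self._aliases = dict(aliases)

    def resolve_model_name(self, name):
        seen = set()
        # Follow alias chains, guarding against accidental cycles.
        while name in self._aliases and name not in seen:
            seen.add(name)
            name = self._aliases[name]
        return name


def needs_unload(resolver, loaded_name, requested_name):
    """True when something is loaded and it is not the requested model."""
    if loaded_name is None:
        return False
    return resolver.resolve_model_name(loaded_name) != resolver.resolve_model_name(requested_name)
```

Comparing canonical names avoids a pointless unload when a request uses an alias for the model that is already in VRAM.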
-
Your Name authored
- Added unload_all_models() to MultiModelManager that handles ALL model types: ModelManager, diffusers pipelines, sd.cpp StableDiffusion, and any other objects
- Text endpoints now properly unload image models before loading text models
- Image endpoints now properly unload text models before loading image models
- The rule: in ondemand mode, if the model in VRAM differs from the requested model (regardless of type), fully unload before loading the new one
- Includes gc.collect(), torch.cuda.empty_cache(), and 1s settle delay
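A rough sketch of the unload sequence named above (drop references, gc.collect(), torch.cuda.empty_cache(), settle delay). The `loaded` dictionary, the optional `cleanup()` hook, and the `settle_seconds` parameter are assumptions for the example; torch is imported optionally so the sketch also runs on machines without it.

```python
import gc
import time

try:
    import torch  # optional: only needed to release cached CUDA memory
except ImportError:
    torch = None


def unload_all_models(loaded, settle_seconds=1.0):
    """Drop every model in `loaded` (name -> object) and free memory.

    Returns the names that were unloaded.
    """
    unloaded = []
    for name in list(loaded):
        model = loaded.pop(name)
        cleanup = getattr(model, "cleanup", None)  # hypothetical per-model hook
        if callable(cleanup):
            cleanup()
        del model
        unloaded.append(name)
    gc.collect()  # collect Python-side references first
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # then return cached blocks to the driver
    time.sleep(settle_seconds)  # brief pause before the next load
    return unloaded
```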
-
Your Name authored
- In ondemand mode (no --load-all or --loadswap specified), when a new model is requested, the current model in VRAM is now fully unloaded before loading the new one. This ensures clean model switching.
- Added cleanup logic to both /v1/chat/completions and /v1/completions endpoints
- Added same logic to image generation endpoints (diffusers and sd.cpp paths)
- Cleanup includes: model cleanup, gc.collect(), torch.cuda.empty_cache()
-
Your Name authored
Root cause: The refactored code was hardcoding torch.float16 for CUDA, ignoring the --image-precision bf16 CLI argument. The Z-Image-Turbo model requires bfloat16 precision - using float16 causes NaN values in the image processor, resulting in all-black images.

Also restored the original model loading logic with:
- GGUF model detection (skip diffusers for GGUF)
- OOM retry with progressive memory optimization
- use_safetensors=True
- Sequential CPU offload support
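One way to honor the CLI precision instead of hardcoding float16 is sketched below. Only the `bf16` value appears in the commit; the other keys (`fp16`, `fp32`), the float16 default on CUDA, and the float32 choice on CPU are assumptions. The function returns a dtype name so the example runs without torch installed.

```python
# Maps CLI precision strings to torch dtype attribute names.
# Only 'bf16' is confirmed by the commit; the other keys are assumed.
_PRECISION_TO_DTYPE = {
    "fp16": "float16",
    "bf16": "bfloat16",
    "fp32": "float32",
}


def pick_dtype_name(image_precision, device):
    """Return the torch dtype name to load the image pipeline with."""
    if device != "cuda":
        return "float32"  # assumption: stay in full precision off-GPU
    # Fall back to float16 only when no recognized precision was requested.
    return _PRECISION_TO_DTYPE.get(image_precision, "float16")


# With torch available, the name can be turned into a dtype object:
#   dtype = getattr(torch, pick_dtype_name("bf16", "cuda"))  # torch.bfloat16
```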
-
Your Name authored
- Changed default image size from 512x512 back to 1024x1024 to match original coderai
- Changed NaN handling from 0.5 to 0.0 to match original coderai
-
Your Name authored
- Added set_global_args call for images module in main.py
- Each API module has its own global_args, so it needs to be set separately
- Added debug logging to trace global_args in images.py
-
Your Name authored
- Fixed file path not being set in app.py for /v1/files endpoint
- Fixed Host header parsing to correctly extract hostname without port
- Added debug logging to trace URL construction and file serving
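Extracting the hostname from a Host header could be done roughly like this. The function name is hypothetical and the bracketed IPv6 handling is an addition the commit does not mention.

```python
def hostname_from_host_header(host):
    """Strip an optional ':port' suffix from a Host header value."""
    host = host.strip()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8080"
        end = host.find("]")
        return host[1:end] if end != -1 else host
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return name
    return host
```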
-
Your Name authored
-
Your Name authored
-
Your Name authored
Fix: Use DiffusionPipeline for custom model support (ZImagePipeline), as the original code did
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Default load mode is now 'loadall' (preload) instead of 'ondemand'
- Only use ondemand when --nopreload is explicitly specified
- Model will now be loaded at startup by default
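The new default can be sketched with argparse as below. Only the --nopreload flag and the two mode names come from the commit; the parser here is a stripped-down stand-in, and the real CLI has other flags that affect the mode.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="load-mode sketch")
    parser.add_argument(
        "--nopreload",
        action="store_true",
        help="do not load the model at startup; load on first request",
    )
    return parser.parse_args(argv)


def get_load_mode(args):
    """'loadall' unless --nopreload was passed explicitly."""
    return "ondemand" if args.nopreload else "loadall"
```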
-
Your Name authored
- get_model_for_request now triggers model loading if not already loaded
- Added _load_default_model() method to load default model on demand
- Added _load_model_by_name() method to load any model on demand
- Fixes 503 'Model not loaded' error when requesting 'default' model
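Load-on-first-request behavior of this kind might be sketched as follows. The `LazyModelManager` class, its constructor arguments, and the injected `loader` callable are hypothetical; only the method names get_model_for_request, _load_default_model and _load_model_by_name are taken from the commit.

```python
class LazyModelManager:
    """Loads models the first time they are requested (illustrative only)."""

    def __init__(self, loader, default_name):
        self._loader = loader            # callable: model name -> model object
        self._default_name = default_name
        self._models = {}

    def _load_model_by_name(self, name):
        self._models[name] = self._loader(name)
        return self._models[name]

    def _load_default_model(self):
        return self._load_model_by_name(self._default_name)

    def get_model_for_request(self, name):
        if name == "default":
            name = self._default_name
        if name in self._models:
            return self._models[name]
        # Previously this path would have produced a 'Model not loaded' error.
        if name == self._default_name:
            return self._load_default_model()
        return self._load_model_by_name(name)
```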
-
Your Name authored
- Show full request body without truncation
- Include HTTP method, URL, and headers
- Pretty-print JSON bodies
-
Your Name authored
- text.py had local global_debug variable that shadowed the state module
- Changed text.py to import get_global_debug from state module
- Changed set_global_debug() in text.py to call state module's function
- Changed all 'if global_debug:' to 'if get_global_debug():' in text.py
- log.py was already using get_global_debug() correctly
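The shadowing problem can be reproduced in a single file. In this sketch the private `_global_debug` variable plays the role of the state module's flag and `local_copy` plays the role of the stale variable in text.py; both names are illustrative.

```python
_global_debug = False  # stands in for the flag owned by the state module


def set_global_debug(value):
    global _global_debug
    _global_debug = bool(value)


def get_global_debug():
    return _global_debug


# A module that copies the value keeps whatever it saw at copy time.
local_copy = _global_debug

set_global_debug(True)

# local_copy is still False here, while get_global_debug() reads the
# current value, which is why the checks were changed to call the getter.
```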
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Create codai/api/state.py for shared global state functions
- images.py now imports get_load_mode from state instead of app
- app.py re-exports functions from state for backward compatibility
-
Your Name authored
- Move parse_args to codai.cli
- Move main() to codai.main
- Simplify coderai to be a thin wrapper importing from codai package
- Create codai.api module with organized endpoints:
  - codai/api/app.py: FastAPI app, /v1/models, /v1/files, get_load_mode
  - codai/api/text.py: /v1/chat/completions, legacy /v1/completions
  - codai/api/images.py: /v1/images/generations
  - codai/api/transcriptions.py: /v1/audio/transcriptions
  - codai/api/tts.py: /v1/audio/speech
- coderai is now backward compatible entry point only
-
- 18 Mar, 2026 1 commit
-
-
Your Name authored
- Fixed AttributeError where Tool.get() was called on Pydantic model
- Added isinstance() checks to handle both dict and Pydantic Tool formats
- This fixes the error when using --force-reasoning with tools
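Handling both tool formats with isinstance() could look like the sketch below. To stay dependency-free, a dataclass stands in for the Pydantic `Tool` model; its fields and the `tool_field` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tool:
    """Stand-in for the Pydantic Tool model (fields are assumed)."""
    type: str = "function"
    function: Optional[dict] = None


def tool_field(tool, key, default=None):
    """Read a field from either a plain dict or a model object."""
    if isinstance(tool, dict):
        return tool.get(key, default)
    # Model objects expose fields as attributes and have no .get() method.
    return getattr(tool, key, default)
```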
-