19 Mar, 2026 (40 commits)
    • Implement proper loadswap/loadall/ondemand model management modes · c08a5b4f
      - Default mode changed to ondemand (pre-load first model, unload/load on switch)
      - loadswap: load first model in VRAM, others in CPU RAM, swap on switch
      - loadall: try to load all models in VRAM, offload to CPU RAM if OOM
      - --nopreload: skip pre-loading in any mode, load on first request
      - request_model() now properly handles all three modes
      - Added _move_model_to_cpu() and _move_model_to_vram() for loadswap
      - Fixed NameError: model_manager reference in request_model() (was using global singleton instead of self)
      - Updated CLI help text for --loadall, --loadswap, --nopreload
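The three modes above can be sketched as a small state machine. This is a minimal illustration only; `Mode` and `ModeManager` are hypothetical stand-ins, not the real codai classes:

```python
# Hypothetical sketch of the three management modes; real loading/unloading
# is replaced by set bookkeeping of which models sit in VRAM vs CPU RAM.
from enum import Enum

class Mode(Enum):
    ONDEMAND = "ondemand"   # one model in VRAM; fully unload on switch
    LOADSWAP = "loadswap"   # park the current model in CPU RAM on switch
    LOADALL = "loadall"     # keep everything in VRAM (spill on OOM)

class ModeManager:
    def __init__(self, mode: Mode):
        self.mode = mode
        self.vram: set[str] = set()
        self.cpu: set[str] = set()

    def request_model(self, name: str) -> str:
        if name in self.vram:
            return name                  # already resident, nothing to do
        if self.mode is Mode.ONDEMAND:
            self.vram.clear()            # fully unload the previous model
        elif self.mode is Mode.LOADSWAP and self.vram:
            self.cpu.update(self.vram)   # swap current model out to CPU RAM
            self.vram.clear()
        self.cpu.discard(name)           # it may have been parked earlier
        self.vram.add(name)
        return name
```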
    • Centralize model resolution and VRAM management in MultiModelManager.request_model() · e004541a
      - Added request_model() method to MultiModelManager that handles:
        1. Alias resolution (image, audio, tts, vision, default, custom aliases)
        2. VRAM management (unloading previous models in ondemand mode)
        3. Checking if model is already loaded
      
      - Simplified codai/api/images.py:
        - Uses request_model() for model resolution and VRAM management
        - Extracted helper functions: _is_gguf_model(), _load_diffusers_pipeline(),
          _generate_with_diffusers(), _generate_with_sdcpp(), _load_sdcpp_model()
        - Removed duplicated sd.cpp generation code
        - Fixed semaphore scope (all generation now inside semaphore block)
      
      - Simplified codai/api/tts.py:
        - Uses request_model() instead of duplicated VRAM management code
        - Removed duplicate get_cached_model_path() and get_model_cache_dir() wrappers
      
      - Simplified codai/api/transcriptions.py:
        - Uses request_model() instead of duplicated VRAM management code
      
      - Simplified codai/api/text.py:
        - Both /v1/chat/completions and /v1/completions use request_model()
        - Removed duplicated VRAM management blocks
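The alias-resolution step that request_model() performs could look roughly like this; the alias table contents and the `resolve()` helper are illustrative assumptions, not the actual MultiModelManager code:

```python
# Sketch of alias resolution: built-in aliases (image, audio, tts, default)
# plus custom aliases map to canonical model names; chains are followed.
ALIASES = {
    "default": "llama-3-8b-instruct",   # example values, not real defaults
    "image":   "z-image-turbo",
    "audio":   "whisper-large-v3",
    "tts":     "kokoro-82m",
}

def resolve(name, custom=None):
    """Map an alias (built-in or custom) to its canonical model name."""
    table = {**ALIASES, **(custom or {})}
    seen = set()
    while name in table and name not in seen:   # follow chains, stop on cycles
        seen.add(name)
        name = table[name]
    return name
```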
    • Fix architecture: Proper separation of Model Manager and Cache responsibilities · 7788ce85
      - **Model Manager**: Central coordinator for model lifecycle, alias resolution, loading/unloading
      - **Cache Module**: Handles downloading, caching, and storage of models
      - **API Modules**: Request models from Model Manager (not directly from cache)
      
      Key changes:
      - Removed resolve_and_load_model() from cache - moved logic to Model Manager
      - Model Manager now downloads/caches models at startup when registered
      - API modules use multi_model_manager.load_model() instead of cache functions
      - Proper separation: Cache=storage, Manager=lifecycle coordination, APIs=requests
      
      This fixes the incorrect direct API-to-cache coupling and establishes proper architectural boundaries.
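The layering above (APIs talk only to the manager, the manager talks to the cache) could be sketched with two toy classes; these are illustrative, not the actual codai implementation:

```python
# Toy layering sketch: Cache = storage, ModelManager = lifecycle coordination.
# API modules would call only ModelManager.load_model(), never Cache directly.
class Cache:
    def fetch(self, ref: str) -> str:
        # downloading/caching lives here; return a local path
        return f"/cache/{ref}"

class ModelManager:
    def __init__(self, cache: Cache):
        self.cache = cache
        self.loaded: dict = {}

    def load_model(self, ref: str) -> str:
        if ref not in self.loaded:          # lifecycle decision lives here
            self.loaded[ref] = self.cache.fetch(ref)
        return self.loaded[ref]
```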
    • Centralize model resolution logic in cache module · de4d544f
      - Added resolve_and_load_model() function to codai.models.cache
      - Simplified codai/api/images.py by removing 100+ lines of complex model resolution logic
      - API modules now use single centralized function for all model loading
      - Eliminates code duplication across API endpoints
      - All model resolution logic now managed in one place
    • Fix image generation to properly handle diffusers vs GGUF models · c535ca5f
      - Added check in sd.cpp fallback to skip HF model IDs that are likely diffusers models
      - Prevents sd.cpp from trying to download non-GGUF files like .gitattributes for diffusers models
      - Tongyi-MAI/Z-Image-Turbo and similar diffusers models now handled correctly by diffusers library
      - GGUF models still work with sd.cpp as before
    • Fix API modules to use centralized cache functions · 5e641ba2
      - Updated codai/api/images.py to use cache module functions directly
      - Updated codai/api/tts.py to use centralized load_model() function
      - Removed proxy method calls that were causing AttributeError
      - All model loading/downloading now goes through codai.models.cache
    • Implement intelligent model loading for local files, URLs, and HF IDs · bff24350
      - Updated load_model() to handle three input types:
        1. Local files: Use directly without caching
        2. URLs: Download to cache if not cached, then use
        3. HF model IDs: Download via HF API if not cached, then use
      - Updated get_cached_model_path() to validate local files
      - Enhanced module documentation to reflect new capabilities
      - All model types (text, image, audio, etc.) can now use any input type
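The three-way dispatch described above hinges on classifying the input first; a minimal sketch (the helper name is an assumption, and the real load_model() would also handle downloading):

```python
# Classify a model reference as a local file, a URL, or an HF model ID.
from pathlib import Path
from urllib.parse import urlparse

def classify_model_ref(ref: str) -> str:
    """Return 'local', 'url', or 'hf_id' for a model reference string."""
    if urlparse(ref).scheme in ("http", "https"):
        return "url"                      # download to cache if not cached
    if Path(ref).expanduser().exists():
        return "local"                    # use directly without caching
    return "hf_id"                        # e.g. 'TheBloke/Llama-2-7B-GGUF'
```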
    • Fix --remove-model to remove entire HF repository directories · 3e3067a9
      - Updated remove_cached_model() to remove entire repo directories when matching by repo_id
      - Previously only removed individual files, now removes complete repository cache
      - Handles both files and directories in removal process
      - More thorough cleanup of HuggingFace cached models
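Repo-level removal matters because the HuggingFace hub cache stores a repo as a directory tree (e.g. `models--ORG--NAME`), not a single file. A hedged sketch of the directory removal (the function name is illustrative):

```python
# Remove the full cached directory for an HF repo_id, if present.
# Assumes the standard hub cache layout: <cache>/models--ORG--NAME/...
import shutil
from pathlib import Path

def remove_cached_repo(cache_dir: Path, repo_id: str) -> bool:
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    if repo_dir.is_dir():
        shutil.rmtree(repo_dir)           # removes files AND subdirectories
        return True
    return False
```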
    • Centralize all model loading/downloading logic in codai.models.cache · c93d4a6b
      - Added unified load_model() function as main entry point for model loading
      - Updated WhisperServerManager to use centralized load_model() instead of inline logic
      - Removed proxy methods from MultiModelManager - use cache module directly
      - All cache functions now work seamlessly with both GGUF and HF model caches
      - Improved separation of concerns: cache module handles all caching/downloading
    • Unify cache functions to work with both GGUF and HuggingFace caches · 82735770
      - Updated get_cached_model_path() to check both coderai and HF caches
      - Updated download_model() to handle both URLs and HF model IDs automatically
      - Made download_huggingface_model() consistent with unified API
      - Updated module docstring to reflect unified cache functionality
      - All cache functions now work seamlessly with both cache types
    • Fix --remove-model to work with HuggingFace repo IDs · 52eb402a
      - Updated remove_cached_model() to search by repo_id for HuggingFace models
      - Moved cache management options (--list-cached-models, --remove-model, --remove-all-models) to run before heavy imports
      - Improved cache operations to use centralized functions in codai.models.cache module
      - Fixed model removal to work with full repo IDs like 'TheBloke/Llama-2-7B-GGUF'
    • Refactor --list-cached-models to use centralized cache module function · e509279a
      - Add list_cached_models_info() function to codai.models.cache module
      - Move cache listing logic from main.py to the cache module
      - Update main.py to use the centralized function early (before heavy imports)
      - Improves code organization and avoids unnecessary imports for --list-cached-models
    • Fix: Properly implement --list-cached-models with model-level information · 4d9f9886
      - CoderAI cache: Shows individual GGUF files with sizes
      - HuggingFace cache: Uses HF API (scan_cache_dir) to show model-level info, not individual files
      - Shows model names, sizes, revision counts - not thousands of individual files
      - Much more useful and readable output
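A sketch of the model-level output format described above (the real code gathers HF entries via `huggingface_hub.scan_cache_dir()`; this formatter and its signature are assumptions):

```python
# Format one cache entry at the model level: cache name, model, size, revisions.
def format_cached_model(cache: str, name: str, size_bytes: int,
                        revisions: int = 1) -> str:
    mb = size_bytes / (1024 * 1024)
    rev = f", {revisions} revision(s)" if revisions > 1 else ""
    return f"[{cache}] {name} ({mb:.1f} MB{rev})"
```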
    • Fix: --list-cached-models now displays individual cached files · 07cf6c3f
      - Added code to print individual cached model files with sizes
      - Previously only showed cache directory headers and summary
      - Now shows each file with format: [cache_name] filename (size MB)
      - Matches the format used by --remove-model command
    • Fix: Correct HuggingFace cache directory detection · 73c81b2f
      - Updated get_all_cache_dirs() to properly find HuggingFace hub directory
      - Now checks for ~/.cache/huggingface/hub/ instead of just ~/.cache/huggingface/
      - This fixes --list-cached-models not showing HuggingFace cached models
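The detection fix above boils down to checking the `hub/` subdirectory, since that is where the actual model blobs live. A minimal sketch (helper name assumed, `home` parameterized for testability):

```python
# Locate the HF hub cache directory: ~/.cache/huggingface/hub/, not the
# huggingface/ root (which also holds tokens and other non-model state).
from pathlib import Path
from typing import Optional

def find_hf_hub_dir(home: Optional[Path] = None) -> Optional[Path]:
    base = (home or Path.home()) / ".cache" / "huggingface" / "hub"
    return base if base.is_dir() else None
```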
    • Revert: Keep sd.cpp fallback available for all models when diffusers fails · 2cedd442
      - Removed the GGUF-only restriction on sd.cpp fallback
      - Some HF models may be GGUF even without 'gguf' in the name
      - Let sd.cpp attempt loading and fail gracefully if incompatible
      - This allows sd.cpp to work as a proper fallback for any model type
    • Fix: Skip sd.cpp fallback for non-GGUF models · f5b9d812
      - Added check to only attempt sd.cpp fallback for GGUF models
      - Tongyi-MAI/Z-Image-Turbo is a diffusers model, not GGUF, so sd.cpp should be skipped
      - sd.cpp only supports GGUF models, diffusers models use the diffusers pipeline
      - This prevents unnecessary sd.cpp resolution attempts for incompatible model types
    • Fix: Add missing get_cached_model_path and get_model_cache_dir methods to MultiModelManager · 392895da
      - Added proxy methods to MultiModelManager class for cache module functions
      - These methods are called by images.py sd.cpp fallback path
      - Fixes AttributeError: 'MultiModelManager' object has no attribute 'get_cached_model_path'
    • Fix: Improve HuggingFace model ID resolution for sd.cpp · ce75ec47
      - Enhanced the HF model resolution logic in images.py sd.cpp fallback path
      - Now checks for ANY cached file from the repo first (not just GGUF files)
      - Falls back to checking for cached GGUF files specifically
      - Last resort: downloads the first file in the repo
      - Better error handling and logging throughout the resolution process
      - This should resolve models that are already cached even if the exact GGUF filename isn't known
    • Fix: Improve sd.cpp model loading fallback logic · 7bb4eec1
      - Enhanced model resolution for sd.cpp fallback path
      - Added multiple fallback strategies:
        1. Try HuggingFace GGUF resolution (existing)
        2. Fallback to direct file path check
        3. Fallback to cached model lookup
        4. Last resort: attempt download as URL
      - Better error logging and handling
      - Ensures model loading attempts all possible resolution paths before failing
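The chain-of-fallbacks pattern above can be expressed generically; this is an illustrative sketch where each resolver stands in for one of the four strategies, not the actual codai code:

```python
# Try each resolution strategy in order; log failures and fall through.
from typing import Callable, List, Optional

def resolve_model(ref: str,
                  resolvers: List[Callable[[str], Optional[str]]]) -> str:
    """Return the first non-None path; raise with collected errors otherwise."""
    errors = []
    for resolver in resolvers:
        try:
            path = resolver(ref)
            if path is not None:
                return path
        except Exception as exc:          # record and try the next strategy
            errors.append(f"{resolver.__name__}: {exc}")
    raise FileNotFoundError(f"could not resolve {ref!r}; tried: {errors}")
```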
    • Complete fix: Add ondemand mode model switching to audio and TTS endpoints · 63460a13
      - Added model resolution and unload logic to /v1/audio/transcriptions
      - Added model resolution and unload logic to /v1/audio/speech (TTS)
      - Now ALL endpoints (text, image, audio, TTS) properly handle model switching
      - In ondemand mode, ANY model type switch triggers unload first (e.g., text->audio, TTS->image)
    • Fix: Proper model resolution for ondemand mode - unload when switching between ANY different models · a37085b4
      - Added resolve_model_name() to MultiModelManager to properly resolve model aliases
      - Added get_currently_loaded_model_name() to track what's actually in VRAM
      - Updated /v1/chat/completions, /v1/completions, and /v1/images/generations
      - Now correctly compares resolved canonical names before deciding to unload
      - Handles all aliases (default, image, audio, tts) and custom aliases
      - Works across ALL model types: text->text2, image->image2, text->image, etc.
    • Fix: Centralize model unloading - properly handle all model types in ondemand mode · 00775972
      - Added unload_all_models() to MultiModelManager that handles ALL model types:
        ModelManager, diffusers pipelines, sd.cpp StableDiffusion, and any other objects
      - Text endpoints now properly unload image models before loading text models
      - Image endpoints now properly unload text models before loading image models
      - The rule: in ondemand mode, if the model in VRAM differs from the requested
        model (regardless of type), fully unload before loading the new one
      - Includes gc.collect(), torch.cuda.empty_cache(), and 1s settle delay
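The unload sequence described above could be sketched as follows; the registry-clearing shape and the `settle` parameter are assumptions, while the gc/CUDA/delay steps mirror the commit message:

```python
# Drop every loaded model reference, then reclaim host and GPU memory.
import gc
import time

def unload_all_models(registry: dict, settle: float = 1.0) -> int:
    """Clear all loaded models; return how many were unloaded."""
    count = len(registry)
    registry.clear()                      # release Python-side references
    gc.collect()                          # collect cycles still holding tensors
    try:
        import torch                      # optional: CPU-only installs skip this
        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # return freed blocks to the driver
    except ImportError:
        pass
    time.sleep(settle)                    # settle delay before loading anew
    return count
```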
    • Fix: In ondemand mode, fully unload current model before loading new one · 7d838962
      - In ondemand mode (no --load-all or --loadswap specified), when a new model
        is requested, the current model in VRAM is now fully unloaded before loading
        the new one. This ensures clean model switching.
      - Added cleanup logic to both /v1/chat/completions and /v1/completions endpoints
      - Added same logic to image generation endpoints (diffusers and sd.cpp paths)
      - Cleanup includes: model cleanup, gc.collect(), torch.cuda.empty_cache()
    • Fix black image: use --image-precision from CLI args instead of hardcoded float16 · 9b3126d7
      Root cause: The refactored code was hardcoding torch.float16 for CUDA,
      ignoring the --image-precision bf16 CLI argument. The Z-Image-Turbo model
      requires bfloat16 precision - using float16 causes NaN values in the
      image processor, resulting in all-black images.
      
      Also restored the original model loading logic with:
      - GGUF model detection (skip diffusers for GGUF)
      - OOM retry with progressive memory optimization
      - use_safetensors=True
      - Sequential CPU offload support
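The dtype selection the fix implies could look like this; the CLI value names (`fp16`, `bf16`, `fp32`) and the helper itself are assumptions, returning dtype names as strings rather than torch objects to stay self-contained:

```python
# Pick the pipeline dtype from the --image-precision flag instead of
# hardcoding float16 (which produced NaNs, hence black images, on models
# like Z-Image-Turbo that require bfloat16).
from typing import Optional

def pick_dtype(image_precision: Optional[str], cuda: bool) -> str:
    if not cuda:
        return "float32"                  # CPU pipelines stay in fp32
    mapping = {"fp16": "float16", "bf16": "bfloat16", "fp32": "float32"}
    return mapping.get(image_precision or "", "float16")
```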
    • Fix black image issue: restore original default size (1024x1024) and NaN handling · 553cdf07
      - Changed default image size from 512x512 back to 1024x1024 to match original coderai
      - Changed NaN handling from 0.5 to 0.0 to match original coderai
    • Fix global_args not being passed to images module · a5a52504
      - Added set_global_args call for images module in main.py
      - Each API module has its own global_args, so it needs to be set separately
      - Added debug logging to trace global_args in images.py
    • Fix image generation URL port and 404 issues · a28c6863
      - Fixed file path not being set in app.py for /v1/files endpoint
      - Fixed Host header parsing to correctly extract hostname without port
      - Added debug logging to trace URL construction and file serving
    • ccc0f6ac
    • Fix: Use DiffusionPipeline for custom model support (ZImagePipeline) - was using it in original code · b7fbde39
    • afb2eead
    • 6eb3b9b8
    • fee1c415