Commits · f4a34bc3e37abf3a95412e8ee5615da21739a287 · nexlab / coderai

03 May, 2026 9 commits

Add --flash flag to build.sh for Flash Attention 2 installation · f4a34bc3

Stefy Lanza (nextime / spora ) authored May 03, 2026

- New --flash flag enables Flash Attention 2 installation
- Works with the 'nvidia' and 'all' backends (when CUDA is available)
- Installs with --no-build-isolation flag
- Graceful error handling if installation fails
- Updated usage instructions to show --flash flag
- Requirements: CUDA 11.6+, Linux, Ampere/Ada/Hopper GPU
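
A minimal Python sketch of the install step described above, assuming the package is installed from PyPI as flash-attn and imported as flash_attn. build.sh itself is a shell script and its actual contents are not shown here; the helper names below are hypothetical.

```python
import importlib.util
import sys


def flash_attn_install_command() -> list:
    """Build the pip command; --no-build-isolation lets the build see the installed torch."""
    return [sys.executable, "-m", "pip", "install", "flash-attn", "--no-build-isolation"]


def flash_attn_available() -> bool:
    """Return True when the flash_attn package can be found, without importing it."""
    return importlib.util.find_spec("flash_attn") is not None


if __name__ == "__main__":
    print(" ".join(flash_attn_install_command()))
    if not flash_attn_available():
        # Graceful handling: report and carry on rather than abort the build.
        print("warning: Flash Attention 2 is not installed; continuing without it")
```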

Add implementation summary document · 210cc32a

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Complete overview of refactoring work
- Architecture before/after comparison
- List of all commits and changes
- Installation and usage instructions
- Configuration examples
- Testing checklist and known limitations
- Next steps for future work

Improve build.sh error handling and force venv installation · acc1fbdf

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Added PIP_NO_INPUT and PIP_REQUIRE_VIRTUALENV environment variables
- Better error handling for problematic packages (procname, stable-diffusion-cpp-python)
- Install core dependencies first, then optional packages individually
- Graceful fallback when optional packages fail to build
- Updated PyTorch installation to handle version compatibility
- Clear warnings for optional packages that fail (not critical errors)
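
The optional-package handling above can be sketched in Python as follows. This is an illustration under stated assumptions, not the contents of build.sh: the function names are hypothetical, and the `run` parameter exists only so the loop can be exercised without calling pip.

```python
import os
import subprocess
import sys

# Packages named in the commit as problematic; treated here as optional.
OPTIONAL_PACKAGES = ["procname", "stable-diffusion-cpp-python"]


def pip_env() -> dict:
    """Copy the environment and add the two pip variables the commit mentions."""
    env = dict(os.environ)
    env["PIP_NO_INPUT"] = "1"               # never prompt for input
    env["PIP_REQUIRE_VIRTUALENV"] = "true"  # refuse to install outside a venv
    return env


def install_optional(packages, run=subprocess.run) -> list:
    """Install each optional package on its own; return the names that failed."""
    failed = []
    for name in packages:
        cmd = [sys.executable, "-m", "pip", "install", name]
        result = run(cmd, env=pip_env(), check=False)
        if result.returncode != 0:
            # A warning, not a fatal error: the build continues.
            print(f"warning: optional package {name} failed to install", file=sys.stderr)
            failed.append(name)
    return failed
```

Installing the packages one at a time means a single failed build does not roll back the others.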

Fix Python 3.13 compatibility for stable-diffusion-cpp-python and procname · 1a740475

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Removed version pins from stable-diffusion-cpp-python and procname
- Both packages have build issues with strict version requirements on Python 3.13
- Allowing pip to find compatible versions automatically

Fix whispercpp version requirement · a5ac005e

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Changed whispercpp from >=1.0.0 to >=0.0.6
- Package only has 0.0.x versions available; no 1.0.0 release exists yet
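
A small sketch of why the old requirement could never be satisfied. The release list is hypothetical (the commit only says that 0.0.x versions exist), and the comparison is a simplified stand-in for pip's real version handling.

```python
def parse_version(text: str) -> tuple:
    """Turn '0.0.6' into (0, 0, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfying(available, minimum: str) -> list:
    """Return the releases that meet a '>=minimum' requirement."""
    floor = parse_version(minimum)
    return [version for version in available if parse_version(version) >= floor]


# Hypothetical 0.0.x releases, for illustration only.
RELEASES = ["0.0.6", "0.0.7"]

old_matches = satisfying(RELEASES, "1.0.0")  # empty: nothing to install
new_matches = satisfying(RELEASES, "0.0.6")  # every listed release qualifies
```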

Fix torch version requirement for Python 3.13 compatibility · e780fb00

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Updated torch requirement from ==2.0.0 to >=2.5.0
- Python 3.13 requires torch 2.5.0 or newer
- Added torchvision and torchaudio without version pins

Phase 3: Integrate config-driven loading and admin dashboard · dcad925d

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Updated main.py to use ConfigManager for loading settings
- Models now loaded from config instead of CLI arguments
- Admin dashboard routes integrated into FastAPI app
- Added admin API endpoints for tokens management
- Added admin models management endpoints
- System config reload endpoint
- Static files mounted at /static/admin
- Admin UI available at /admin

Phase 2: Admin dashboard with auth and templates · 6f81dfe2

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Implement SessionManager with cookie-based authentication
- Create admin routes with login, logout, password change
- Add dashboard, models, tokens, users, and chat pages
- Implement dark theme CSS with modern design
- Add user management (create, delete users)
- Add API token management placeholders
- Session-based auth with CSRF protection
- Password hashing with argon2 fallback
- Jinja2 templates for all admin pages
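
A minimal sketch of cookie-based sessions with a per-session CSRF token, using only the standard library. The class name matches the commit, but the methods, the in-memory storage, and the timeout are assumptions; the project's actual SessionManager is not shown here.

```python
import hmac
import secrets
import time


class SessionManager:
    """In-memory sessions keyed by an opaque cookie value (illustrative only)."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl_seconds = ttl_seconds
        self._sessions = {}

    def create(self, username: str) -> tuple:
        """Return (session_id, csrf_token); session_id is what the cookie would carry."""
        session_id = secrets.token_urlsafe(32)
        csrf_token = secrets.token_urlsafe(32)
        self._sessions[session_id] = {
            "username": username,
            "csrf_token": csrf_token,
            "expires_at": time.time() + self.ttl_seconds,
        }
        return session_id, csrf_token

    def get(self, session_id: str):
        """Return the session dict, or None if it is unknown or expired."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if session["expires_at"] < time.time():
            del self._sessions[session_id]
            return None
        return session

    def check_csrf(self, session_id: str, submitted: str) -> bool:
        """Compare the submitted form token with the stored one in constant time."""
        session = self.get(session_id)
        if session is None:
            return False
        return hmac.compare_digest(session["csrf_token"].encode(), submitted.encode())

    def destroy(self, session_id: str) -> None:
        """Log out: forget the session if it exists."""
        self._sessions.pop(session_id, None)
```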

Phase 1: Configuration foundation - move CLI to JSON config · 1d457be7

Stefy Lanza (nextime / spora ) authored May 03, 2026

- Refactor cli.py to only support --debug and --config options
- Create ConfigManager class for loading/saving JSON configs
- Implement per-model configuration approach in models.json
- Create comprehensive design document for admin dashboard
- Set up admin package structure
- All model-specific settings now stored per-model instead of global defaults
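
A minimal sketch of a JSON-backed ConfigManager with per-model settings. The class and file names come from the commit; the method names and the keys inside models.json are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


class ConfigManager:
    """Load and save JSON config files from a single directory (illustrative only)."""

    def __init__(self, config_dir):
        self.config_dir = Path(config_dir)

    def load(self, name: str) -> dict:
        """Return the parsed file, or an empty dict when it does not exist yet."""
        path = self.config_dir / name
        if not path.exists():
            return {}
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)

    def save(self, name: str, data: dict) -> None:
        """Write the config as indented JSON, creating the directory if needed."""
        self.config_dir.mkdir(parents=True, exist_ok=True)
        with (self.config_dir / name).open("w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)


# Each model carries its own settings instead of inheriting global defaults.
# The keys below are hypothetical.
example = {"models": [{"name": "example-model", "backend": "cuda", "n_ctx": 4096}]}

with tempfile.TemporaryDirectory() as tmp:
    manager = ConfigManager(tmp)
    manager.save("models.json", example)
    loaded = manager.load("models.json")
```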

20 Mar, 2026 9 commits

Fix offload-strategy parameter passing to CUDA backend · bf1d3f52

Your Name authored Mar 20, 2026

- Add offload_strategy to kwargs in _load_default_model and _load_model_by_name
- Fix parameter name: ram -> manual_ram_gb to match backend expectation
- Also pass load_in_4bit, load_in_8bit, and max_gpu_percent
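
The fix amounts to building the backend kwargs with the names the backend expects. A sketch, assuming the parsed CLI namespace exposes attributes named as below (only `manual_ram_gb`, `offload_strategy`, `load_in_4bit`, `load_in_8bit`, and `max_gpu_percent` are confirmed by the commit; the helper name and the sample values are hypothetical):

```python
from types import SimpleNamespace


def build_backend_kwargs(args) -> dict:
    """Map parsed CLI values onto the keyword names the CUDA backend expects."""
    return {
        "offload_strategy": args.offload_strategy,
        # Previously passed as `ram`, which the backend did not recognize.
        "manual_ram_gb": args.ram,
        "load_in_4bit": args.load_in_4bit,
        "load_in_8bit": args.load_in_8bit,
        "max_gpu_percent": args.max_gpu_percent,
    }


# Stand-in for an argparse namespace; the values are made up.
args = SimpleNamespace(
    offload_strategy="auto",
    ram=16,
    load_in_4bit=False,
    load_in_8bit=True,
    max_gpu_percent=90,
)
kwargs = build_backend_kwargs(args)
```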

Add --offload-strategy none to disable CPU offloading and VRAM auto-detection · beded066

Your Name authored Mar 20, 2026

- Add 'none' to --offload-strategy choices in cli.py
- In cuda.py backend:
  - _get_vram_percentages_for_strategy() returns None for 'none' strategy
  - _get_vram_percentages_for_gpu() skips VRAM detection for 'none'
  - load_model() loads directly on GPU without max_memory constraints
- Add startup status message in main.py for --offload-strategy none
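
A sketch of how a 'none' strategy can short-circuit the memory limits. Only the 'none' behavior is taken from the commit; the other strategy names, the fractions, the function names, and the choice of `device_map="cuda:0"` are assumptions. `device_map` and `max_memory` are keyword arguments accepted by Hugging Face `from_pretrained` when accelerate is installed.

```python
def get_vram_fraction_for_strategy(strategy: str):
    """Return the share of VRAM to use, or None when limits are disabled."""
    if strategy == "none":
        return None
    # Hypothetical strategies and fractions, for illustration only.
    fractions = {"auto": 0.9, "conservative": 0.7}
    return fractions.get(strategy, 0.9)


def build_load_kwargs(strategy: str, total_vram_gib: float) -> dict:
    """Build from_pretrained-style kwargs for the chosen strategy."""
    fraction = get_vram_fraction_for_strategy(strategy)
    if fraction is None:
        # 'none': load straight onto the GPU, with no max_memory cap.
        return {"device_map": "cuda:0"}
    limit_gib = int(total_vram_gib * fraction)
    return {"device_map": "auto", "max_memory": {0: f"{limit_gib}GiB"}}
```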

Add --no-ram option to maximize VRAM usage · b782a092

Your Name authored Mar 20, 2026

- Add --no-ram CLI option to force model loading without CPU RAM spilling
- Implement --no-ram behavior for:
  - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
  - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
  - Diffusers: force full GPU loading
  - sd.cpp: maximize GPU usage
- Propagate flag through model manager
- Add startup banner message
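
The per-backend settings listed above can be collected in one helper. A sketch covering only the two backends whose settings the commit spells out; the function name and backend keys are hypothetical, and the Diffusers and sd.cpp settings are left out because the commit does not specify them.

```python
def no_ram_kwargs(backend: str) -> dict:
    """Return loader kwargs that keep the model on the GPU for --no-ram."""
    if backend == "llama-cpp-python":
        # n_gpu_layers=-1 offloads every layer; use_mmap=False avoids mapping the file into RAM.
        return {"n_gpu_layers": -1, "use_mmap": False}
    if backend == "transformers":
        # Place the whole model on the first GPU and keep CPU memory use low while loading.
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    raise ValueError(f"no --no-ram settings sketched for backend {backend!r}")
```

Per the commit, the llama-cpp-python path also ignores --n-ctx when --no-ram is set; that part is not modeled here.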

API: validate requested models against CLI-registered models · ef949827

Your Name authored Mar 20, 2026

- Add get_all_allowed_identifiers() to MultiModelManager returning all valid
  model identifiers (default model + short name + aliases, audio, tts, image,
  vision models, and custom aliases)
- Rewrite is_allowed_model() to check against the full allowed set with
  support for prefixed forms and short-name matching
- Add validation in request_model() that rejects unknown models with an error
  message listing all available models
- Fix get_model_for_request() to reject loading arbitrary models not in the
  allowed set
- Update all API endpoints (text, images, tts, transcriptions) to check for
  the error key and return HTTP 404 when a disallowed model is requested
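
A simplified sketch of the validation flow: collect the allowed identifiers, reject anything else with an error that lists what is available, and let the endpoint turn that error into a 404. The function names are hypothetical, prefixed forms are omitted, and the real MultiModelManager also covers audio, tts, image, and vision models.

```python
def short_name(model_id: str) -> str:
    """'org/tiny-model' -> 'tiny-model'."""
    return model_id.rsplit("/", 1)[-1]


def allowed_identifiers(default_model: str, aliases=()) -> set:
    """Default model, its short name, and any custom aliases."""
    identifiers = {default_model, short_name(default_model)}
    identifiers.update(aliases)
    return identifiers


def request_model(requested: str, allowed: set) -> dict:
    """Return the model to use, or an error dict listing the available models."""
    if requested not in allowed:
        available = ", ".join(sorted(allowed))
        return {"error": f"Unknown model '{requested}'. Available models: {available}"}
    return {"model": requested}


def status_for(result: dict) -> int:
    """What an endpoint would return: 404 when the error key is present."""
    return 404 if "error" in result else 200
```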

Fix --download-model for non-GGUF HuggingFace models · b0a633c7

Your Name authored Mar 20, 2026

- Try GGUF pattern first for HuggingFace model IDs
- Fall back to snapshot_download for entire repo (transformers/diffusers models)
- Works for both GGUF models and full HuggingFace repos

Try to fix · aacd990a

Your Name authored Mar 20, 2026

Simplify --download-model: use cache module directly · fe7a30dc

Your Name authored Mar 20, 2026

- Remove auto-detection logic, just use download_model from cache
- User can specify --download-file-pattern for non-GGUF models

Improve --download-model auto-detection for non-GGUF HF models · 01bdfe14

Your Name authored Mar 20, 2026

- Scan HuggingFace repo to detect available file patterns
- Try multiple patterns (.gguf, .safetensors, .bin, .pt, .pth)
- Default to .gguf if nothing found
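
The detection order above can be sketched as a pure function over a list of repo file names. The function name is hypothetical; in practice the list could come from something like `huggingface_hub.list_repo_files`, which is an assumption about how the project scans the repo.

```python
import fnmatch

# Patterns tried in order, as listed in the commit.
PATTERNS = ("*.gguf", "*.safetensors", "*.bin", "*.pt", "*.pth")


def detect_file_pattern(repo_files, patterns=PATTERNS) -> str:
    """Return the first pattern that matches any file; default to '*.gguf'."""
    for pattern in patterns:
        if any(fnmatch.fnmatch(name, pattern) for name in repo_files):
            return pattern
    return "*.gguf"
```

For a repo containing only `config.json` and `model.safetensors`, the function returns `*.safetensors`; for an empty listing it falls back to `*.gguf`.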

Add --download-model CLI argument to download models to cache and exit · a49d1d88

Your Name authored Mar 20, 2026

- Add --download-model argument to download a model (URL or HuggingFace ID) to cache
- Add --download-file-pattern argument to specify file pattern for HF downloads
- Use download_model from codai.models.cache module
- Model downloads to appropriate cache and exits without starting server
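
A sketch of the download-and-exit flow with argparse. The two option names come from the commit; the program name, the `download_model(source, pattern)` signature, and the `start_server` callback are assumptions, passed in as parameters so the flow can be run without the real cache module.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="coderai")  # program name is assumed
    parser.add_argument("--download-model", metavar="URL_OR_HF_ID",
                        help="download a model to the cache and exit")
    parser.add_argument("--download-file-pattern", metavar="PATTERN", default=None,
                        help="file pattern for HuggingFace downloads")
    return parser


def main(argv, download_model, start_server) -> int:
    """Download and return early when --download-model is given; otherwise serve."""
    args = build_parser().parse_args(argv)
    if args.download_model:
        download_model(args.download_model, args.download_file_pattern)
        return 0  # exit without starting the server
    start_server(args)
    return 0
```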

19 Mar, 2026 22 commits

Fix global_args not propagated to state module · 8512c7db