- 06 May, 2026 17 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 05 May, 2026 4 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 03 May, 2026 10 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- New --flash flag enables Flash Attention 2 installation
- Works with nvidia and all backends (when CUDA available)
- Installs with --no-build-isolation flag
- Graceful error handling if installation fails
- Updated usage instructions to show --flash-attn flag
- Requirements: CUDA 11.6+, Linux, Ampere/Ada/Hopper GPU
-
Stefy Lanza (nextime / spora ) authored
- Complete overview of refactoring work
- Architecture before/after comparison
- List of all commits and changes
- Installation and usage instructions
- Configuration examples
- Testing checklist and known limitations
- Next steps for future work
-
Stefy Lanza (nextime / spora ) authored
- Added PIP_NO_INPUT and PIP_REQUIRE_VIRTUALENV environment variables
- Better error handling for problematic packages (procname, stable-diffusion-cpp-python)
- Install core dependencies first, then optional packages individually
- Graceful fallback when optional packages fail to build
- Updated PyTorch installation to handle version compatibility
- Clear warnings for optional packages that fail (not critical errors)
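
The "core first, optional individually" install order described above can be sketched roughly as follows. This is a minimal illustration, not the project's installer: the function names `pip_install` and `install_all` and the injectable `runner` parameter are hypothetical, and the actual script may well be a shell script. Only the two pip environment variables and the package names come from the commit message.

```python
import os
import subprocess
import sys


def pip_install(packages, runner=subprocess.run):
    """Run a single `pip install`; return True when pip exits with status 0."""
    # Environment variables named in the commit message: no interactive
    # prompts, and refuse to install outside a virtual environment.
    env = dict(os.environ, PIP_NO_INPUT="1", PIP_REQUIRE_VIRTUALENV="1")
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    return runner(cmd, env=env).returncode == 0


def install_all(core, optional, runner=subprocess.run):
    """Install core packages together, then each optional package on its own."""
    if not pip_install(core, runner):
        # Core dependencies are required, so a failure here is fatal.
        raise RuntimeError("core dependencies failed to install")
    failed = []
    for pkg in optional:
        # One pip call per optional package, so one broken build
        # (e.g. procname) cannot block the others.
        if not pip_install([pkg], runner):
            print(f"WARNING: optional package {pkg} failed to install; continuing")
            failed.append(pkg)
    return failed
```

Installing optional packages one at a time costs extra pip invocations but keeps a single failing build from aborting the whole run.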
-
Stefy Lanza (nextime / spora ) authored
- Removed version pins from stable-diffusion-cpp-python and procname
- Both packages have build issues with strict version requirements on Python 3.13
- Allowing pip to find compatible versions automatically
-
Stefy Lanza (nextime / spora ) authored
- Changed whispercpp from >=1.0.0 to >=0.0.6
- Package only has versions 0.0.x available, no 1.0.0 exists yet
-
Stefy Lanza (nextime / spora ) authored
- Updated torch requirement from ==2.0.0 to >=2.5.0
- Python 3.13 requires torch 2.5.0 or newer
- Added torchvision and torchaudio without version pins
-
Stefy Lanza (nextime / spora ) authored
- Updated main.py to use ConfigManager for loading settings
- Models now loaded from config instead of CLI arguments
- Admin dashboard routes integrated into FastAPI app
- Added admin API endpoints for tokens management
- Added admin models management endpoints
- System config reload endpoint
- Static files mounted at /static/admin
- Admin UI available at /admin
-
Stefy Lanza (nextime / spora ) authored
- Implement SessionManager with cookie-based authentication
- Create admin routes with login, logout, password change
- Add dashboard, models, tokens, users, and chat pages
- Implement dark theme CSS with modern design
- Add user management (create, delete users)
- Add API token management placeholders
- Session-based auth with CSRF protection
- Password hashing with argon2 fallback
- Jinja2 templates for all admin pages
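
The commit does not say what "argon2 fallback" falls back from or to. One plausible reading, sketched here as an assumption, is: use argon2 (via the `argon2-cffi` package) when it is importable, otherwise fall back to PBKDF2 from the standard library. The function names, the `pbkdf2$` storage format, and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os

try:
    # argon2-cffi; optional in this sketch.
    from argon2 import PasswordHasher
    from argon2.exceptions import VerificationError
    _hasher = PasswordHasher()
except ImportError:
    _hasher = None

_PBKDF2_ROUNDS = 200_000  # hypothetical work factor for the fallback


def hash_password(password):
    """Hash with argon2 when available, else salted PBKDF2-SHA256."""
    if _hasher is not None:
        return _hasher.hash(password)
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _PBKDF2_ROUNDS)
    return f"pbkdf2${salt.hex()}${digest.hex()}"


def verify_password(stored, password):
    """Check a password against a hash produced by hash_password()."""
    if stored.startswith("pbkdf2$"):
        _, salt_hex, digest_hex = stored.split("$")
        digest = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), bytes.fromhex(salt_hex), _PBKDF2_ROUNDS
        )
        # Constant-time comparison to avoid leaking timing information.
        return hmac.compare_digest(digest.hex(), digest_hex)
    if _hasher is None:
        return False  # argon2 hash stored, but argon2 is not installed
    try:
        return _hasher.verify(stored, password)
    except (VerificationError, ValueError):
        # Wrong password, or a malformed/unsupported hash string.
        return False
```

Prefixing the fallback format lets both kinds of hash coexist in the same user store, since argon2 hashes start with `$argon2`.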
-
Stefy Lanza (nextime / spora ) authored
- Refactor cli.py to only support --debug and --config options
- Create ConfigManager class for loading/saving JSON configs
- Implement per-model configuration approach in models.json
- Create comprehensive design document for admin dashboard
- Set up admin package structure
- All model-specific settings now stored per-model instead of global defaults
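
A JSON-backed config manager with per-model settings might look roughly like the sketch below. Only the class name `ConfigManager` and the file name `models.json` appear in the commit; the method names, the directory layout, and the example schema are assumptions made for illustration.

```python
import json
from pathlib import Path


class ConfigManager:
    """Load and save JSON config files from a single directory (sketch)."""

    def __init__(self, config_dir):
        self.config_dir = Path(config_dir)

    def _path(self, name):
        return self.config_dir / f"{name}.json"

    def load(self, name):
        """Return the parsed config, or an empty dict if the file is missing."""
        path = self._path(name)
        if not path.exists():
            return {}
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)

    def save(self, name, data):
        """Write via a temporary file so a crash cannot leave a half-written config."""
        self.config_dir.mkdir(parents=True, exist_ok=True)
        path = self._path(name)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
        tmp.replace(path)
```

With per-model configuration, each entry carries its own settings instead of inheriting global CLI defaults, for example (hypothetical schema): `{"models": [{"name": "demo", "backend": "llama-cpp", "n_ctx": 4096}]}`.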
-
- 20 Mar, 2026 9 commits
-
-
Your Name authored
- Add offload_strategy to kwargs in _load_default_model and _load_model_by_name
- Fix parameter name: ram -> manual_ram_gb to match backend expectation
- Also pass load_in_4bit, load_in_8bit, and max_gpu_percent
-
Your Name authored
- Add 'none' to --offload-strategy choices in cli.py
- In cuda.py backend:
  - _get_vram_percentages_for_strategy() returns None for 'none' strategy
  - _get_vram_percentages_for_gpu() skips VRAM detection for 'none'
  - load_model() loads directly on GPU without max_memory constraints
- Add startup status message in main.py for --offload-strategy none
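
The effect of the 'none' strategy can be illustrated with a small sketch. The function names, the `fractions` table, the strategy name "balanced", and the choice of `device_map="cuda:0"` are hypothetical; the commit only states that 'none' yields `None` and that the model is then loaded without `max_memory` constraints.

```python
def vram_fraction_for_strategy(strategy, fractions):
    """Return the VRAM fraction for a strategy, or None for 'none'."""
    if strategy == "none":
        return None
    return fractions[strategy]


def build_load_kwargs(strategy, gpu_total_gib, fractions):
    """Build keyword arguments for a transformers-style from_pretrained() call."""
    fraction = vram_fraction_for_strategy(strategy, fractions)
    if fraction is None:
        # 'none': place the model on the GPU directly, no max_memory cap.
        return {"device_map": "cuda:0"}
    budget_gib = int(gpu_total_gib * fraction)
    # Other strategies cap GPU 0 so remaining layers can spill elsewhere.
    return {"device_map": "auto", "max_memory": {0: f"{budget_gib}GiB"}}
```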
-
Your Name authored
- Add --no-ram CLI option to force model loading without CPU RAM spilling
- Implement --no-ram behavior for:
  - llama-cpp-python: n_gpu_layers=-1, use_mmap=False, ignore --n-ctx
  - HuggingFace transformers: device_map='cuda:0', low_cpu_mem_usage=True
  - Diffusers: force full GPU loading
  - sd.cpp: maximize GPU usage
- Propagate flag through model manager
- Add startup banner message
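
The per-backend settings listed above could be collected in one helper, as in this sketch. The function name and the backend keys are hypothetical. The keyword values are the ones given in the commit message, and only the two backends with explicit values are covered.

```python
def no_ram_kwargs(backend):
    """Return loader keyword arguments that keep a model entirely on the GPU."""
    if backend == "llama-cpp":
        # Offload every layer to the GPU and do not memory-map the file.
        return {"n_gpu_layers": -1, "use_mmap": False}
    if backend == "transformers":
        # Pin the whole model to the first CUDA device.
        return {"device_map": "cuda:0", "low_cpu_mem_usage": True}
    raise ValueError(f"no --no-ram settings known for backend {backend!r}")
```

A caller would merge the result into its usual loader arguments, for example `Llama(model_path=path, **no_ram_kwargs("llama-cpp"))`.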
-
Your Name authored
- Add get_all_allowed_identifiers() to MultiModelManager returning all valid model identifiers (default model + short name + aliases, audio, tts, image, vision models, and custom aliases)
- Rewrite is_allowed_model() to check against the full allowed set with support for prefixed forms and short-name matching
- Add validation in request_model() that rejects unknown models with an error message listing all available models
- Fix get_model_for_request() to reject loading arbitrary models not in the allowed set
- Update all API endpoints (text, images, tts, transcriptions) to check for the error key and return HTTP 404 when a disallowed model is requested
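
The allow-list check can be sketched as standalone functions. The function names mirror the commit message, but the signatures, the reading of "prefixed form" as an `org/name` identifier, and the shape of the error dictionary are assumptions; the real methods live on MultiModelManager.

```python
def short_name(model_id):
    """'org/llama-3' -> 'llama-3'; names without a prefix are returned unchanged."""
    return model_id.rsplit("/", 1)[-1]


def get_all_allowed_identifiers(default_model, extra_models=(), aliases=()):
    """Collect full IDs, their short names, and custom aliases into one set."""
    allowed = set(aliases)
    for model_id in (default_model, *extra_models):
        if model_id:
            allowed.add(model_id)
            allowed.add(short_name(model_id))
    return allowed


def is_allowed_model(requested, allowed):
    """Accept exact matches and prefixed forms whose short name is allowed."""
    return requested in allowed or short_name(requested) in allowed


def request_model(requested, allowed):
    """Return an error dict for unknown models instead of loading them."""
    if not is_allowed_model(requested, allowed):
        available = ", ".join(sorted(allowed))
        return {"error": f"Unknown model '{requested}'. Available models: {available}"}
    return {"model": requested}
```

An endpoint would then check for the `error` key and respond with HTTP 404, as the last item in the commit describes.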
-
Your Name authored
- Try GGUF pattern first for HuggingFace model IDs
- Fall back to snapshot_download for entire repo (transformers/diffusers models)
- Works for both GGUF models and full HuggingFace repos
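
The "GGUF first, whole repo otherwise" logic can be sketched with `huggingface_hub`. The function name `download_hf_model` and its injectable callables are hypothetical (they make the sketch testable offline). `list_repo_files`, `hf_hub_download`, and `snapshot_download` are the library's public functions, but whether the project calls them in this way is an assumption.

```python
import fnmatch


def download_hf_model(repo_id, pattern="*.gguf",
                      list_files=None, download_file=None, download_repo=None):
    """Download one matching file if present, else the entire repository."""
    if list_files is None:
        # Imported lazily so the sketch can be exercised without the library.
        from huggingface_hub import hf_hub_download, list_repo_files, snapshot_download
        list_files = list_repo_files
        download_file = hf_hub_download
        download_repo = snapshot_download

    matches = sorted(
        name for name in list_files(repo_id) if fnmatch.fnmatch(name, pattern)
    )
    if matches:
        # GGUF repo: fetch a single file (here simply the first match).
        return download_file(repo_id, matches[0])
    # No match: treat it as a transformers/diffusers repo and fetch everything.
    return download_repo(repo_id)
```

Picking the first match alphabetically is a simplification; a repo with several quantizations would need an explicit choice.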
-
Your Name authored
-
Your Name authored
- Remove auto-detection logic, just use download_model from cache
- User can specify --download-file-pattern for non-GGUF models
-
Your Name authored
- Scan HuggingFace repo to detect available file patterns
- Try multiple patterns (.gguf, .safetensors, .bin, .pt, .pth)
- Default to .gguf if nothing found
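
The pattern detection described above amounts to a priority search over file extensions. A minimal sketch, assuming the repo's file list has already been fetched (the function and constant names are hypothetical; the extension order is the one given in the commit):

```python
CANDIDATE_EXTENSIONS = (".gguf", ".safetensors", ".bin", ".pt", ".pth")


def detect_file_pattern(filenames, default=".gguf"):
    """Return a glob for the first candidate extension present in the repo."""
    lowered = [name.lower() for name in filenames]
    for ext in CANDIDATE_EXTENSIONS:
        if any(name.endswith(ext) for name in lowered):
            return f"*{ext}"
    # Nothing recognisable found: default to GGUF, as the commit describes.
    return f"*{default}"
```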
-
Your Name authored
- Add --download-model argument to download a model (URL or HuggingFace ID) to cache
- Add --download-file-pattern argument to specify file pattern for HF downloads
- Use download_model from codai.models.cache module
- Model downloads to appropriate cache and exits without starting server
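
The download-and-exit flow can be sketched with `argparse`. The two flag names come from the commit; the `main` signature, the two-argument call to `download_model`, and the `start_server` callable are assumptions, since the actual signature of `codai.models.cache.download_model` is not shown here.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="download-only mode (sketch)")
    parser.add_argument("--download-model", metavar="URL_OR_HF_ID",
                        help="download a model to the cache and exit")
    parser.add_argument("--download-file-pattern", metavar="PATTERN",
                        help="file pattern for HuggingFace downloads")
    return parser


def main(argv, download_model, start_server):
    """Download and return early if requested, otherwise start the server."""
    args = build_parser().parse_args(argv)
    if args.download_model:
        # Hypothetical call shape: (source, optional file pattern) -> local path.
        path = download_model(args.download_model, args.download_file_pattern)
        print(f"Model downloaded to {path}")
        return 0  # exit without starting the server
    start_server()
    return 0
```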
-