# CoderAI
An OpenAI-compatible API server with web administration dashboard, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and full multi-modal support.
## Features
### Core Capabilities
- **OpenAI-Compatible API**: Drop-in replacement for OpenAI's API endpoints
- **Web Admin Dashboard**: Modern UI for model management, user authentication, and API tokens
- **Web Studio**: Modern UI for all generation tasks: chat, image, video, audio, and pipelines
- **Configuration-Based**: JSON config files for all settings, with no complex CLI arguments
- **Multi-Modal**: Text, image, video, audio, TTS, STT, embeddings
- **Per-Model Configuration**: Individual settings for each model (GPU layers, quantization, context size)
- **On-Demand Loading**: Models load automatically when requested, unload when idle
- **Auto-Detection**: Automatically selects best available backend
- **Multi-GPU**: Automatic distribution across multiple devices
### Image Generation
- **Text-to-Image**: Stable Diffusion, SDXL, Flux, and GGUF image models (via stable-diffusion.cpp)
- **Image-to-Image**: Style transfer and image editing
- **Inpainting**: Fill masked regions with AI-generated content
- **Upscaling**: Real-ESRGAN super-resolution (2×/4×/8×)
- **Deblur**: Wiener deconvolution + unsharp masking
- **Unpixelate**: Real-ESRGAN restoration of pixelated/compressed images
- **Outfit Change**: Auto-generated clothing mask + inpainting for wardrobe changes
- **Face Swap**: InsightFace INSwapper — swap faces in images and videos
- **Depth Estimation**: Monocular depth maps
- **Segmentation**: SAM-based object segmentation
### Video Generation
- **Text-to-Video**: Generate video from text prompts
- **Image-to-Video**: Animate a still image
- **Video-to-Video**: Transform existing video
- **Ti2V**: Text + image → video with camera motion control
- **Frame Interpolation**: Increase FPS via RIFE or ffmpeg minterpolate
- **Upscaling**: Real-ESRGAN video upscaling
- **Subtitles**: Whisper transcription + optional translation + burn-in
- **Dubbing**: Transcribe → translate → TTS → replace audio track
### Audio
- **Text-to-Speech**: Kokoro TTS with voice selection and speed control
- **Speech-to-Text**: Whisper transcription (faster-whisper / whispercpp)
- **Music/SFX Generation**: MusicGen, AudioGen, AudioLDM2
- **Voice Cloning**: F5-TTS zero-shot voice cloning from a reference audio clip
- **Voice Conversion (SVC)**: Seed-VC — converts timbre while preserving pitch, melody and expression; **singing mode** for music
- **Voice Profiles**: Save named voice profiles (reference audio + transcript) for reuse
### Pipelines
Built-in multi-step pipelines callable from the API or web UI:
| Endpoint | Description |
|---|---|
| `POST /v1/pipelines/image-to-video` | Generate image → animate → optional audio |
| `POST /v1/pipelines/video-dub` | Transcribe → translate → TTS dub → burn subtitles |
| `POST /v1/pipelines/story` | LLM script → images per scene → video → TTS narration |
| `POST /v1/pipelines/audio-dub` | Transcribe audio/video → translate → clone voice → replace audio |

**Custom Pipeline Builder**: Create, save, and run your own multi-step pipelines from the web UI or API. Chain any combination of 18 step types using the `{{input}}`, `{{stepN.output}}`, and `{{stepN.url}}` template variables.
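Template variables can be resolved with plain string substitution before each step runs. The following is a minimal sketch, not CoderAI's implementation. It assumes step indices start at 0 (the custom pipeline example feeds `{{step0.output}}` into the second step) and that each finished step is stored as a dict with `output` and `url` keys, which is a hypothetical shape.

```python
from __future__ import annotations

import re

# Matches {{input}}, {{stepN.output}} and {{stepN.url}} (N = zero-based step index).
_TEMPLATE = re.compile(r"\{\{\s*(input|step(\d+)\.(output|url))\s*\}\}")


def render(template: str, user_input: str, results: list[dict]) -> str:
    """Substitute pipeline template variables in one string.

    results[i] is assumed (hypothetically) to look like
    {"output": "...", "url": "..."} for the finished step i.
    """
    def replace(match: re.Match) -> str:
        if match.group(1) == "input":
            return user_input
        index, field = int(match.group(2)), match.group(3)
        if index >= len(results):
            raise ValueError(f"step{index} has not run yet")
        return str(results[index].get(field, ""))

    return _TEMPLATE.sub(replace, template)


def render_params(params: dict, user_input: str, results: list[dict]) -> dict:
    """Render every string value in a step's params; leave other values untouched."""
    return {
        key: render(value, user_input, results) if isinstance(value, str) else value
        for key, value in params.items()
    }


if __name__ == "__main__":
    finished = [{"output": "A foggy harbour at dawn"}]
    step = {"model": "z_image_turbo-Q2_K.gguf", "prompt": "{{step0.output}}"}
    print(render_params(step, "a lighthouse story", finished))
```

Referencing a step that has not run yet raises an error instead of silently producing an empty prompt. Because steps run in order, that case usually points to a typo in the definition.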
### Advanced Features
- **Memory Management**: Smart VRAM → RAM → Disk offloading (NVIDIA)
- **Quantization**: 4-bit/8-bit via bitsandbytes (NVIDIA) or GGUF quantization (Vulkan)
- **Streaming**: Server-sent events for real-time token generation
- **Tool Calling**: Function calling and tool use support
- **Authentication**: Session-based auth with API token support
- **Webcam/Microphone**: Capture directly from browser for face swap and voice cloning
---
## Installation
- For AMD/Intel GPUs (Vulkan): Vulkan drivers and SDK
- For CPU-only: No additional requirements
**Note**: The Vulkan backend works with:
- AMD GPUs (RX 400 series and newer) - **Recommended**
- Intel integrated GPUs (HD 600 series and newer) and Intel Arc GPUs
- NVIDIA GPUs (GTX 900 series and newer) - *CUDA backend preferred*
Any GPU with Vulkan 1.2+ driver support should work with the Vulkan backend.
### Quick Install with Build Script
The easiest way to install is using the provided build script:
```bash
# Clone the repository
git clone git@git.nexlab.net:nexlab/coderai.git
cd coderai
# Install all backends (recommended)
./build.sh all
# Or install a specific backend:
./build.sh nvidia   # NVIDIA GPUs only
./build.sh vulkan   # AMD/Intel GPUs only
```
**Note**: The `all` option installs support for all backends, allowing you to switch between them via configuration. The `vulkan` option works for both AMD and Intel GPUs.
The build script creates a virtual environment, installs the dependencies for your GPU, and builds the GPU-accelerated backends, including `stable-diffusion-cpp-python` with CUDA and Vulkan support.
### Manual Installation
If you prefer manual installation:
```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate
# NVIDIA (CUDA)
pip install torch torchvision torchaudio
pip install -r requirements-nvidia.txt
# AMD/Intel (Vulkan)
CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --no-cache-dir
pip install -r requirements-vulkan.txt
```
### Platform-Specific Requirements
#### NVIDIA (CUDA)
Requires:
- NVIDIA GPU with CUDA support
- CUDA toolkit (11.8+ or 12.1+)
- PyTorch with CUDA
Models: HuggingFace format (safetensors/pytorch)
#### AMD and Intel (Vulkan)
Requires:
- GPU with Vulkan 1.2+ support:
- AMD: RX 400 series and newer (recommended)
- Intel: HD 600 series integrated graphics or newer, Intel Arc GPUs
- NVIDIA: GTX 900 series and newer (but CUDA backend preferred)
- Vulkan drivers and SDK
**Install Vulkan drivers and tools:**
```bash
# Debian/Ubuntu
sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers glslc glslang-tools glslang-dev
# Fedora
sudo dnf install vulkan-loader-devel vulkan-tools mesa-vulkan-drivers glslang
# Arch Linux
sudo pacman -S vulkan-headers vulkan-icd-loader vulkan-radeon glslang
```
**Note:** The shader compiler `glslc` is required to build llama-cpp-python with Vulkan support. On Debian/Ubuntu it is provided by the `glslc` package. If `glslc` is not found after installing, try:
```bash
# Check whether glslc exists somewhere
find /usr -name "glslc" 2>/dev/null
# If it is in a non-standard location, add that directory to PATH
export PATH=$PATH:/usr/lib/shaderc/bin
# Last resort: symlink glslangValidator
# (its command-line options differ from glslc, so the build may still fail)
sudo ln -s $(which glslangValidator) /usr/local/bin/glslc
```
Models: GGUF format (from HuggingFace or local files)
**Note**: The Vulkan backend uses llama-cpp-python with GGUF models, which provides excellent performance on AMD and Intel GPUs without requiring vendor-specific SDKs (ROCm/OneAPI).
### Stable Diffusion GGUF (CUDA + Vulkan)
To build `stable-diffusion-cpp-python` with both CUDA and Vulkan support:
```bash
CMAKE_ARGS="-DSD_WEBM=OFF -DSD_CUDA=ON -DSD_VULKAN=ON" \
  pip install stable-diffusion-cpp-python --no-cache-dir --force-reinstall
```
### Optional Dependencies
#### bitsandbytes (Quantization)
For 4-bit and 8-bit quantization support (reduces VRAM requirements):
```bash
# CUDA
pip install "bitsandbytes>=0.41.0"
# ROCm support may require building from source
# See: https://github.com/TimDettmers/bitsandbytes
```
#### Flash Attention 2
For significantly faster inference on supported GPUs (requires specific CUDA/ROCm versions):
```bash
# Requires CUDA 11.6+ or ROCm 5.4+
pip install flash-attn --no-build-isolation
```
**Note**: Flash Attention 2 requires:
- CUDA 11.6+ or ROCm 5.4+
- Linux OS (Windows support is experimental)
- Specific GPU architectures (Ampere, Ada Lovelace, Hopper for NVIDIA)
#### Voice Cloning and Voice Conversion
```bash
pip install f5-tts    # Voice cloning (F5-TTS)
pip install seed-vc   # Voice conversion / singing SVC
```
#### Face Swap
```bash
pip install insightface onnxruntime-gpu
# inswapper_128.onnx downloads automatically on first use
```
---
## Usage
### Quick Start
```bash
# Activate the virtual environment
source venv_all/bin/activate   # or venv/bin/activate
# Start the server (pick one)
python coderai                           # Default config at ~/.coderai/
python coderai --config /path/to/config  # Custom config directory
python coderai --debug                   # Debug mode for troubleshooting
```
The server starts on `http://0.0.0.0:8000` by default. To change this, edit `server.host` and `server.port` in `config.json`.
### Access Points
| URL | Description |
|---|---|
| `http://localhost:8000/admin` | Admin dashboard |
| `http://localhost:8000/chat` | Web Studio (generation UI) |
| `http://localhost:8000/v1/*` | OpenAI-compatible API |
| `http://localhost:8000/docs` | Interactive API docs |
### First Login
The default credentials are `admin` / `admin`. You will be prompted to change the password on first login.
---
## Configuration
### Configuration Files
Config files live in `~/.coderai/` (or the directory passed with `--config`):
```
~/.coderai/
├── config.json      # Server, backend, and global settings
├── models.json      # Model registry and per-model configuration
├── auth.json        # Users, API tokens, and sessions
├── pipelines.json   # Custom pipeline definitions
└── secret_key       # Session signing key (auto-generated)
```
These files are created automatically with sensible defaults on first run.
### Command-Line Options
```
usage: coderai [-h] [--config CONFIG] [--debug] [--dump]
[--list-cached-models] [--remove-all-models]
[--remove-model REMOVE_MODEL] [--download-model DOWNLOAD_MODEL]
[--download-file-pattern DOWNLOAD_FILE_PATTERN]
[--vulkan-list-devices]
OpenAI-compatible API server supporting NVIDIA (CUDA) and Vulkan backends
options:
-h, --help show this help message and exit
--config CONFIG Configuration directory (default: ~/.coderai/)
--debug Enable debug mode - dumps full request/response to stdout
--dump Dump model output: raw output, parsed output, and debug info
--list-cached-models List all cached models in the model cache directory
--remove-all-models Remove all cached models from the model cache directory
--remove-model NAME Remove a specific cached model by name or hash
--download-model ID Download a model to cache (URL or HuggingFace model ID)
--download-file-pattern PATTERN
File pattern for HuggingFace downloads (e.g., .gguf, .safetensors)
--vulkan-list-devices List available Vulkan GPU devices and exit
```
## API Examples
The API is compatible with OpenAI's REST API. Interactive documentation is available at `http://localhost:8000/docs` while the server is running, and the full endpoint list is in the API Reference section.
### Example curl Commands
#### List Models
```bash
curl http://localhost:8000/v1/models
```
#### Chat Completion (Non-Streaming)
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "microsoft/DialoGPT-medium",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"temperature": 0.7,
"max_tokens": 150
}'
```
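You can send the same request from Python using only the standard library. This is a sketch that assumes the default host and port. It also assumes that API tokens from the dashboard are sent OpenAI-style as an `Authorization: Bearer` header, which this README does not confirm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default server.host/port from config.json


def build_chat_request(model, messages, api_token=None, **options):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages, **options}
    headers = {"Content-Type": "application/json"}
    if api_token:
        # Assumption: tokens are accepted as an OpenAI-style Bearer header.
        headers["Authorization"] = f"Bearer {api_token}"
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def chat(model, messages, api_token=None, **options):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, messages, api_token, **options)
    with urllib.request.urlopen(request, timeout=300) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat(
        "microsoft/DialoGPT-medium",
        [{"role": "user", "content": "Hello, how are you?"}],
        temperature=0.7,
        max_tokens=150,
    ))
```

Building the request separately from sending it lets you inspect or log the exact payload. The long timeout allows for on-demand model loading on the first request.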
#### Chat Completion (Streaming)
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "microsoft/DialoGPT-medium",
"messages": [
{"role": "user", "content": "Tell me a story"}
],
"stream": true,
"max_tokens": 200
}'
```
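With `"stream": true` the server sends server-sent events. If it follows OpenAI's chunk format (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`), which is an assumption based on the OpenAI compatibility claim, a client can reassemble the text like this:

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from OpenAI-style SSE lines.

    `lines` may be str or bytes lines, e.g. the response object returned by
    urllib.request.urlopen(...), which iterates line by line.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:  # role-only or empty deltas carry no text
                yield piece
```

When testing with curl, add `-N` (`--no-buffer`) so tokens are printed as they arrive.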
#### Text Completion
```bash
curl -X POST http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "microsoft/DialoGPT-medium",
"prompt": "Once upon a time",
"max_tokens": 100,
"temperature": 0.8
}'
```
#### Chat Completion with Tools
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "microsoft/DialoGPT-medium",
"messages": [
{"role": "user", "content": "What is the weather in Paris?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
]
}'
```
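When the model decides to use a tool, an OpenAI-style response carries `message.tool_calls` instead of text. The client runs each call and sends the results back as `tool` messages in a second request. The sketch below shows that middle step. It assumes CoderAI returns the standard OpenAI tool-call shape, and `get_weather` is a hypothetical local function.

```python
import json


def get_weather(location):
    """Hypothetical local implementation of the advertised tool."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(messages, response):
    """Execute tool calls from a chat completion and return the next message list.

    The result (original messages + assistant turn + one "tool" message per
    call) can be posted back to /v1/chat/completions for the final answer.
    """
    message = response["choices"][0]["message"]
    followup = list(messages) + [message]
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return followup
```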
## Configuration Reference
All settings are managed through JSON files in the configuration directory (`~/.coderai/` by default).
### config.json
```json
{
  "server": {
    "host": "0.0.0.0",
    "port": 8000,
    "https": false,
    "https_key_path": null,
    "https_cert_path": null
  },
  "backend": {
    "type": "auto",
    "image_backend": "auto",
    "audio_backend": "auto",
    "tts_backend": "auto"
  },
  "models": {
    "default_load_mode": "ondemand",
    "hf_cache_dir": null,
    "gguf_cache_dir": null
  },
  "offload": {
    "directory": "./offload",
    "strategy": "auto",
    "max_gpu_percent": null,
    "no_ram": false,
    "load_in_4bit": false,
    "load_in_8bit": false,
    "manual_ram_gb": null,
    "flash_attention": false
  },
  "vulkan": {
    "n_gpu_layers": -1,
    "n_ctx": 2048,
    "device_id": 0,
    "single_gpu": false
  },
  "image": {
    "steps": 4,
    "width": 512,
    "height": 512,
    "cfg_scale": 1.0,
    "precision": "f32",
    "cpu_offload": false
  },
  "whisper": {
    "server_path": null,
    "server_port": 8744
  }
}
```
### models.json
```json
{
  "text_models": [{ "id": "Qwen/Qwen3.5-9B", "backend": "nvidia", "enabled": true }],
  "image_models": [{ "id": "z_image_turbo-Q2_K.gguf", "backend": "auto", "enabled": true }],
  "tts_models": [{ "id": "kokoro-v1.0.onnx", "enabled": true }],
  "audio_models": [],
  "vision_models": [],
  "video_models": [],
  "loaded": [],
  "preload": [],
  "aliases": {
    "default": "Qwen/Qwen3.5-9B"
  }
}
```
### auth.json
```json
{
  "users": [
    {
      "id": "admin",
      "username": "admin",
      "password_hash": "$argon2id$...",
      "role": "admin",
      "created_at": "2026-05-05T00:00:00Z"
    }
  ],
  "tokens": [
    {
      "id": "tok_abc123",
      "token": "sk-coderai-abc123...",
      "name": "Production API",
      "created_at": "2026-05-05T00:00:00Z",
      "last_used": null
    }
  ],
  "sessions": {}
}
```
### Managing Configuration
#### Via Web Dashboard
The easiest way to manage configuration is through the web dashboard at `http://localhost:8000/admin`:
- **Models**: Add, remove, enable/disable models; configure per-model settings
- **Users**: Create users, change passwords, manage roles
- **Tokens**: Generate API tokens for programmatic access
- **Settings**: Adjust server, backend, and global settings
#### Via Configuration Files
You can also edit the JSON files directly. Changes take effect after restarting the server or using the reload endpoint:
```bash
curl -X POST http://localhost:8000/admin/api/system/reload
```
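A malformed JSON file can prevent a restart or reload from succeeding, so it can help to check hand-edited files first. The sketch below uses only the standard library and checks JSON syntax only, not CoderAI's schema. The file list is taken from the configuration directory layout; the helper name is hypothetical.

```python
import json
from pathlib import Path

CONFIG_FILES = ("config.json", "models.json", "auth.json", "pipelines.json")


def check_config_dir(config_dir="~/.coderai"):
    """Return {filename: error message} for config files that are not valid JSON."""
    problems = {}
    root = Path(config_dir).expanduser()
    for name in CONFIG_FILES:
        path = root / name
        if not path.exists():
            continue  # files that do not exist yet are skipped
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems[name] = f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return problems


if __name__ == "__main__":
    errors = check_config_dir()
    for name, error in errors.items():
        print(f"{name}: {error}")
    raise SystemExit(1 if errors else 0)
```

Run it before calling the reload endpoint. A non-zero exit status means at least one file needs fixing.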
### Per-Model Configuration
Each model can have its own settings that override global defaults:
**Text Models (NVIDIA backend):**
- `backend`: "nvidia" or "vulkan"
- `context_size`: Context window size
- `n_gpu_layers`: Number of layers on GPU (-1 = all)
- `load_in_4bit`: Enable 4-bit quantization
- `load_in_8bit`: Enable 8-bit quantization
- `flash_attention`: Enable Flash Attention 2
**Text Models (Vulkan backend):**
- `backend`: "vulkan"
- `context_size`: Context window size
- `n_gpu_layers`: Number of layers on GPU (-1 = all)
**Image Models:**
- `backend`: "nvidia" or "vulkan"
- `steps`: Number of diffusion steps
- `width`: Image width
- `height`: Image height
- `cfg_scale`: Classifier-free guidance scale
- `precision`: "f32" or "f16"
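The override rule is simple: a per-model key wins when it is present, and otherwise the global default applies. The following is a minimal sketch of that rule. The mapping between per-model keys and `config.json` keys is an assumption for illustration, not CoderAI's actual lookup table.

```python
# Hypothetical mapping: per-model key -> (config.json section, global key).
GLOBAL_KEYS = {
    "backend": ("backend", "type"),
    "context_size": ("vulkan", "n_ctx"),
    "n_gpu_layers": ("vulkan", "n_gpu_layers"),
    "load_in_4bit": ("offload", "load_in_4bit"),
    "load_in_8bit": ("offload", "load_in_8bit"),
    "flash_attention": ("offload", "flash_attention"),
}


def effective_settings(config, model_entry):
    """Resolve a model's settings: per-model value if set, else the global default."""
    settings = {}
    for key, (section, global_key) in GLOBAL_KEYS.items():
        value = model_entry.get(key)
        if value is None:
            value = config.get(section, {}).get(global_key)
        settings[key] = value
    return settings
```

For example, a Vulkan model entry with `"context_size": 4096` keeps that value, while its `n_gpu_layers` falls back to the global `vulkan.n_gpu_layers`.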
### Backend Selection
Backends can be configured globally in `config.json` or per model in `models.json`:
- **`auto`**: Automatically detect and use the best available backend
- **`nvidia`**: Use the CUDA backend (PyTorch + Transformers)
- **`vulkan`**: Use the Vulkan backend (llama-cpp-python)
### Model Loading Modes
Configure in `config.json` under `models.default_load_mode`:
- **`ondemand`** (default): Load models when first requested, unload them when idle
- **`preload`**: Load the models listed in the `preload` array of `models.json` at startup
- **`lazy`**: Never preload; always load on demand
---
## API Reference
### Text
| Endpoint | Description |
|---|---|
| `GET /v1/models` | List available models |
| `POST /v1/chat/completions` | Chat completions (streaming supported) |
| `POST /v1/completions` | Text completions |
| `POST /v1/embeddings` | Text embeddings |
### Image
| Endpoint | Description |
|---|---|
| `POST /v1/images/generations` | Text-to-image |
| `POST /v1/images/edits` | Image-to-image |
| `POST /v1/images/inpaint` | Inpainting |
| `POST /v1/images/upscale` | Real-ESRGAN upscaling |
| `POST /v1/images/deblur` | Deblur / sharpen |
| `POST /v1/images/unpixelate` | Remove pixelation |
| `POST /v1/images/outfit` | Change clothing/outfit |
| `POST /v1/images/faceswap` | Face swap (image or video) |
| `POST /v1/images/depth` | Depth estimation |
| `POST /v1/images/segment` | Object segmentation |
### Video
| Endpoint | Description |
|---|---|
| `POST /v1/video/generations` | Generate video (t2v/i2v/v2v/ti2v/interp) |
| `POST /v1/video/upscale` | Upscale video |
| `POST /v1/video/subtitle` | Generate/burn subtitles |
| `POST /v1/video/interpolate` | Frame interpolation |
| `POST /v1/video/dub` | Dub video to another language |
### Audio
| Endpoint | Description |
|---|---|
| `POST /v1/audio/speech` | Text-to-speech |
| `POST /v1/audio/transcriptions` | Speech-to-text (Whisper) |
| `POST /v1/audio/generate` | Music/SFX generation |
| `POST /v1/audio/clone` | Voice cloning TTS (F5-TTS) |
| `POST /v1/audio/convert` | Voice conversion / SVC (Seed-VC) |
| `GET /v1/audio/voices` | List saved voice profiles |
| `POST /v1/audio/voices` | Save a voice profile |
| `DELETE /v1/audio/voices/{name}` | Delete a voice profile |
### Pipelines
| Endpoint | Description |
|---|---|
| `POST /v1/pipelines/image-to-video` | Image gen → video animation |
| `POST /v1/pipelines/video-dub` | Full video dubbing pipeline |
| `POST /v1/pipelines/story` | LLM → images → video → TTS |
| `POST /v1/pipelines/audio-dub` | Audio/video dub with voice cloning |
| `GET /v1/pipelines/custom` | List custom pipelines |
| `POST /v1/pipelines/custom` | Create custom pipeline |
| `PUT /v1/pipelines/custom/{id}` | Update custom pipeline |
| `DELETE /v1/pipelines/custom/{id}` | Delete custom pipeline |
| `POST /v1/pipelines/custom/{id}/run` | Run a saved custom pipeline |
| `POST /v1/pipelines/run` | Run an inline pipeline definition |
| `GET /v1/pipelines/step-types` | List available step types |
### Custom Pipeline Definition
A custom pipeline is a named list of steps. Each step has a `type`, an optional `label`, and `params`:
```json
{
  "name": "My Pipeline",
  "steps": [
    {
      "type": "text_gen",
      "label": "Write scene description",
      "params": {
        "model": "Qwen/Qwen3.5-9B",
        "prompt": "Describe a visual scene for: {{input}}"
      }
    },
    {
      "type": "image_gen",
      "params": {
        "model": "z_image_turbo-Q2_K.gguf",
        "prompt": "{{step0.output}}"
      }
    },
    {
      "type": "video_gen",
      "params": {
        "model": "wan-model",
        "mode": "i2v",
        "init_image": "{{step1.url}}"
      }
    }
  ]
}
```
Template variables: `{{input}}` (the pipeline input), plus `{{stepN.output}}` and `{{stepN.url}}` (the output and URL of step N, counting from 0).
Available step types: `text_gen`, `image_gen`, `image_edit`, `image_inpaint`, `image_upscale`, `image_deblur`, `image_unpix`, `image_outfit`, `image_faceswap`, `video_gen`, `video_upscale`, `video_sub`, `video_interp`, `video_dub`, `tts`, `audio_gen`, `voice_clone`, `voice_convert`.
---
## Backend-Specific Setup
### NVIDIA (CUDA)
```bash
# Using the build script
./build.sh nvidia
# Or manually install CUDA-enabled PyTorch
pip install "torch>=2.0.0" "torchvision>=0.15.0" "torchaudio>=2.0.0"
pip install -r requirements-nvidia.txt
```
**Configuration in models.json:**
```json
{
  "text_models": [
    {
      "id": "meta-llama/Llama-2-7b-chat-hf",
      "backend": "nvidia",
      "context_size": 4096,
      "n_gpu_layers": -1,
      "load_in_4bit": false,
      "load_in_8bit": false,
      "flash_attention": false,
      "enabled": true
    }
  ]
}
```
### AMD and Intel (Vulkan)
```bash
# Install Vulkan drivers first
# Debian/Ubuntu (AMD and Intel):
sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers intel-media-va-driver
# Fedora:
sudo dnf install vulkan-loader-devel vulkan-tools mesa-vulkan-drivers intel-gpu-tools
# Using the build script
./build.sh vulkan
# List available Vulkan GPU devices
python coderai --vulkan-list-devices
```
**Vulkan Backend Notes:**
- Uses GGUF format models (much smaller than full HuggingFace models)
- Q4_K_M quantization is recommended for GPUs with 4GB+ VRAM; Q5_K_M or Q6_K give higher quality
- Works on:
  - AMD RX 400 series and newer (**recommended**)
  - Intel integrated graphics (HD 600 series+) and Intel Arc GPUs
  - NVIDIA GTX 900 series and newer (but the CUDA backend is preferred)
  - Any GPU with Vulkan 1.2+ driver support should work
- **Update llama-cpp-python** for newer model support, keeping the Vulkan build flag (without it the rebuilt package has no Vulkan support): `CMAKE_ARGS="-DGGML_VULKAN=ON" pip install --upgrade llama-cpp-python --no-cache-dir`
**Intel GPU Specific Notes:**
- Intel integrated GPUs have limited VRAM (shared with system RAM), so use smaller models
- Recommended for Intel iGPUs: `Q4_K_M` quantized models under 2GB file size
- Intel Arc GPUs work well with the same settings as AMD GPUs
**Configuration in models.json:**
```json
{
  "text_models": [
    {
      "id": "phi-3-mini-4k-instruct-q4_k_m.gguf",
      "backend": "vulkan",
      "context_size": 4096,
      "n_gpu_layers": -1,
      "enabled": true
    }
  ]
}
```
**Vulkan Configuration in config.json:**
```json
{
  "vulkan": {
    "n_gpu_layers": -1,
    "n_ctx": 2048,
    "device_id": 0,
    "single_gpu": false
  }
}
```
### CPU-Only
While not recommended for performance, you can run on CPU:
```bash
# NVIDIA backend on CPU (CPU-only PyTorch build)
pip install "torch>=2.0.0" --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements-nvidia.txt
# Or the llama-cpp backend on CPU (built without Vulkan)
CMAKE_ARGS="-DGGML_VULKAN=OFF" pip install llama-cpp-python
```
Configure in `config.json` (`n_gpu_layers: 0` keeps all llama-cpp layers on the CPU):
```json
{
  "backend": {
    "type": "nvidia"
  },
  "vulkan": {
    "n_gpu_layers": 0
  }
}
```
### ROCm Alternative (deprecated)
While the Vulkan backend is now recommended for AMD GPUs, ROCm support is still available through the NVIDIA backend if you have ROCm-enabled PyTorch installed.
---
## Backend-Specific Notes
### NVIDIA (CUDA)
- HuggingFace format models (safetensors/pytorch)
- GGUF text models via llama-cpp-python with CUDA
- Stable Diffusion GGUF via stable-diffusion.cpp with CUDA
- Optional: bitsandbytes (4-bit/8-bit quantization), Flash Attention 2
### AMD / Intel (Vulkan)
- GGUF format models via llama-cpp-python with Vulkan
- Stable Diffusion GGUF via stable-diffusion.cpp with Vulkan
- No ROCm/OneAPI required
- Intel iGPUs: use Q4_K_M models under 2GB
### Low VRAM
For GPUs with limited VRAM (4-8GB), enable 4-bit quantization (NVIDIA backend, requires bitsandbytes). Set it globally in `config.json`, optionally with an offload directory on fast storage:
```json
{
  "offload": {
    "load_in_4bit": true,
    "directory": "/path/to/fast/storage"
  }
}
```
Or set it per model in `models.json`:
```json
{
  "text_models": [
    {
      "id": "meta-llama/Llama-2-7b-chat-hf",
      "backend": "nvidia",
      "load_in_4bit": true,
      "enabled": true
    }
  ]
}
```
### Multi-GPU (NVIDIA + AMD)
To force Vulkan to use only the AMD GPU (Vulkan device 1 in this example; check yours with `python coderai --vulkan-list-devices`), set in `config.json`:
```json
{ "vulkan": { "device_id": 1, "single_gpu": true } }
```
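The `single_gpu` behaviour can be pictured as building a `tensor_split` list with 1.0 at the selected device and 0.0 everywhere else. The sketch below assumes the `vulkan` section maps onto llama-cpp-python's `Llama` constructor arguments (`n_gpu_layers`, `n_ctx`, `main_gpu`, `tensor_split`), with `device_id` becoming `main_gpu`. The helper names are hypothetical, and the device count would come from `--vulkan-list-devices`.

```python
def single_gpu_split(device_id, device_count):
    """Return a tensor_split list that places 100% of the layers on device_id."""
    if not 0 <= device_id < device_count:
        raise ValueError(f"device_id {device_id} out of range for {device_count} devices")
    return [1.0 if i == device_id else 0.0 for i in range(device_count)]


def llama_kwargs(vulkan_cfg, device_count):
    """Translate config.json's `vulkan` section into llama-cpp-python Llama kwargs."""
    kwargs = {
        "n_gpu_layers": vulkan_cfg.get("n_gpu_layers", -1),
        "n_ctx": vulkan_cfg.get("n_ctx", 2048),
        "main_gpu": vulkan_cfg.get("device_id", 0),
    }
    if vulkan_cfg.get("single_gpu"):
        # Zero weight on every other device keeps their VRAM untouched.
        kwargs["tensor_split"] = single_gpu_split(kwargs["main_gpu"], device_count)
    return kwargs
```

With two devices and `device_id: 1`, this yields `tensor_split=[0.0, 1.0]`. You could pass the result as `llama_cpp.Llama(model_path=..., **llama_kwargs(cfg, 2))`.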
Without `single_gpu`, llama.cpp's Vulkan backend automatically distributes model layers across every visible Vulkan GPU (shown in the logs as "layer X assigned to device VulkanY"). With `single_gpu: true`, a `tensor_split` array such as `[0.0, 1.0]` (its length depends on the device count) tells llama.cpp to put 0% of the layers on the other GPUs and 100% on the selected one, so no VRAM is allocated on the NVIDIA GPU.
**Alternative: environment variables**
```bash
# List available Vulkan devices first
python coderai --vulkan-list-devices
# Then use VK_DEVICE_SELECT_DEVICE to force a specific device,
# e.g. if device 1 is your AMD GPU:
VK_DEVICE_SELECT_DEVICE=1 python coderai
# Or hide the NVIDIA GPU from CUDA (prevents any CUDA usage)
CUDA_VISIBLE_DEVICES="" python coderai
```
**Notes:**
- The `device_id` setting maps to `main_gpu` in llama-cpp-python
- The `single_gpu` flag builds the `tensor_split` array that forces single-GPU usage
- Vulkan enumerates all GPUs in your system, so Vulkan device IDs may differ from CUDA device IDs
- The `vulkaninfo` command shows all GPUs visible to Vulkan
### Multi-GPU Setup (CUDA)
Multiple GPUs are detected automatically, and the model is distributed across the available devices based on memory availability.
```bash
# Optionally restrict which GPUs are visible
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Run: the model is distributed across all visible GPUs
python coderai
```
---
## Model Recommendations
### NVIDIA Backend (HuggingFace Models)
#### Small Models (For Testing)
- `microsoft/DialoGPT-medium` (~345M parameters)
- `TinyLlama/TinyLlama-1.1B-Chat-v1.0` (~1.1B parameters)
- `facebook/blenderbot-400M-distill` (~400M parameters)
#### Medium Models (4-8GB VRAM with 4-bit)
- `meta-llama/Llama-2-7b-chat-hf` (~7B parameters)
- `mistralai/Mistral-7B-Instruct-v0.2` (~7B parameters)
- `HuggingFaceH4/zephyr-7b-beta` (~7B parameters)
#### Large Models (Multiple GPUs or High VRAM)
- `meta-llama/Llama-2-13b-chat-hf` (~13B parameters)
- `meta-llama/Llama-2-70b-chat-hf` (~70B parameters): requires multiple GPUs or disk offload
- `bigscience/bloom-7b1` (~7B parameters)
### Vulkan Backend (GGUF Models)
#### Small Models (2-4GB VRAM)
- `TheBloke/phi-2-GGUF` - phi-2.Q4_K_M.gguf (~1.6B parameters, ~1GB file)
- `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` - tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
#### Medium Models (4-8GB VRAM)
- `TheBloke/Llama-2-7B-GGUF` - llama-2-7b.Q4_K_M.gguf (~4GB file)
- `TheBloke/Mistral-7B-Instruct-v0.2-GGUF` - mistral-7b-instruct-v0.2.Q4_K_M.gguf
- `microsoft/Phi-3-mini-4k-instruct-gguf` - Phi-3-mini-4k-instruct-q4.gguf
#### Large Models (8GB+ VRAM)
- `TheBloke/Llama-2-13B-GGUF` - llama-2-13b.Q4_K_M.gguf (~7.5GB file)
- `TheBloke/deepseek-coder-6.7B-base-GGUF` - deepseek-coder-6.7b-base.Q4_K_M.gguf
**GGUF Quantization Guide:**
- `Q4_K_M` - Best balance of speed/quality (recommended)
- `Q5_K_M` - Higher quality, slightly slower
- `Q6_K` - Near-unquantized quality
- `Q8_0` - Maximum quality, largest size
**Download Example:**
```bash
# Using huggingface-cli
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_M.gguf --local-dir ./models
# Or let coderai download it into its model cache
python coderai --download-model TheBloke/Llama-2-7B-GGUF --download-file-pattern llama-2-7b.Q4_K_M.gguf
```
---
## Troubleshooting
### numpy ABI mismatch after installing new packages
If installing a new package changes the numpy version and `realesrgan` or `insightface` then fail to import, reinstall them without changing their dependencies:
```bash
pip install --force-reinstall --no-cache-dir --no-deps realesrgan insightface
```
### stable-diffusion.cpp: "get sd version from file failed"
The installed version does not recognize the model architecture. Update stable-diffusion-cpp-python:
```bash
CMAKE_ARGS="-DSD_WEBM=OFF -DSD_CUDA=ON -DSD_VULKAN=ON" \
  pip install stable-diffusion-cpp-python --upgrade --no-cache-dir
```
### stable-diffusion.cpp using CPU instead of GPU
This usually means the package was built without GPU support. Reinstall it with the GPU flags:
```bash
CMAKE_ARGS="-DSD_WEBM=OFF -DSD_CUDA=ON -DSD_VULKAN=ON" \
  pip install stable-diffusion-cpp-python --no-cache-dir --force-reinstall
```
### Shell Redirection Error: "No such file or directory: '0.0'"
**Problem**: Running `pip install torch>=2.0.0` fails with an error about file "0.0" or "=2.0.0" not found.
**Cause**: The shell interprets `>` as output redirection. The command creates a file named "=2.0.0" and installs an unversioned torch package.
**Solutions**:
1. **Use quotes** (recommended): `pip install "torch>=2.0.0"`
2. **Use exact versions**: `pip install torch==2.0.0`
3. **Use requirements.txt**: Add exact versions to requirements.txt and run `pip install -r requirements.txt`
### Out of Memory Errors
**Problem**: `CUDA out of memory` or system RAM exhausted
**Solutions**:
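1. Use quantization: set `load_in_4bit` or `load_in_8bit` to `true` in the `offload` section of `config.json`, or per model in `models.json` (NVIDIA backend)
2. Enable disk offload: point `offload.directory` in `config.json` at fast storage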
3. Use a smaller model
4. Reduce batch size in client requests
### Flash Attention Installation Fails
**Problem**: `pip install flash-attn` fails to build
**Solutions**:
1. Ensure CUDA/ROCm is properly installed
2. Install build dependencies: `pip install packaging ninja`
3. Try without build isolation: `pip install flash-attn --no-build-isolation`
4. Check GPU compatibility (Ampere, Ada Lovelace, Hopper for NVIDIA)
5. Skip Flash Attention - the server works without it
### Flash Attention: No module named 'torch' during build
**Problem**: Flash Attention build fails with `ModuleNotFoundError: No module named 'torch'` even though PyTorch is installed (e.g., PyTorch 2.9.1+rocm6.4).
**Cause**: pip uses isolated build environments by default, which prevents flash-attention from seeing the installed torch package during compilation.
**Solutions**:
1. **Use --no-build-isolation flag** (recommended):
```bash
pip install flash-attn --no-build-isolation
```
2. **For ROCm systems**, you may also need to limit parallel jobs to avoid resource exhaustion:
```bash
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```
3. **Use pre-built wheels** if available for your platform (check https://github.com/Dao-AILab/flash-attention/releases)
4. **ROCm 6.4 compatibility note**: Flash Attention may not officially support ROCm 6.4 yet (it was primarily built for ROCm 6.0). If build fails on ROCm 6.4, you can run without Flash Attention:
```bash
python coderai --model meta-llama/Llama-2-7b-chat-hf
# (omit the --flash-attn flag)
```
5. **Fallback**: Flash Attention is optional, and the server runs without it. Omit the `--flash-attn` flag when starting the server.
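If you launch the server from a script, you can add `--flash-attn` only when the `flash_attn` package is installed. The sketch below assumes the `python coderai --model ... [--flash-attn]` invocation shown above. Both helper names are hypothetical.

```python
import importlib.util
from typing import List


def flash_attention_available() -> bool:
    # find_spec returns None when the package is not installed. It does not
    # import the module, so this only checks installation, not that it works.
    return importlib.util.find_spec("flash_attn") is not None


def build_server_args(model: str, use_flash: bool) -> List[str]:
    """Assemble the server command line (hypothetical helper)."""
    args = ["python", "coderai", "--model", model]
    if use_flash:
        args.append("--flash-attn")
    return args


if __name__ == "__main__":
    print(build_server_args("meta-llama/Llama-2-7b-chat-hf", flash_attention_available()))
```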
### bitsandbytes Not Working on ROCm
**Problem**: Quantization fails on AMD GPUs
**Solutions**:
1. bitsandbytes has limited ROCm support
2. Use disk offload instead: `--offload-dir /path/to/storage`
3. Build bitsandbytes from source with ROCm support
### Model Download Stuck or Slow
**Problem**: HuggingFace model download is slow or fails
**Solutions**:
1. Set HuggingFace cache directory: `export HF_HOME=/path/to/cache`
2. Use mirror: `export HF_ENDPOINT=https://hf-mirror.com` (for China)
3. Download model manually with `git-lfs` and use local path
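To confirm where downloads end up and which endpoint is used, you can approximate the lookup in Python. The sketch below mirrors the documented `huggingface_hub` defaults:

- `HF_HOME` falls back to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface`.
- Models go under `hub/` unless `HF_HUB_CACHE` is set.
- The endpoint defaults to `https://huggingface.co`.

It is an approximation, not the library's own code, and it ignores legacy variable names.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_hf_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Approximate huggingface_hub's cache and endpoint resolution (sketch)."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    hf_home = Path(env.get("HF_HOME", str(Path(xdg) / "huggingface"))).expanduser()
    hub_cache = Path(env.get("HF_HUB_CACHE", str(hf_home / "hub"))).expanduser()
    return {
        "HF_HOME": str(hf_home),
        "HF_HUB_CACHE": str(hub_cache),
        "HF_ENDPOINT": env.get("HF_ENDPOINT", "https://huggingface.co"),
    }


if __name__ == "__main__":
    for key, value in resolve_hf_paths().items():
        print(f"{key}={value}")
```

Set these variables before the server process starts (for example with `export`). The Hugging Face libraries generally read them at import time.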
### Auto-Detection Issues in Containers
### Vulkan backend not available
**Problem**: Wrong memory detection in Docker/Podman containers
**Solutions**:
1. Specify RAM manually: `--ram 16`
2. Pass the GPU devices through to the container
3. For Docker: use the `--gpus all` flag for NVIDIA; for ROCm, map the device nodes with `--device=/dev/kfd --device=/dev/dri`
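Wrong memory detection in containers is common because many tools read the host's total RAM (for example from `/proc/meminfo`), while the container itself is capped by a cgroup limit. The sketch below finds a sensible `--ram` value from inside the container. It assumes the standard cgroup v2 file (`/sys/fs/cgroup/memory.max`) or cgroup v1 file (`/sys/fs/cgroup/memory/memory.limit_in_bytes`). The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

CGROUP_FILES = (
    "sys/fs/cgroup/memory.max",                    # cgroup v2
    "sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
)


def parse_cgroup_limit(raw: str) -> Optional[float]:
    """Convert a cgroup memory limit to GB (1e9 bytes); None means unlimited."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2: no limit set
        return None
    limit = int(raw)
    if limit >= 2**60:  # cgroup v1 reports a huge sentinel value when unlimited
        return None
    return round(limit / 1e9, 1)


def cgroup_memory_limit_gb(root: str = "/") -> Optional[float]:
    """Return the container memory limit in GB, or None if unlimited/unknown."""
    for rel in CGROUP_FILES:
        try:
            raw = (Path(root) / rel).read_text()
        except OSError:
            continue  # file absent: try the other cgroup version
        return parse_cgroup_limit(raw)
    return None


if __name__ == "__main__":
    print("cgroup memory limit (GB):", cgroup_memory_limit_gb())
```

Round the result down when passing it to the server. For example, a reported 17.2 GB limit becomes `--ram 17`.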
### API Returns 503 Errors
**Problem**: `Model not loaded` error
**Solutions**:
1. Ensure model name is correct and accessible
2. If the model requires authentication, log in: `huggingface-cli login`
3. Verify internet connection for first-time model download
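You can query the model list to see the exact model names the server accepts. The sketch below makes several assumptions:

- CoderAI serves the OpenAI-style `GET /v1/models` endpoint.
- The response has the shape `{"data": [{"id": ...}, ...]}`.
- The server listens on `http://localhost:8000`.

Adjust the URL, and pass an API token if your server requires one.

```python
import json
import urllib.request
from typing import List, Optional


def model_ids(payload: dict) -> List[str]:
    # OpenAI-style list response: {"object": "list", "data": [{"id": "..."}, ...]}
    return [m.get("id", "") for m in payload.get("data", [])]


def fetch_models(base_url: str = "http://localhost:8000", token: Optional[str] = None) -> dict:
    """GET /v1/models from an OpenAI-compatible server (URL is an assumption)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/v1/models")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Usage: `"my-model" in model_ids(fetch_models(token="..."))`. If the name is missing, compare it character by character with the listed IDs.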
### ROCm Not Detected
**Problem**: ROCm GPU not detected, falling back to CPU
**Solutions**:
1. Verify ROCm installation: `rocminfo`
2. Check PyTorch ROCm build: `python -c "import torch; print(torch.version.hip)"`
3. Set HIP visible devices: `export HIP_VISIBLE_DEVICES=0`
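Step 2 can be extended into a script that distinguishes a CPU-only or CUDA wheel from a ROCm wheel. On ROCm builds, PyTorch sets `torch.version.hip` and exposes the GPU through the regular `torch.cuda` API. This is a sketch, and the function name is illustrative.

```python
import importlib.util


def torch_gpu_build() -> str:
    """Report which GPU runtime the installed PyTorch wheel was built for."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also works without torch

    if getattr(torch.version, "hip", None):
        # ROCm wheels set torch.version.hip and reuse the torch.cuda namespace
        return f"ROCm/HIP {torch.version.hip}, GPU visible: {torch.cuda.is_available()}"
    if getattr(torch.version, "cuda", None):
        return f"CUDA {torch.version.cuda}, GPU visible: {torch.cuda.is_available()}"
    return "CPU-only build"


if __name__ == "__main__":
    print(torch_gpu_build())
```

A CUDA or CPU-only result on an AMD machine means the wrong PyTorch wheel is installed.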
### Import Errors
**Problem**: `ModuleNotFoundError` for various packages
**Solutions**:
1. Reinstall requirements: `pip install -r requirements.txt --force-reinstall`
2. Check Python version: `python --version` (should be 3.8+)
3. Verify virtual environment is activated
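You can check steps 2 and 3 from inside the interpreter that runs the server. In a venv or virtualenv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base interpreter. Conda environments are not detected this way. This is a sketch, and the function name is illustrative.

```python
import sys
from typing import Dict, Tuple


def environment_report(min_version: Tuple[int, int] = (3, 8)) -> Dict[str, object]:
    """Summarize the interpreter and virtual environment in use."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "version_ok": sys.version_info[:2] >= min_version,
        # True inside venv/virtualenv; False for the system or a conda interpreter
        "in_venv": sys.prefix != sys.base_prefix,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```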
### Vulkan-Specific Issues
**Problem**: "Vulkan backend not available" or llama-cpp fails to load
**Solutions**:
1. **Verify Vulkan drivers and shader compiler are installed:**
```bash
# Check Vulkan installation
vulkaninfo | grep "deviceName"
# Check glslc (shader compiler) - REQUIRED for building
glslc --version
# Or install if missing
# Debian/Ubuntu:
sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers glslang-tools
# Fedora:
sudo dnf install vulkan-loader-devel vulkan-tools mesa-vulkan-drivers glslang
```
**Note:** `glslc` is required to compile llama-cpp-python with Vulkan support. If you see "Could NOT find Vulkan (missing: glslc)", install the `glslc` package:
```bash
sudo apt install glslc glslang-tools glslang-dev
# If glslc is still not found, locate it and symlink it onto PATH.
# Do not symlink glslangValidator as glslc: its command-line interface differs.
find /usr -name "glslc" 2>/dev/null
if [ -x /usr/lib/shaderc/bin/glslc ]; then sudo ln -s /usr/lib/shaderc/bin/glslc /usr/local/bin/glslc; else echo "glslc not found, please install the glslc package"; fi
```
2. **Reinstall llama-cpp-python with Vulkan:**
```bash
pip uninstall llama-cpp-python -y
CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --no-cache-dir
```
3. **Check GPU compatibility:**
- **AMD**: RX 400 series and newer (best experience)
- **Intel**: HD 600 series integrated graphics or newer, all Intel Arc GPUs
- **NVIDIA**: GTX 900 series and newer (but CUDA backend preferred for NVIDIA)
- Any GPU with Vulkan 1.2+ driver support should work
**Performance expectations by GPU:**
- AMD dedicated GPUs: Full performance, all layer offloading supported
- Intel Arc GPUs: Good performance, similar to AMD
- Intel integrated GPUs: Limited by shared system RAM; use smaller models (e.g., Q4_K_M files under 2 GB)
**Problem**: GGUF model fails to load or produces garbled output
**Solutions**:
1. **Verify model format**: Must be GGUF format, not regular HuggingFace format
```bash
# Check file extension
ls -la model.gguf # Should end in .gguf
```
```bash
# Install Vulkan drivers and shader compiler
sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers glslc glslang-tools
2. **Try different quantization**: Some GGUF files may be incompatible
- Q4_K_M is most compatible (recommended)
- Q5_K_M or Q6_K for higher quality
- Avoid IQ quants if you run into problems
# Rebuild llama-cpp-python
CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --no-cache-dir --force-reinstall
```
3. **Check model architecture**: Very new model architectures may require a newer llama-cpp-python release
```bash
pip install --upgrade llama-cpp-python
```
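Checking the file extension alone will not catch a mislabelled or truncated download. GGUF files start with the 4-byte magic `GGUF`, followed by the format version as a `uint32` (little-endian in the common case). A header check is therefore more reliable than the extension. This is a sketch, and the helper names are hypothetical.

```python
import struct
from typing import Optional

GGUF_MAGIC = b"GGUF"


def parse_gguf_header(header: bytes) -> Optional[int]:
    """Return the GGUF version from the first 8 bytes, or None if not GGUF."""
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Bytes 4..8 hold the format version as a little-endian uint32
    (version,) = struct.unpack("<I", header[4:8])
    return version


def gguf_version(path: str) -> Optional[int]:
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(8))


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        v = gguf_version(p)
        print(p, "->", f"GGUF v{v}" if v is not None else "not a GGUF file")
```

A safetensors or PyTorch checkpoint renamed to `.gguf` reports `not a GGUF file`.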
### Flash Attention build fails
**Problem**: Vulkan backend runs on CPU instead of GPU
```bash
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```
**Solutions**:
1. **Check layer offloading**: Verify layers are being offloaded
```bash
# Check GPU layers parameter (default -1 = all layers)
python coderai --model model.gguf --backend vulkan --n-gpu-layers 35
```
### Model not loading (503 errors)
2. **Check verbose output**: Look for Vulkan device initialization in logs
```bash
# Run with verbose logging
python coderai --model model.gguf --backend vulkan 2>&1 | grep -i vulkan
```
- Verify model name matches exactly what's in `models.json`
- Check HuggingFace authentication: `huggingface-cli login`
- Ensure the model type matches the endpoint (image models cannot be used via `/v1/chat/completions`)
3. **Verify GPU visibility**: Check that Vulkan sees your GPU
```bash
vulkaninfo | grep -A 5 "GPU0\|GPU1"
```
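To script the GPU-visibility check, you can extract the `deviceName` lines from `vulkaninfo` output. The sketch below assumes two things:

- `vulkaninfo --summary` is available (recent vulkan-tools releases provide it).
- It prints lines of the form `deviceName = ...`.

Mesa's software rasterizer reports itself as `llvmpipe`. If that is the only device listed, Vulkan sees no hardware GPU.

```python
import re
import shutil
import subprocess
from typing import List

DEVICE_RE = re.compile(r"^\s*deviceName\s*=\s*(.+?)\s*$", re.MULTILINE)


def parse_device_names(text: str) -> List[str]:
    return DEVICE_RE.findall(text)


def has_hardware_gpu(names: List[str]) -> bool:
    # llvmpipe is a CPU-based software Vulkan implementation
    return any("llvmpipe" not in n.lower() for n in names)


def vulkan_devices() -> List[str]:
    if shutil.which("vulkaninfo") is None:
        return []
    out = subprocess.run(["vulkaninfo", "--summary"], capture_output=True, text=True, timeout=30)
    return parse_device_names(out.stdout)


if __name__ == "__main__":
    names = vulkan_devices()
    print(names, "hardware GPU:", has_hardware_gpu(names))
```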
### Backend Not Detected
**Problem**: "No suitable backend found" error
**Solutions**:
1. **Check which backends are available:**
```bash
python -c "import coderai; print(coderai.detect_available_backends())"
```
2. **For NVIDIA**: Ensure PyTorch with CUDA is installed
```bash
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```
3. **For Vulkan**: Ensure llama-cpp-python is installed with Vulkan support
```bash
python -c "from llama_cpp import Llama; print('llama-cpp available')"
```
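The three checks above can be combined into a single probe. This is a hypothetical sketch, not CoderAI's `detect_available_backends()`. It only reports what is installed and, for PyTorch, whether a GPU is visible. It cannot tell whether llama-cpp-python was built with Vulkan enabled.

```python
import importlib.util
from typing import List


def probe_backends() -> List[str]:
    """List usable backends in rough order of preference (sketch)."""
    found: List[str] = []
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            # ROCm wheels also answer through torch.cuda; torch.version.hip tells them apart
            found.append("rocm" if getattr(torch.version, "hip", None) else "cuda")
    if importlib.util.find_spec("llama_cpp") is not None:
        # Installed, but build flags such as GGML_VULKAN are not checked here
        found.append("llama-cpp")
    found.append("cpu")  # always available as the fallback
    return found


if __name__ == "__main__":
    print(probe_backends())
```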
---
## License
This project is licensed under the GNU General Public License v3.0 - see the [LICENSE.md](LICENSE.md) file for details.
GNU General Public License v3.0 — see [LICENSE.md](LICENSE.md).
## Contributing
Contributions are welcome! Please feel free to submit a merge request.
Merge requests welcome.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/)
- Powered by [HuggingFace Transformers](https://huggingface.co/docs/transformers/) (NVIDIA backend)
- Powered by [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) with Vulkan support (AMD/Intel backend)
- Inspired by the OpenAI API specification
---
**Note on AI.PROMPT**: This project was enhanced following instructions to add Vulkan support for AMD and Intel GPUs alongside the existing NVIDIA/CUDA support. The implementation uses llama-cpp-python for Vulkan/GGUF model support while maintaining full compatibility with the existing HuggingFace/Transformers backend for NVIDIA GPUs.
- [FastAPI](https://fastapi.tiangolo.com/)
- [HuggingFace Transformers](https://huggingface.co/docs/transformers/) — NVIDIA text backend
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) — Vulkan/CUDA GGUF text backend
- [stable-diffusion-cpp-python](https://github.com/william-murray1204/stable-diffusion-cpp-python) — GGUF image backend
- [InsightFace](https://github.com/deepinsight/insightface) — face swap
- [F5-TTS](https://github.com/SWivid/F5-TTS) — voice cloning
- [Seed-VC](https://github.com/Plachta/Seed-VC) — singing voice conversion
- [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) — image/video upscaling
......@@ -522,7 +522,14 @@ elif [ "$BACKEND" = "all" ]; then
pip install setproctitle || echo -e "${YELLOW}Warning: setproctitle failed (optional)${NC}"
# Try stable-diffusion-cpp-python (disable WebM to avoid missing libwebm cmake submodule)
# Use CUDA if available (detected later in this block, check nvcc now)
if command -v nvcc &> /dev/null || [ -d "/usr/local/cuda" ]; then
CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
CMAKE_ARGS="$SD_CMAKE_ARGS" pip install stable-diffusion-cpp-python || \
echo -e "${YELLOW}Warning: stable-diffusion-cpp-python failed (optional)${NC}"
else
CMAKE_ARGS="$SD_CMAKE_ARGS" pip install stable-diffusion-cpp-python || echo -e "${YELLOW}Warning: stable-diffusion-cpp-python failed (optional)${NC}"
fi
}
# Install PyTorch with CUDA support (for nvidia backend)
......@@ -622,14 +629,28 @@ elif [ "$BACKEND" = "all" ]; then
echo -e "${YELLOW}Warning: Some Vulkan packages failed to install${NC}"
}
# Try to install stable-diffusion-cpp-python with OpenCL
if [ "$OPENCL_AVAILABLE" = true ]; then
echo -e "${YELLOW}Installing stable-diffusion-cpp-python with OpenCL support...${NC}"
CMAKE_ARGS="$SD_CMAKE_ARGS" pip install stable-diffusion-cpp-python || {
echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available (requires CMake and build tools)${NC}"
# Try to install stable-diffusion-cpp-python with CUDA+Vulkan (preferred) or fallbacks
if [ "$CUDA_AVAILABLE" = true ] && [ "$VULKAN_AVAILABLE" = true ]; then
echo -e "${YELLOW}Installing stable-diffusion-cpp-python with CUDA+Vulkan support...${NC}"
CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON -DSD_VULKAN=ON" pip install stable-diffusion-cpp-python --no-cache-dir || {
echo -e "${YELLOW}CUDA+Vulkan build failed, trying CUDA only...${NC}"
CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
}
elif [ "$CUDA_AVAILABLE" = true ]; then
echo -e "${YELLOW}Installing stable-diffusion-cpp-python with CUDA support...${NC}"
CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
elif [ "$VULKAN_AVAILABLE" = true ]; then
echo -e "${YELLOW}Installing stable-diffusion-cpp-python with Vulkan support...${NC}"
CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_VULKAN=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
elif [ "$OPENCL_AVAILABLE" = true ]; then
echo -e "${YELLOW}Installing stable-diffusion-cpp-python with OpenCL support...${NC}"
CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_OPENCL=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
else
echo -e "${YELLOW}Skipping OpenCL (stable-diffusion-cpp-python) - OpenCL not available${NC}"
echo -e "${YELLOW}Skipping GPU-accelerated stable-diffusion-cpp-python - no GPU backend available${NC}"
fi
# Install additional requirements
......@@ -667,8 +688,11 @@ elif [ "$BACKEND" = "all" ]; then
echo "Available backends:"
[ "$CUDA_AVAILABLE" = true ] && echo " ✓ NVIDIA/CUDA (PyTorch)"
[ "$CUDA_AVAILABLE" = true ] && echo " ✓ CUDA (llama-cpp-python)"
[ "$CUDA_AVAILABLE" = true ] && [ "$VULKAN_AVAILABLE" = true ] && echo " ✓ CUDA+Vulkan (stable-diffusion-cpp-python)"
[ "$CUDA_AVAILABLE" = true ] && [ "$VULKAN_AVAILABLE" != true ] && echo " ✓ CUDA (stable-diffusion-cpp-python)"
[ "$CUDA_AVAILABLE" != true ] && [ "$VULKAN_AVAILABLE" = true ] && echo " ✓ Vulkan (stable-diffusion-cpp-python)"
[ "$VULKAN_AVAILABLE" = true ] && echo " ✓ Vulkan (llama-cpp-python)"
[ "$OPENCL_AVAILABLE" = true ] && echo " ✓ OpenCL (stable-diffusion-cpp-python)"
[ "$OPENCL_AVAILABLE" = true ] && [ "$CUDA_AVAILABLE" != true ] && [ "$VULKAN_AVAILABLE" != true ] && echo " ✓ OpenCL (stable-diffusion-cpp-python)"
echo " ✓ CPU (fallback for all)"
if [ "$FLASH" = true ] && [ "$CUDA_AVAILABLE" = true ]; then
echo ""
......
......@@ -15,10 +15,13 @@
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""Authentication and session management for admin dashboard."""
import base64
import hashlib
import hmac
import json
import os
import secrets
import threading
import time
from pathlib import Path
from typing import Any, Dict, Optional
......@@ -43,35 +46,62 @@ def get_or_create_secret(config_dir: Path) -> bytes:
def hash_password(password: str) -> str:
"""Hash a password using SHA-256 with salt.
"""Hash a password using argon2 (preferred) or scrypt as fallback.
In production, use argon2 or bcrypt. This is a minimal implementation
for environments where those libraries aren't available.
New hashes are always produced with a proper key-derivation function and
a per-password random salt. The legacy SHA-256/static-salt format is
only retained for *verification* of pre-existing hashes.
"""
# Use SHA-256 with a pepper-like secret for basic hashing
# Real implementation should use argon2 from main.py
salt = b'static_salt_' # In production, use per-user random salt
return hashlib.sha256(salt + password.encode()).hexdigest()
try:
from argon2 import PasswordHasher
ph = PasswordHasher()
return ph.hash(password)
except ImportError:
pass
# scrypt fallback: encode as "scrypt:<b64salt>:<b64key>"
salt = os.urandom(16)
key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
return "scrypt:" + base64.b64encode(salt).decode() + ":" + base64.b64encode(key).decode()
def verify_password(password: str, password_hash: str) -> bool:
"""Verify a password against its hash."""
# Try argon2 first
"""Verify a password against its hash.
Supports argon2, scrypt (new format), and the legacy SHA-256/static-salt
format so that old stored hashes continue to work.
"""
# --- argon2 ---
try:
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError
from argon2.exceptions import VerifyMismatchError, InvalidHashError
ph = PasswordHasher()
try:
return ph.verify(password_hash, password)
except VerifyMismatchError:
return False
except InvalidHashError:
pass # not an argon2 hash; fall through
except Exception:
pass
except ImportError:
pass
# Fallback to simple hash
return hash_password(password) == password_hash
# --- scrypt ---
if password_hash.startswith("scrypt:"):
try:
parts = password_hash.split(":")
if len(parts) == 3:
salt = base64.b64decode(parts[1])
stored_key = base64.b64decode(parts[2])
new_key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
return hmac.compare_digest(new_key, stored_key)
except Exception:
pass
return False
# --- legacy SHA-256 with static salt (read-only; never written for new passwords) ---
legacy = hashlib.sha256(b'static_salt_' + password.encode()).hexdigest()
return hmac.compare_digest(legacy, password_hash)
class SessionManager:
......@@ -81,7 +111,7 @@ class SessionManager:
self.config_dir = config_dir
self.secret = get_or_create_secret(config_dir)
self.session_timeout = timedelta(minutes=session_timeout_minutes)
self._lock = __import__('threading').Lock()
self._lock = threading.Lock()
def _load_auth_data(self) -> Dict[str, Any]:
"""Load auth.json data."""
......
......@@ -236,15 +236,51 @@ async def api_status(username: str = Depends(require_auth)):
# VRAM info
vram = None
is_cuda = False
try:
import torch
if torch.cuda.is_available():
is_cuda = True
free, total = torch.cuda.mem_get_info()
used = total - free
vram = {"used": round(used / 1e9, 2), "total": round(total / 1e9, 2)}
vram = {"used": round(used / 1e9, 2), "free": round(free / 1e9, 2), "total": round(total / 1e9, 2),
"gpu": torch.cuda.get_device_name(0)}
except Exception:
pass
# Non-CUDA: read from sysfs (AMD amdgpu / Intel i915 / Arc)
if not is_cuda:
import os, glob as _glob
for card in sorted(_glob.glob("/sys/class/drm/card[0-9]")):
dev = card + "/device"
vram_total_path = dev + "/mem_info_vram_total"
if not os.path.exists(vram_total_path):
continue
try:
total_b = int(open(vram_total_path).read())
used_b = int(open(dev + "/mem_info_vram_used").read())
free_b = total_b - used_b
# GPU name from lspci
gpu_name = ""
try:
pci_addr = os.path.basename(os.path.realpath(dev))
import subprocess
r = subprocess.run(["lspci", "-s", pci_addr], capture_output=True, text=True, timeout=3)
if r.returncode == 0 and r.stdout:
# "05:00.0 VGA compatible controller: AMD Radeon RX 580"
gpu_name = r.stdout.split(":", 2)[-1].strip().rstrip()
except Exception:
pass
vram = {
"gpu": gpu_name,
"used": round(used_b / 1e9, 2),
"free": round(free_b / 1e9, 2),
"total": round(total_b / 1e9, 2),
}
break
except Exception:
continue
# Request stats from queue manager
req_total = 0
req_active = 0
......@@ -285,6 +321,17 @@ async def api_status(username: str = Depends(require_auth)):
except Exception:
pass
# Whisper-server status
whisper_status = None
try:
from codai.models.manager import multi_model_manager as _mmm
if _mmm.whisper_servers:
whisper_status = {mid: wsm.get_status() for mid, wsm in _mmm.whisper_servers.items()}
elif _mmm.whisper_server:
whisper_status = {"whisper-server": _mmm.whisper_server.get_status()}
except Exception:
pass
return {
"status": "ok",
"backend": backend,
......@@ -293,8 +340,10 @@ async def api_status(username: str = Depends(require_auth)):
"loaded_models": loaded_keys,
"enabled_models": enabled_models,
"vram": vram,
"cuda": is_cuda,
"requests": {"total": req_total, "active": req_active},
"recent_activity": recent_activity,
"whisper_server": whisper_status,
}
......@@ -1195,16 +1244,23 @@ async def api_model_configure(request: Request, username: str = Depends(require_
raise HTTPException(status_code=503, detail="Config manager not initialized")
data = await request.json()
path = data.get("path") or data.get("model_id", "")
model_type = data.get("model_type", "text_models")
# Treat legacy gguf_models as text_models (GGUF is a format, not a type)
if model_type == "gguf_models":
model_type = "text_models"
valid = {"text_models", "image_models", "audio_models", "tts_models", "vision_models", "video_models",
"audio_gen_models", "embedding_models"}
if not path:
raise HTTPException(status_code=400, detail="path is required")
if model_type not in valid:
raise HTTPException(status_code=400, detail=f"model_type must be one of {valid}")
# Accept model_types (list) or fall back to single model_type
raw_types = data.get("model_types") or []
if not raw_types:
raw_types = [data.get("model_type", "text_models")]
# Normalize: gguf_models → text_models, deduplicate, filter valid
model_types = list(dict.fromkeys(
("text_models" if t == "gguf_models" else t)
for t in raw_types if t
))
model_types = [t for t in model_types if t in valid]
if not model_types:
model_types = ["text_models"]
# Remove from all categories (handles type changes)
for cat in valid | {"gguf_models"}:
......@@ -1220,14 +1276,16 @@ async def api_model_configure(request: Request, username: str = Depends(require_
import os
if os.path.isfile(path):
size_bytes = os.path.getsize(path)
# GGUF: ~1.1x file size; HF safetensors: ~1.2x
multiplier = 1.1 if path.endswith(".gguf") else 1.2
used_vram_gb = round(size_bytes / 1e9 * multiplier, 2)
# Build settings entry (drop None-valued optional keys to keep JSON tidy)
entry: dict = {"path": path, "model_type": model_type}
# Build settings entry
entry: dict = {"path": path, "model_type": model_types[0], "model_types": model_types}
if used_vram_gb is not None:
entry["used_vram_gb"] = used_vram_gb
# Store video sub-types (t2v / i2v / v2v) when present
if data.get("video_subtypes"):
entry["video_subtypes"] = data["video_subtypes"]
for key in ("alias", "backend", "load_mode", "n_gpu_layers", "n_ctx",
"max_gpu_percent", "manual_ram_gb", "load_in_4bit", "load_in_8bit",
"flash_attention", "no_ram", "offload_strategy", "offload_dir",
......@@ -1235,7 +1293,9 @@ async def api_model_configure(request: Request, username: str = Depends(require_
if key in data:
entry[key] = data[key]
config_manager.models_data.setdefault(model_type, []).append(entry)
# Add entry to each selected category
for mtype in model_types:
config_manager.models_data.setdefault(mtype, []).append(entry)
config_manager.save_models()
return {"success": True}
......@@ -1286,6 +1346,7 @@ async def api_get_settings(username: str = Depends(require_admin)):
"https": c.server.https,
"https_key_path": c.server.https_key_path,
"https_cert_path": c.server.https_cert_path,
"queue_max_size": c.server.queue_max_size,
},
"backend": {
"type": c.backend.type,
......@@ -1341,6 +1402,10 @@ async def api_save_settings(request: Request, username: str = Depends(require_ad
c.server.https = bool(srv.get("https", c.server.https))
c.server.https_key_path = srv.get("https_key_path") or None
c.server.https_cert_path = srv.get("https_cert_path") or None
if "queue_max_size" in srv:
c.server.queue_max_size = max(1, int(srv["queue_max_size"]))
from codai.queue.manager import queue_manager
queue_manager.max_size = c.server.queue_max_size
if "backend" in data:
bk = data["backend"]
......@@ -1395,6 +1460,81 @@ async def api_save_settings(request: Request, username: str = Depends(require_ad
return {"success": True}
# --- Whisper-server management ---
@router.get("/admin/api/whisper-server/status")
async def api_whisper_server_status(username: str = Depends(require_admin)):
"""Return status of all registered whisper-server instances."""
from codai.models.manager import multi_model_manager
if multi_model_manager.whisper_servers:
return {
mid: wsm.get_status()
for mid, wsm in multi_model_manager.whisper_servers.items()
}
# Legacy single-instance fallback
if multi_model_manager.whisper_server:
return {"whisper-server": multi_model_manager.whisper_server.get_status()}
return {}
@router.post("/admin/api/whisper-server/start")
async def api_whisper_server_start(request: Request, username: str = Depends(require_admin)):
"""Start (or restart) a whisper-server instance by model_id."""
from codai.models.manager import multi_model_manager
data = await request.json()
model_id = data.get("model_id", "whisper-server")
server_path = data.get("server_path", "")
model_path = data.get("model_path") or None
port = int(data.get("port", 8744))
gpu_device = int(data.get("gpu_device", 0))
if not server_path:
raise HTTPException(status_code=400, detail="server_path required")
wsm = multi_model_manager.whisper_servers.get(model_id)
if wsm is None:
wsm = multi_model_manager.register_whisper_server(
model_id=model_id, server_path=server_path,
model_path=model_path, port=port, gpu_device=gpu_device,
)
else:
wsm.server_path = server_path
wsm.port = port
wsm.base_url = f"http://127.0.0.1:{port}"
wsm._model_path = model_path
wsm._gpu_device = gpu_device
result = wsm.start(model_path, gpu_device=gpu_device)
running = wsm.is_running()
if running:
ws_key = f"audio:{model_id}"
multi_model_manager.models[ws_key] = wsm
multi_model_manager.active_in_vram = ws_key
multi_model_manager.models_in_vram.add(ws_key)
return {"success": running, "running": running, "started_model": result}
@router.post("/admin/api/whisper-server/stop")
async def api_whisper_server_stop(request: Request, username: str = Depends(require_admin)):
"""Stop a whisper-server instance by model_id."""
from codai.models.manager import multi_model_manager
data = await request.json() if request.headers.get("content-type", "").startswith("application/json") else {}
model_id = data.get("model_id", "whisper-server")
wsm = multi_model_manager.whisper_servers.get(model_id) or multi_model_manager.whisper_server
if wsm:
wsm.stop()
ws_key = f"audio:{model_id}"
multi_model_manager.models.pop(ws_key, None)
multi_model_manager.models_in_vram.discard(ws_key)
if multi_model_manager.active_in_vram == ws_key:
multi_model_manager.active_in_vram = None
return {"success": True, "running": False}
# --- HuggingFace model search proxy ---
import re as _re
......
......@@ -8,8 +8,8 @@
--border: #1A1D28;
--border-2: #252836;
--text: #DDE1F0;
--text-2: #636880;
--text-3: #2E3145;
--text-2: #8B90A8;
--text-3: #555A72;
--accent: #6366F1;
--accent-s: rgba(99,102,241,.12);
--green: #34D399;
......
......@@ -28,15 +28,17 @@
<div class="stat-value" id="req-total">0</div>
<div class="stat-sub"><span id="req-active">0</span> active</div>
</div>
<div class="stat">
<div class="stat" id="vram-card" style="display:none">
<div class="stat-label">VRAM</div>
<div class="stat-value" id="vram-pct"></div>
<div class="stat-value" id="vram-pct" style="font-size:2rem"></div>
<div class="progress" style="margin-top:.625rem">
<div class="progress-fill" id="vram-bar" style="width:0%"></div>
</div>
<div class="progress-labels">
<span id="vram-used"></span><span id="vram-total"></span>
<div class="progress-labels" style="color:var(--text);font-size:12px;margin-top:.4rem">
<span id="vram-used"></span><span id="vram-free"></span>
</div>
<div style="font-size:11.5px;color:var(--text-2);margin-top:.2rem;font-family:var(--mono)" id="vram-total-line"></div>
<div class="stat-sub" id="vram-gpu" style="margin-top:.25rem"></div>
</div>
</div>
......@@ -85,13 +87,25 @@ async function poll() {
document.getElementById('active-models').innerHTML = html || '<span class="muted small">No models loaded</span>';
if (d.vram) {
const pct = Math.round(d.vram.used / d.vram.total * 100);
document.getElementById('vram-pct').textContent = pct + '%';
document.getElementById('vram-bar').style.width = pct + '%';
document.getElementById('vram-used').textContent = d.vram.used.toFixed(1) + ' GB';
document.getElementById('vram-total').textContent = d.vram.total.toFixed(1) + ' GB';
document.getElementById('vram-card').style.display = '';
if (d.vram.free != null && d.vram.total) {
const usedPct = Math.round(d.vram.used / d.vram.total * 100);
document.getElementById('vram-pct').textContent = usedPct + '%';
document.getElementById('vram-bar').style.width = usedPct + '%';
document.getElementById('vram-used').textContent = d.vram.used.toFixed(1) + ' GB used';
document.getElementById('vram-free').textContent = d.vram.free.toFixed(1) + ' GB free';
document.getElementById('vram-total-line').textContent = d.vram.total.toFixed(1) + ' GB total';
} else {
document.getElementById('vram-pct').textContent = d.vram.total ? d.vram.total.toFixed(1) + ' GB' : '—';
document.getElementById('vram-bar').style.width = '0%';
document.getElementById('vram-used').textContent = '';
document.getElementById('vram-free').textContent = '';
document.getElementById('vram-total-line').textContent = '';
}
const gpuName = d.vram.gpu || '';
document.getElementById('vram-gpu').textContent = gpuName.length > 32 ? gpuName.slice(0, 32) + '…' : gpuName;
} else {
document.getElementById('vram-pct').textContent = 'N/A';
document.getElementById('vram-card').style.display = 'none';
}
if (d.requests) {
......
......@@ -94,6 +94,20 @@
<div class="card-title">GGUF files <span id="gguf-file-badge" class="muted small"></span></div>
<div id="gguf-models-list"><span class="muted small">Loading…</span></div>
</div>
<!-- Whisper Server -->
<div class="card mb-0" style="margin-top:1rem" id="ws-card">
<div style="display:flex;align-items:center;justify-content:space-between;flex-wrap:wrap;gap:.5rem">
<div>
<div class="card-title" style="margin:0">whisper-server <span class="muted" style="font-size:11px;font-weight:400">— native subprocess (AMD/Vulkan)</span></div>
<div id="ws-model-status" class="muted small" style="margin-top:.25rem"></div>
</div>
<div style="display:flex;align-items:center;gap:.5rem">
<span id="ws-running-badge" style="font-size:12px;font-weight:500"></span>
<a href="/admin/settings" class="btn btn-sm btn-ghost">Configure</a>
</div>
</div>
</div>
</div>
<!-- SEARCH -->
......@@ -315,25 +329,27 @@
<label class="form-label">Model ID / path</label>
<div id="cfg-id-label" style="font-size:12px;font-family:monospace;color:var(--text-2);word-break:break-all;padding:.3rem 0"></div>
</div>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:.75rem">
<div class="form-row" style="margin:0">
<label class="form-label">Type</label>
<select id="cfg-type" class="form-input">
<option value="text_models">Text (LLM)</option>
<option value="image_models">Image generation</option>
<option value="video_models">Video generation</option>
<option value="audio_models">Audio transcription (STT)</option>
<option value="tts_models">Text-to-speech (TTS)</option>
<option value="vision_models">Vision / VLM</option>
<option value="audio_gen_models">Audio generation (Music/SFX)</option>
<option value="embedding_models">Embeddings</option>
</select>
<div class="form-row">
<label class="form-label" style="display:flex;align-items:center;gap:.5rem">Type
<span id="cfg-type-autodet" style="font-size:11px;color:var(--text-3);font-weight:400"></span>
</label>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:.3rem .75rem;margin-top:.35rem">
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="text_models"> Text / LLM</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="vision_models"> Vision / VLM</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="image_models"> Image gen (T2I / I2I)</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" data-sub="t2v" value="video_models"> Video gen (T2V)</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" data-sub="i2v" value="video_models"> Image-to-Video (I2V)</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" data-sub="v2v" value="video_models"> Video-to-Video (V2V)</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="audio_models"> Audio transcription (STT)</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="tts_models"> Text-to-Speech (TTS)</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="audio_gen_models"> Audio generation</label>
<label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="embedding_models"> Embeddings</label>
</div>
<div class="form-row" style="margin:0">
</div>
<div class="form-row">
<label class="form-label">Alias <span class="muted">(optional)</span></label>
<input type="text" id="cfg-alias" class="form-input" placeholder="Friendly name">
</div>
</div>
<!-- backend -->
<div class="card-title" style="margin-top:1.25rem">Backend</div>
......@@ -501,6 +517,33 @@ async function loadGlobalSettings(){
}catch{}
}
async function loadWsStatus(){
try{
const s = await fetch('/admin/api/whisper-server/status').then(r=>r.json());
const card = document.getElementById('ws-card');
const badge = document.getElementById('ws-running-badge');
const modelEl = document.getElementById('ws-model-status');
const entries = Object.entries(s);
if(!entries.length){
card.style.display = 'none';
return;
}
card.style.display = '';
const running = entries.filter(([,v])=>v.running);
if(running.length){
badge.textContent = `● ${running.length}/${entries.length} running`;
badge.style.color = 'var(--green, #4ade80)';
card.style.borderColor = 'rgba(74,222,128,.3)';
modelEl.textContent = running.map(([id,v])=>`${id}: ${v.model||'?'} @ ${v.url}`).join(' | ');
} else {
badge.textContent = '○ stopped';
badge.style.color = 'var(--text-2)';
card.style.borderColor = '';
modelEl.textContent = entries.map(([id])=>id).join(', ') + ' — not started';
}
}catch{}
}
/* ── GGUF format toggle ──────────────────────────────── */
let _ggufMode = 'gguf';
document.querySelectorAll('.tog-btn').forEach(btn=>{
......@@ -958,7 +1001,8 @@ async function loadCachedModels(){
const rows = hf.map(m=>{
const idx = _localModels.length;
_localModels.push({label:m.id, path:m.id, cacheType:'hf', size_gb:m.size_gb||0,
defaultType:m.model_type||'text_models', settings:m.settings||{}, in_config:m.in_config});
defaultType:m.model_type||'text_models', settings:m.settings||{}, in_config:m.in_config,
capabilities:m.capabilities||[]});
const loaded = _loadedKeys.has(m.id) || [..._loadedKeys].some(k=>k.endsWith(':'+m.id)||k===m.id);
const capBadges = fmtCapabilities(m.capabilities||[]);
return `<tr style="border-top:1px solid var(--border)">
......@@ -996,7 +1040,8 @@ async function loadCachedModels(){
const rows = gguf.map(f=>{
const idx = _localModels.length;
_localModels.push({label:f.filename, path:f.path, cacheType:'gguf', size_gb:f.size_gb||0,
defaultType:f.model_type||'text_models', settings:f.settings||{}, in_config:f.in_config,
capabilities:f.capabilities||[]});
const loaded = _loadedKeys.has(f.path) || _loadedKeys.has(f.filename) || [..._loadedKeys].some(k=>k.endsWith(':'+f.path)||k.endsWith(':'+f.filename));
const capBadges = fmtCapabilities(f.capabilities||[]);
return `<tr style="border-top:1px solid var(--border)">
......@@ -1044,6 +1089,8 @@ async function refreshLocal(){
loadGlobalSettings();
refreshLocal();
loadWsStatus();
setInterval(loadWsStatus, 5000);
async function clearCacheConfirm(type){
const labels = {hf:'HuggingFace', gguf:'GGUF', all:'ALL'};
......@@ -1070,6 +1117,51 @@ async function deleteModelConfirm(idx){
}catch(e){alert('Error: '+e.message)}
}
/* ── type checkbox helpers ─────────────────────────────── */
function _capabilitiesToTypes(caps) {
const categories = new Set(), subs = new Set();
if (!caps || !caps.length) return {categories, subs};
if (caps.includes('image_to_video')) { categories.add('video_models'); subs.add('i2v'); }
if (caps.includes('video_generation')){ categories.add('video_models'); subs.add('t2v'); }
if (caps.includes('video_to_video')) { categories.add('video_models'); subs.add('v2v'); }
if (caps.includes('image_generation') || caps.includes('image_to_image') ||
caps.includes('inpainting') || caps.includes('controlnet')) categories.add('image_models');
if (caps.includes('image_to_text') && caps.includes('text_generation')) {
categories.add('vision_models');
} else if (caps.includes('text_generation') &&
!categories.has('video_models') && !categories.has('image_models')) {
categories.add('text_models');
}
if (caps.includes('speech_to_text')) categories.add('audio_models');
if (caps.includes('text_to_speech')) categories.add('tts_models');
if (caps.includes('audio_generation')) categories.add('audio_gen_models');
if (caps.includes('embeddings')) categories.add('embedding_models');
return {categories, subs};
}
function _setTypeCheckboxes(categories, subs) {
document.querySelectorAll('.cfg-type-cb').forEach(cb => {
const sub = cb.dataset.sub;
if (!categories.has(cb.value)) { cb.checked = false; return; }
if (sub) {
// Sub-typed checkbox (T2V / I2V / V2V): check only if this sub is in subs
cb.checked = subs.has(sub);
} else if (cb.value === 'video_models') {
// Non-sub video checkbox: only relevant when subs is empty (legacy/no-sub)
cb.checked = subs.size === 0;
} else {
cb.checked = true;
}
});
}
function _getCheckedTypes() {
const checked = [...document.querySelectorAll('.cfg-type-cb:checked')];
const categories = [...new Set(checked.map(cb => cb.value))];
const subs = checked.filter(cb => cb.dataset.sub).map(cb => cb.dataset.sub);
return {primaryType: categories[0] || 'text_models', model_types: categories, video_subtypes: subs};
}
function openCfgModal(idx){
const m = _localModels[idx];
const s = m.settings || {};
......@@ -1077,9 +1169,46 @@ function openCfgModal(idx){
document.getElementById('cfg-id-label').textContent = m.label;
document.getElementById('cfg-path').value = m.path;
document.getElementById('cfg-orig-type').value = m.defaultType;
// Determine type checkboxes: saved config > auto-detect from capabilities > defaultType
const det = document.getElementById('cfg-type-autodet');
if (s.model_types && s.model_types.length) {
// Previously saved multi-type config
const savedSubs = new Set(s.video_subtypes || []);
_setTypeCheckboxes(new Set(s.model_types), savedSubs);
det.textContent = '';
} else if (s.model_type) {
// Single saved type
const normType = s.model_type === 'gguf_models' ? 'text_models' : s.model_type;
let savedSubs = new Set(s.video_subtypes || []);
// Legacy video_models with no sub-types: infer from capabilities or default T2V
if (normType === 'video_models' && savedSubs.size === 0) {
const caps = m.capabilities || [];
if (caps.includes('image_to_video')) savedSubs.add('i2v');
if (caps.includes('video_to_video')) savedSubs.add('v2v');
if (caps.includes('video_generation') || savedSubs.size === 0) savedSubs.add('t2v');
}
_setTypeCheckboxes(new Set([normType]), savedSubs);
det.textContent = '';
} else {
// Auto-detect from capabilities
const caps = m.capabilities || [];
const {categories, subs} = _capabilitiesToTypes(caps);
if (categories.size > 0) {
_setTypeCheckboxes(categories, subs);
det.textContent = '(auto-detected)';
} else {
const rawType = m.defaultType === 'gguf_models' ? 'text_models' : (m.defaultType || 'text_models');
let fallbackSubs = new Set();
if (rawType === 'video_models') {
// No capabilities matched; default to T2V for unclassified video models
fallbackSubs.add('t2v');
}
_setTypeCheckboxes(new Set([rawType]), fallbackSubs);
det.textContent = m.cacheType === 'gguf' ? '(auto-detected: GGUF text model)' : '';
}
}
document.getElementById('cfg-alias').value = s.alias || '';
document.getElementById('cfg-backend').value = s.backend || 'auto';
document.getElementById('cfg-load-mode').value = s.load_mode || 'on-request';
......@@ -1117,9 +1246,12 @@ async function saveModelConfig(){
const maxGpu = parseFloat(document.getElementById('cfg-max-gpu').value);
const ramGb = parseFloat(document.getElementById('cfg-ram-gb').value);
const usedVram = parseFloat(document.getElementById('cfg-used-vram').value);
const {primaryType, model_types, video_subtypes} = _getCheckedTypes();
const data = {
path,
model_type: primaryType,
model_types: model_types,
video_subtypes: video_subtypes.length ? video_subtypes : undefined,
alias: document.getElementById('cfg-alias').value.trim() || null,
backend: document.getElementById('cfg-backend').value,
load_mode: document.getElementById('cfg-load-mode').value,
......
......@@ -45,6 +45,11 @@
<input type="text" id="s-cert" class="form-input" placeholder="/path/to/cert.pem">
</div>
</div>
<div class="form-row" style="margin-top:1rem;margin-bottom:0">
<label class="form-label">Request queue max size</label>
<input type="number" id="s-queue-max" class="form-input" placeholder="6" min="1" max="1000" style="max-width:160px">
<span class="form-hint">Maximum number of requests held in the queue at the same time. Authenticated requests that arrive while the queue is full are rejected with HTTP 429 (Too Many Requests).</span>
</div>
</div>
<!-- Storage -->
......@@ -64,6 +69,48 @@
<span class="form-hint">Models will inherit this as default when configured</span>
</div>
</div>
<!-- Whisper Server -->
<div class="card mb-0" style="margin-top:1rem">
<div style="display:flex;align-items:center;justify-content:space-between;flex-wrap:wrap;gap:.5rem;margin-bottom:1rem">
<div class="card-title" style="margin:0">Whisper Server <span class="muted" style="font-size:11px;font-weight:400">(whisper.cpp native binary — recommended for AMD/Vulkan)</span></div>
<div style="display:flex;align-items:center;gap:.5rem">
<span id="ws-badge" class="muted small"></span>
<button class="btn btn-sm btn-secondary" onclick="wsStart()">Start</button>
<button class="btn btn-sm btn-danger" onclick="wsStop()">Stop</button>
</div>
</div>
<div style="display:grid;grid-template-columns:1fr 160px;gap:1rem;align-items:start">
<div class="form-row" style="margin:0">
<label class="form-label">Model ID <span class="muted">(used in API calls, e.g. whisper-base)</span></label>
<input type="text" id="ws-id" class="form-input" placeholder="whisper-server">
<span class="form-hint">The name clients use in the <code>model</code> field of transcription requests</span>
</div>
<div class="form-row" style="margin:0">
<label class="form-label">Port</label>
<input type="number" id="ws-port" class="form-input" placeholder="8744" min="1024" max="65535">
</div>
</div>
<div style="display:grid;grid-template-columns:1fr 160px;gap:1rem;align-items:start;margin-top:1rem">
<div class="form-row" style="margin:0">
<label class="form-label">whisper-server binary path</label>
<input type="text" id="ws-path" class="form-input" placeholder="/usr/local/bin/whisper-server">
</div>
<div class="form-row" style="margin:0">
<label class="form-label">GPU device index</label>
<input type="number" id="ws-gpu" class="form-input" placeholder="0" min="0">
</div>
</div>
<div class="form-row" style="margin-top:1rem;margin-bottom:0">
<label class="form-label">Model path <span class="muted">(whisper.cpp GGML model file, e.g. ggml-base.bin)</span></label>
<input type="text" id="ws-model" class="form-input" placeholder="/path/to/ggml-base.bin">
<span class="form-hint">Configure multiple instances by adding entries to <code>models.json</code> with <code>"backend": "whisper-server"</code></span>
</div>
<p class="form-hint" style="margin-top:.75rem;margin-bottom:0">
When configured, the transcription endpoint uses this subprocess instead of the Python faster-whisper module.
Saving writes the binary path and port to <code>config.json</code>; changes take effect immediately, with no restart needed.
The model ID, model path, and GPU index are sent only when you click Start.
</p>
</div>
{% endblock %}
{% block scripts %}
......@@ -89,13 +136,69 @@ async function loadSettings(){
document.getElementById('s-https').checked = !!d.server?.https;
document.getElementById('s-key').value = d.server?.https_key_path ?? '';
document.getElementById('s-cert').value = d.server?.https_cert_path ?? '';
document.getElementById('s-queue-max').value = d.server?.queue_max_size ?? 6;
document.getElementById('s-hf-cache').value = d.models?.hf_cache_dir ?? '';
document.getElementById('s-gguf-cache').value = d.models?.gguf_cache_dir ?? '';
document.getElementById('s-offload-dir').value = d.offload?.directory ?? './offload';
document.getElementById('ws-path').value = d.whisper?.server_path ?? '';
document.getElementById('ws-port').value = d.whisper?.server_port ?? 8744;
toggleHttps();
}catch(e){ showAlert('error','Failed to load settings: '+e.message); }
}
async function loadWsStatus(){
try{
const s = await fetch('/admin/api/whisper-server/status').then(r=>r.json());
const badge = document.getElementById('ws-badge');
// Response maps each model_id to {running, model, url}
const entries = Object.entries(s);
if(!entries.length){
badge.textContent = '○ not configured';
badge.style.color = 'var(--text-2)';
return;
}
const running = entries.filter(([,v])=>v.running);
if(running.length){
badge.textContent = `● ${running.length} running`;
badge.style.color = 'var(--green, #4ade80)';
} else {
badge.textContent = '○ stopped';
badge.style.color = 'var(--text-2)';
}
}catch(e){}
}
async function wsStart(){
const path = document.getElementById('ws-path').value.trim();
if(!path){ showAlert('error','Binary path required'); return; }
try{
const r = await fetch('/admin/api/whisper-server/start',{
method:'POST', headers:{'Content-Type':'application/json'},
body: JSON.stringify({
model_id: document.getElementById('ws-id').value.trim() || 'whisper-server',
server_path: path,
model_path: document.getElementById('ws-model').value.trim() || null,
port: parseInt(document.getElementById('ws-port').value) || 8744,
gpu_device: parseInt(document.getElementById('ws-gpu').value) || 0,
})
});
const d = await r.json();
if(d.success) showAlert('info','whisper-server started');
else showAlert('error','Failed to start whisper-server');
loadWsStatus();
}catch(e){ showAlert('error','Error: '+e.message); }
}
async function wsStop(){
const modelId = document.getElementById('ws-id').value.trim() || 'whisper-server';
try{
const r = await fetch('/admin/api/whisper-server/stop',{
method:'POST', headers:{'Content-Type':'application/json'},
body: JSON.stringify({model_id: modelId})
});
if(r.ok) showAlert('info','whisper-server stopped');
else showAlert('error','Failed to stop whisper-server (HTTP '+r.status+')');
}catch(e){ showAlert('error','Error: '+e.message); }
loadWsStatus();
}
async function saveSettings(){
const strOrNull = id => document.getElementById(id).value.trim() || null;
const data = {
......@@ -105,6 +208,7 @@ async function saveSettings(){
https: document.getElementById('s-https').checked,
https_key_path: strOrNull('s-key'),
https_cert_path: strOrNull('s-cert'),
queue_max_size: parseInt(document.getElementById('s-queue-max').value) || 6,
},
models:{
hf_cache_dir: strOrNull('s-hf-cache'),
......@@ -112,7 +216,11 @@ async function saveSettings(){
},
offload:{
directory: document.getElementById('s-offload-dir').value.trim() || './offload',
},
whisper:{
server_path: document.getElementById('ws-path').value.trim() || null,
server_port: parseInt(document.getElementById('ws-port').value) || 8744,
},
};
try{
const r = await fetch('/admin/api/settings',{
......@@ -125,5 +233,7 @@ async function saveSettings(){
}
loadSettings();
loadWsStatus();
setInterval(loadWsStatus, 5000);
</script>
{% endblock %}
......@@ -19,12 +19,16 @@ FastAPI application module for codai API.
Contains the FastAPI app initialization, lifespan, and core endpoints.
"""
import logging
import os
from contextlib import asynccontextmanager
from typing import List
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, JSONResponse
logger = logging.getLogger(__name__)
# Import from codai modules
from codai.pydantic.textrequest import ModelList
from codai.models.manager import model_manager, multi_model_manager
......@@ -89,11 +93,19 @@ from codai.api.text import router as text_router
from codai.api.video import router as video_router
from codai.api.audio_gen import router as audio_gen_router
from codai.api.embeddings import router as embeddings_router
from codai.api.pipelines import router as pipelines_router
from codai.api.custom_pipelines import router as custom_pipelines_router
from codai.api.voice_clone import router as voice_clone_router
from codai.api.voice_convert import router as voice_convert_router
from codai.api.faceswap import router as faceswap_router
from codai.api.characters import router as characters_router
from codai.admin.routes import router as admin_router
# Import and add middleware
from codai.api.log import log_requests
from codai.api.ratelimit import RateLimitMiddleware
app.middleware("http")(log_requests)
app.add_middleware(RateLimitMiddleware)
# Mount static files for admin dashboard
from fastapi.staticfiles import StaticFiles
......@@ -110,6 +122,12 @@ app.include_router(text_router)
app.include_router(video_router)
app.include_router(audio_gen_router)
app.include_router(embeddings_router)
app.include_router(pipelines_router)
app.include_router(custom_pipelines_router)
app.include_router(voice_clone_router)
app.include_router(voice_convert_router)
app.include_router(faceswap_router)
app.include_router(characters_router)
app.include_router(admin_router)
......@@ -133,11 +151,14 @@ async def list_models():
@app.get("/v1/files/{filename}")
async def get_file(filename: str):
"""Serve uploaded/generated files."""
if not global_file_path:
raise HTTPException(status_code=404, detail="File not found")
# Prevent path traversal: resolve to real paths and confirm the result
# stays inside the configured directory.
safe_base = os.path.realpath(global_file_path)
candidate = os.path.realpath(os.path.join(global_file_path, filename))
if not (candidate == safe_base or candidate.startswith(safe_base + os.sep)):
raise HTTPException(status_code=403, detail="Access denied")
if not os.path.isfile(candidate):
raise HTTPException(status_code=404, detail="File not found")
return FileResponse(candidate)
\ No newline at end of file
# CoderAI - OpenAI-compatible API server
# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Character profile endpoints.
Saved character profiles are named collections of reference images used to
maintain visual consistency of a character across multiple video generations.
POST /v1/characters – save / update a character profile
GET /v1/characters – list all saved profiles (no images)
GET /v1/characters/{name} – get a profile including base64 images
DELETE /v1/characters/{name} – delete a profile
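
A client-side sketch of building the POST /v1/characters request body, assuming the
reference images are already in memory as raw bytes. build_character_payload and
PNG_MAGIC are hypothetical names; the name check mirrors the one in save_character,
and any non-PNG input is labelled JPEG, matching how _save_character picks extensions.

```python
import base64
import json

# First four bytes of every PNG file (0x89 followed by ASCII 'PNG').
PNG_MAGIC = bytes([0x89]) + b'PNG'


def build_character_payload(name, description, images):
    # images: list of (label, raw_bytes) pairs; label may be None.
    # Same rule the server applies: no empty names, no '/', no '..'.
    if not name or '/' in name or '..' in name:
        raise ValueError('invalid character name')
    if not images:
        raise ValueError('at least one reference image required')
    entries = []
    for label, raw in images:
        mime = 'image/png' if raw[:4] == PNG_MAGIC else 'image/jpeg'
        b64 = base64.b64encode(raw).decode('ascii')
        # The server accepts base64 with or without a data: prefix.
        entries.append({'label': label, 'data': f'data:{mime};base64,{b64}'})
    return json.dumps({'name': name, 'description': description, 'images': entries})
```

The returned string can be sent as the JSON body of the POST request with any HTTP client.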
"""
import base64
import json
import os
import time
from typing import List, Optional
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, ConfigDict
router = APIRouter()
_CHARS_DIR: Optional[str] = None
def set_global_args(args):
global _CHARS_DIR
base = getattr(args, 'file_path', None) or os.path.expanduser('~/.coderai')
root = base if os.path.isdir(base) else (os.path.dirname(base) if base else os.path.expanduser('~/.coderai'))
_CHARS_DIR = os.path.join(root, 'characters')
os.makedirs(_CHARS_DIR, exist_ok=True)
def set_global_file_path(path: str):
pass # not needed for characters
def _chars_dir() -> str:
if _CHARS_DIR:
return _CHARS_DIR
d = os.path.expanduser('~/.coderai/characters')
os.makedirs(d, exist_ok=True)
return d
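def _char_dir(name: str) -> str:
# Every endpoint resolves profile paths through here, so reject names that
# would escape the characters directory ("", ".", "..", "a/b", "/abs").
base = os.path.abspath(_chars_dir())
path = os.path.normpath(os.path.join(base, name))
if not name or os.path.dirname(path) != base:
raise HTTPException(status_code=400, detail="Invalid character name")
return path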
# ── Pydantic models ───────────────────────────────────────────────────────────
class CharacterImage(BaseModel):
label: Optional[str] = None # e.g. "front", "side", "close-up"
data: str # base64 image (with or without data: prefix)
model_config = ConfigDict(extra="allow")
class CharacterSaveRequest(BaseModel):
name: str
description: Optional[str] = ""
images: List[CharacterImage] # one or more reference images
model_config = ConfigDict(extra="allow")
class CharacterProfile(BaseModel):
name: str
description: Optional[str] = ""
image_count: int
created_at: int
images: Optional[List[CharacterImage]] = None # only populated on GET /{name}
model_config = ConfigDict(extra="allow")
# ── Helpers ───────────────────────────────────────────────────────────────────
def _save_character(name: str, description: str, images: List[CharacterImage]) -> dict:
cdir = _char_dir(name)
os.makedirs(cdir, exist_ok=True)
img_files = []
for i, img in enumerate(images):
raw = img.data
if raw.startswith('data:'):
_, b64 = raw.split(',', 1)
else:
b64 = raw
img_bytes = base64.b64decode(b64)
# Detect PNG vs JPEG from magic bytes
ext = '.png' if img_bytes[:4] == b'\x89PNG' else '.jpg'
fname = f"ref{i:02d}{ext}"
fpath = os.path.join(cdir, fname)
with open(fpath, 'wb') as f:
f.write(img_bytes)
img_files.append({'file': fname, 'label': img.label or f'ref{i}'})
meta = {
'name': name,
'description': description,
'images': img_files,
'image_count': len(img_files),
'created_at': int(time.time()),
}
with open(os.path.join(cdir, 'meta.json'), 'w') as f:
json.dump(meta, f)
return meta
def _load_character_meta(name: str) -> Optional[dict]:
meta_path = os.path.join(_char_dir(name), 'meta.json')
if not os.path.exists(meta_path):
return None
with open(meta_path) as f:
return json.load(f)
def _load_character_images(name: str) -> List[CharacterImage]:
meta = _load_character_meta(name)
if not meta:
return []
cdir = _char_dir(name)
result = []
for img_info in meta.get('images', []):
fpath = os.path.join(cdir, img_info['file'])
if not os.path.exists(fpath):
continue
with open(fpath, 'rb') as f:
raw = f.read()
ext = img_info['file'].rsplit('.', 1)[-1]
mime = 'image/png' if ext == 'png' else 'image/jpeg'
b64 = base64.b64encode(raw).decode()
result.append(CharacterImage(
label=img_info.get('label'),
data=f"data:{mime};base64,{b64}",
))
return result
def _list_characters() -> list:
d = _chars_dir()
profiles = []
for entry in os.scandir(d):
if entry.is_dir():
meta = _load_character_meta(entry.name)
if meta:
profiles.append({k: v for k, v in meta.items() if k != 'images'})
return sorted(profiles, key=lambda p: p.get('created_at', 0))
def resolve_character_profiles(profile_names: List[str]) -> List[str]:
"""Resolve saved profile names to a flat list of base64 data-URI image strings."""
out = []
for name in profile_names:
for img in _load_character_images(name):
out.append(img.data)
return out
# ── Endpoints ─────────────────────────────────────────────────────────────────
@router.post("/v1/characters")
async def save_character(req: CharacterSaveRequest):
"""Save or update a named character profile."""
if not req.name or '/' in req.name or '..' in req.name:
raise HTTPException(status_code=400, detail="Invalid character name")
if not req.images:
raise HTTPException(status_code=400, detail="At least one reference image required")
meta = _save_character(req.name, req.description or '', req.images)
return {"ok": True, "name": meta['name'], "image_count": meta['image_count']}
@router.get("/v1/characters")
async def list_characters():
"""List all saved character profiles (metadata only, no images)."""
return {"characters": _list_characters()}
@router.get("/v1/characters/{name}")
async def get_character(name: str):
"""Get a character profile including its reference images as base64."""
meta = _load_character_meta(name)
if not meta:
raise HTTPException(status_code=404, detail=f"Character '{name}' not found")
images = _load_character_images(name)
return {
"name": meta['name'],
"description": meta.get('description', ''),
"image_count": meta['image_count'],
"created_at": meta['created_at'],
"images": [img.model_dump() for img in images],
}
@router.delete("/v1/characters/{name}")
async def delete_character(name: str):
"""Delete a character profile."""
cdir = _char_dir(name)
if not os.path.isdir(cdir):
raise HTTPException(status_code=404, detail=f"Character '{name}' not found")
import shutil
shutil.rmtree(cdir)
return {"ok": True, "name": name}
"""
Custom pipeline executor.
GET /v1/pipelines/custom — list saved custom pipelines
POST /v1/pipelines/custom — save a new custom pipeline definition
PUT /v1/pipelines/custom/{id} — update a pipeline
DELETE /v1/pipelines/custom/{id} — delete a pipeline
POST /v1/pipelines/custom/{id}/run — execute a saved pipeline
POST /v1/pipelines/run — execute an inline pipeline definition (no save)
Pipeline definition schema:
{
"id": "my-pipeline", # auto-generated if absent
"name": "My Pipeline",
"steps": [
{
"type": "text_gen", # step type (see STEP_TYPES)
"label": "Write script", # optional display label
"params": { # static params merged with runtime context
"model": "Qwen/Qwen3.5-9B",
"prompt": "{{input}}", # {{input}} = pipeline input text
# {{stepN.output}} = output of step N
# {{stepN.url}} = URL output of step N
}
},
{
"type": "image_gen",
"params": {
"model": "sd-model",
"prompt": "{{step0.output}}"
}
}
]
}
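
A standalone sketch of the placeholder rules above, written to mirror
_resolve_template below; resolve_template, PLACEHOLDER and STEP_REF are illustrative
names. It uses re.fullmatch for step references, which is slightly stricter than the
module's re.match, and character classes instead of backslash escapes.

```python
import re

# Matches '{{ key }}' placeholders.
PLACEHOLDER = re.compile('[{][{]([^}]+)[}][}]')
# Matches 'stepN.field' keys, e.g. 'step0.output' or 'step2.url'.
STEP_REF = re.compile('step([0-9]+)[.]([A-Za-z0-9_]+)')


def resolve_template(value, context):
    # {{input}}       -> context['input']
    # {{stepN.field}} -> context['stepN'][field], or '' if missing
    # Unknown placeholders are left as-is; non-strings pass through.
    if not isinstance(value, str):
        return value

    def replace(m):
        key = m.group(1).strip()
        if key == 'input':
            return str(context.get('input', ''))
        ref = STEP_REF.fullmatch(key)
        if ref:
            step = context.get('step' + ref.group(1), {})
            return str(step.get(ref.group(2), ''))
        return m.group(0)

    return PLACEHOLDER.sub(replace, value)
```

For example, with context {'input': 'a cat', 'step0': {'output': 'A cat on a mat'}},
the string 'Draw: {{step0.output}}' resolves to 'Draw: A cat on a mat'.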
Step types and their endpoint mapping:
text_gen → POST /v1/chat/completions
image_gen → POST /v1/images/generations
image_edit → POST /v1/images/edits
image_inpaint → POST /v1/images/inpaint
image_upscale → POST /v1/images/upscale
image_deblur → POST /v1/images/deblur
image_unpix → POST /v1/images/unpixelate
image_outfit → POST /v1/images/outfit
image_faceswap → POST /v1/images/faceswap
video_gen → POST /v1/video/generations
video_upscale → POST /v1/video/upscale
video_sub → POST /v1/video/subtitle
video_interp → POST /v1/video/interpolate
video_dub → POST /v1/video/dub
tts → POST /v1/audio/speech
stt → POST /v1/audio/transcriptions (multipart; not yet registered in STEP_TYPES)
audio_gen → POST /v1/audio/generate
voice_clone → POST /v1/audio/clone
voice_convert → POST /v1/audio/convert
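
A simplified, synchronous sketch of how steps run in order, assuming each handler is a
plain callable taking (params, context) and returning a dict; the real
_execute_pipeline awaits the FastAPI handlers listed above and also records labels.
A failing step stops the run unless it sets continue_on_error. run_pipeline and the
stub handlers are hypothetical.

```python
def run_pipeline(steps, handlers, pipeline_input):
    # steps: list of {'type': ..., 'params': {...}, 'continue_on_error': bool}
    # handlers: dict mapping step type -> callable(params, context) -> dict
    context = {'input': pipeline_input}
    report = []
    for i, step in enumerate(steps):
        try:
            if step['type'] not in handlers:
                raise ValueError('Unknown step type: ' + step['type'])
            out = handlers[step['type']](step.get('params', {}), context)
            # Later steps read this output through {{stepN.*}} placeholders.
            context['step' + str(i)] = out
            report.append({'step': i, 'type': step['type'], **out})
        except Exception as exc:
            report.append({'step': i, 'type': step['type'], 'error': str(exc)})
            if not step.get('continue_on_error', False):
                break
    return report


# Stub handlers standing in for the real endpoints.
handlers = {
    'text_gen': lambda params, ctx: {'output': 'script for ' + ctx['input']},
    'tts': lambda params, ctx: {'audio': 'AUDIO(' + ctx['step0']['output'] + ')'},
}
```

Because stt has no STEP_TYPES entry, a pipeline that starts with it stops at step 0
unless that step sets continue_on_error.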
"""
import asyncio
import time
import uuid
from typing import Any, Dict, List, Optional
from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel, ConfigDict
router = APIRouter()
# ---------------------------------------------------------------------------
# Step type → (handler_module, handler_fn, request_class)
# ---------------------------------------------------------------------------
STEP_TYPES = {
"text_gen": ("codai.api.text", "chat_completions", "codai.pydantic.textrequest.ChatCompletionRequest"),
"image_gen": ("codai.api.images", "create_image_generation","codai.pydantic.imagerequest.ImageGenerationRequest"),
"image_edit": ("codai.api.images", "create_image_edit", None),
"image_inpaint": ("codai.api.images", "create_image_inpaint", None),
"image_upscale": ("codai.api.images", "create_image_upscale", None),
"image_deblur": ("codai.api.images", "create_image_deblur", None),
"image_unpix": ("codai.api.images", "create_image_unpixelate",None),
"image_outfit": ("codai.api.images", "create_image_outfit", None),
"image_faceswap": ("codai.api.faceswap", "faceswap", None),
"video_gen": ("codai.api.video", "video_generations", "codai.pydantic.videorequest.VideoGenerationRequest"),
"video_upscale": ("codai.api.video", "video_upscale", None),
"video_sub": ("codai.api.video", "video_subtitle", None),
"video_interp": ("codai.api.video", "video_interpolate", None),
"video_dub": ("codai.api.video", "video_dub", None),
"tts": ("codai.api.tts", "create_speech", None),
"audio_gen": ("codai.api.audio_gen", "audio_generate", None),
"voice_clone": ("codai.api.voice_clone", "clone_voice", None),
"voice_convert": ("codai.api.voice_convert", "convert_voice", None),
}
# Human-readable labels for the UI
STEP_TYPE_LABELS = {
"text_gen": "Text Generation (LLM)",
"image_gen": "Image Generation",
"image_edit": "Image Edit (i2i)",
"image_inpaint": "Image Inpaint",
"image_upscale": "Image Upscale",
"image_deblur": "Image Deblur",
"image_unpix": "Image Unpixelate",
"image_outfit": "Outfit Change",
"image_faceswap": "Face Swap",
"video_gen": "Video Generation",
"video_upscale": "Video Upscale",
"video_sub": "Video Subtitles",
"video_interp": "Video Interpolate",
"video_dub": "Video Dub",
"tts": "Text-to-Speech",
"audio_gen": "Audio/Music Generation",
"voice_clone": "Voice Clone (TTS)",
"voice_convert": "Voice Convert (SVC)",
}
# Which params each step type accepts (for the UI form builder)
STEP_PARAMS = {
"text_gen": [("model","text","Model ID"),("prompt","textarea","Prompt"),("system","textarea","System prompt (opt)")],
"image_gen": [("model","text","Model ID"),("prompt","textarea","Prompt"),("negative_prompt","text","Negative prompt"),("size","text","Size","1024x1024"),("steps","number","Steps"),("guidance_scale","number","CFG","7.5"),("seed","number","Seed")],
"image_edit": [("model","text","Model ID"),("prompt","textarea","Prompt"),("image","ref","Source image ({{stepN.url}})"),("strength","number","Strength","0.75"),("steps","number","Steps"),("seed","number","Seed")],
"image_inpaint": [("model","text","Model ID"),("prompt","textarea","Prompt"),("image","ref","Source image"),("mask","ref","Mask image"),("strength","number","Strength","0.99"),("steps","number","Steps"),("seed","number","Seed")],
"image_upscale": [("model","text","Model ID (opt)"),("image","ref","Source image"),("scale","number","Scale","4")],
"image_deblur": [("image","ref","Source image"),("strength","number","Strength","0.5")],
"image_unpix": [("image","ref","Source image"),("scale","number","Scale","4")],
"image_outfit": [("model","text","Inpaint model ID"),("image","ref","Source image"),("prompt","textarea","Outfit prompt"),("negative_prompt","text","Negative prompt"),("steps","number","Steps"),("seed","number","Seed")],
"image_faceswap": [("source_face","ref","Source face image"),("target","ref","Target image/video"),("target_type","select:image|video","Target type","image")],
"video_gen": [("model","text","Model ID"),("prompt","textarea","Prompt"),("mode","select:t2v|i2v|v2v|ti2v","Mode","t2v"),("init_image","ref","Init image (i2v)"),("num_frames","number","Frames","16"),("fps","number","FPS","8"),("num_inference_steps","number","Steps","25"),("guidance_scale","number","CFG","7.5"),("seed","number","Seed")],
"video_upscale": [("model","text","Model ID"),("video","ref","Source video"),("upscale_factor","number","Scale","2")],
"video_sub": [("model","text","Model ID"),("video","ref","Source video"),("language","text","Language"),("burn","checkbox","Burn into video")],
"video_interp": [("model","text","Model ID"),("video","ref","Source video"),("fps_multiplier","number","FPS multiplier","2")],
"video_dub": [("model","text","Model ID"),("video","ref","Source video"),("target_lang","text","Target language"),("source_lang","text","Source language"),("burn_subtitles","checkbox","Burn subtitles")],
"tts": [("model","text","Model ID"),("input","textarea","Text ({{stepN.output}})"),("voice","text","Voice","af_sarah"),("speed","number","Speed","1.0")],
"audio_gen": [("model","text","Model ID"),("prompt","textarea","Prompt"),("duration","number","Duration (s)","10"),("temperature","number","Temperature","1.0")],
"voice_clone": [("text","textarea","Text to synthesize"),("voice_name","text","Voice profile name"),("ref_text","text","Reference transcript"),("speed","number","Speed","1.0")],
"voice_convert": [("source_audio","ref","Source audio"),("voice_name","text","Voice profile name"),("f0_condition","checkbox","Singing mode"),("pitch_shift","number","Pitch shift","0"),("diffusion_steps","number","Steps","10")],
}
def _resolve_template(value: Any, context: Dict) -> Any:
"""Replace {{input}}, {{stepN.output}}, {{stepN.url}} etc. in string values."""
if not isinstance(value, str):
return value
import re
def _replace(m):
key = m.group(1).strip()
# {{input}} → pipeline input
if key == 'input':
return str(context.get('input', ''))
# {{stepN.field}}
match = re.match(r'step(\d+)\.(\w+)', key)
if match:
n, field = int(match.group(1)), match.group(2)
step_result = context.get(f'step{n}', {})
return str(step_result.get(field, ''))
return m.group(0)
return re.sub(r'\{\{([^}]+)\}\}', _replace, value)
def _resolve_params(params: Dict, context: Dict) -> Dict:
return {k: _resolve_template(v, context) for k, v in params.items()}
def _extract_output(step_type: str, result: Any) -> Dict:
"""Extract useful fields from a step result for use in subsequent steps."""
if result is None:
return {}
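# Pydantic v2 responses: model_dump() also converts nested models to dicts
r = result if isinstance(result, dict) else (result.model_dump() if hasattr(result, 'model_dump') else getattr(result, '__dict__', {}))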
out = {}
# text_gen
if 'choices' in r:
out['output'] = r['choices'][0].get('message', {}).get('content', '') if r['choices'] else ''
# image/video/audio with data array
if 'data' in r and r['data']:
item = r['data'][0]
if isinstance(item, dict):
out['url'] = item.get('url', '')
for k, v in item.items():
out[k] = v
# tts audio field
if 'audio' in r:
out['audio'] = r['audio']
out['output'] = r['audio']
return out
async def _run_step(step: Dict, context: Dict, http_request) -> Dict:
"""Execute a single pipeline step and return its output context."""
step_type = step['type']
if step_type not in STEP_TYPES:
raise ValueError(f"Unknown step type: {step_type}")
mod_name, fn_name, req_class_path = STEP_TYPES[step_type]
params = _resolve_params(step.get('params', {}), context)
# Import handler
import importlib
mod = importlib.import_module(mod_name)
handler = getattr(mod, fn_name)
# Build request object
if req_class_path:
req_mod, req_cls = req_class_path.rsplit('.', 1)
req_class = getattr(importlib.import_module(req_mod), req_cls)
# text_gen needs messages format
if step_type == 'text_gen':
messages = [{"role": "user", "content": params.pop('prompt', '')}]
if 'system' in params and params['system']:
messages.insert(0, {"role": "system", "content": params.pop('system')})
else:
params.pop('system', None)
params['messages'] = messages
params.setdefault('stream', False)
req = req_class(**{k: v for k, v in params.items() if v != ''})
else:
# Find the request class from the handler's type hints
import inspect
sig = inspect.signature(handler)
first_param = list(sig.parameters.values())[0]
ann = first_param.annotation
if ann != inspect.Parameter.empty:
req = ann(**{k: v for k, v in params.items() if v != ''})
else:
req = type('Req', (), params)()
result = await handler(req, http_request)
return _extract_output(step_type, result)
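# Sketch (hypothetical helper, not called by _run_step): _run_step instantiates
# the handler's first-parameter annotation as the request model. If a handler
# module uses `from __future__ import annotations`, inspect.signature() reports
# that annotation as a plain string, and calling it fails. typing.get_type_hints()
# evaluates such annotations against the handler's module globals. Returning
# None for non-class hints (e.g. Optional[Model]) is an assumption of this sketch.
import inspect
import typing
from typing import Any, Callable, Optional
def _sketch_request_class(handler: Callable[..., Any]) -> Optional[type]:
    params = list(inspect.signature(handler).parameters.values())
    if not params:
        return None
    # May raise NameError if a string annotation cannot be resolved
    hints = typing.get_type_hints(handler)
    ann = hints.get(params[0].name)
    return ann if isinstance(ann, type) else None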
async def _execute_pipeline(pipeline_def: Dict, pipeline_input: str, http_request) -> Dict:
"""Execute all steps of a pipeline definition."""
context = {'input': pipeline_input}
steps_output = []
for i, step in enumerate(pipeline_def.get('steps', [])):
try:
out = await _run_step(step, context, http_request)
context[f'step{i}'] = out
steps_output.append({'step': i, 'type': step['type'],
'label': step.get('label', step['type']), **out})
except Exception as e:
steps_output.append({'step': i, 'type': step['type'],
'label': step.get('label', step['type']),
'error': str(e)})
if not step.get('continue_on_error', False):
break
return {
'created': int(time.time()),
'pipeline': pipeline_def.get('name', pipeline_def.get('id', 'custom')),
'steps': steps_output,
'data': [context.get(f'step{len(steps_output)-1}', {})] if steps_output else [],
}
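# Sketch (hypothetical, for illustration only): a synchronous, dependency-free
# model of the data flow in _execute_pipeline. Each step's output dict is kept
# as context['step<i>'] (0-based), so a later step can reference it with a
# template such as '{{step0.output}}', and '{{input}}' expands to the pipeline
# input. A failing step is recorded with an 'error' key and stops the run unless
# it sets continue_on_error. The plain callables in `runners` stand in for the
# real endpoint handlers; that substitution is an assumption of the sketch.
import re
from typing import Any, Callable, Dict, List
def _sketch_run_pipeline(steps: List[Dict[str, Any]],
                         runners: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]],
                         pipeline_input: str) -> List[Dict[str, Any]]:
    context: Dict[str, Any] = {'input': pipeline_input}
    def resolve(value: Any) -> Any:
        if not isinstance(value, str):
            return value
        def _sub(m):
            key = m.group(1).strip()
            if key == 'input':
                return str(context.get('input', ''))
            ref = re.match(r'step(\d+)\.(\w+)', key)
            if ref:
                step_out = context.get(f'step{int(ref.group(1))}', {})
                return str(step_out.get(ref.group(2), ''))
            return m.group(0)  # unknown placeholders are left untouched
        return re.sub(r'\{\{([^}]+)\}\}', _sub, value)
    results: List[Dict[str, Any]] = []
    for i, step in enumerate(steps):
        try:
            params = {k: resolve(v) for k, v in step.get('params', {}).items()}
            out = runners[step['type']](params)
            context[f'step{i}'] = out
            results.append({'step': i, **out})
        except Exception as e:
            results.append({'step': i, 'error': str(e)})
            if not step.get('continue_on_error', False):
                break
    return results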
# ---------------------------------------------------------------------------
# CRUD endpoints
# ---------------------------------------------------------------------------
class PipelineStep(BaseModel):
type: str
label: Optional[str] = None
params: Dict[str, Any] = {}
continue_on_error: Optional[bool] = False
model_config = ConfigDict(extra='allow')
class PipelineDefinition(BaseModel):
id: Optional[str] = None
name: str
description: Optional[str] = ''
steps: List[PipelineStep]
model_config = ConfigDict(extra='allow')
class PipelineRunRequest(BaseModel):
input: Optional[str] = ''
model_config = ConfigDict(extra='allow')
@router.get('/v1/pipelines/custom')
async def list_custom_pipelines():
"""List all saved custom pipeline definitions."""
from codai.admin.routes import config_manager
if config_manager is None:
return {'pipelines': []}
return {'pipelines': config_manager.pipelines_data}
@router.get('/v1/pipelines/step-types')
async def list_step_types():
"""List available step types with their parameter schemas."""
return {
'step_types': [
{'type': t, 'label': STEP_TYPE_LABELS[t], 'params': STEP_PARAMS.get(t, [])}
for t in STEP_TYPES
]
}
@router.post('/v1/pipelines/custom')
async def create_custom_pipeline(pipeline: PipelineDefinition):
"""Save a new custom pipeline definition."""
from codai.admin.routes import config_manager
if config_manager is None:
raise HTTPException(status_code=503, detail='Config manager not available')
data = pipeline.model_dump()
if not data.get('id'):
data['id'] = uuid.uuid4().hex[:8]
# Ensure no duplicate id
config_manager.pipelines_data = [p for p in config_manager.pipelines_data if p.get('id') != data['id']]
config_manager.pipelines_data.append(data)
config_manager.save_pipelines()
return {'created': True, 'pipeline': data}
@router.put('/v1/pipelines/custom/{pipeline_id}')
async def update_custom_pipeline(pipeline_id: str, pipeline: PipelineDefinition):
    """Update an existing custom pipeline in place, keeping its list position."""
    from codai.admin.routes import config_manager
    if config_manager is None:
        raise HTTPException(status_code=503, detail='Config manager not available')
    data = pipeline.model_dump()
    data['id'] = pipeline_id
    pipelines = list(config_manager.pipelines_data)
    idx = next((i for i, p in enumerate(pipelines) if p.get('id') == pipeline_id), None)
    if idx is None:
        raise HTTPException(status_code=404, detail=f"Pipeline '{pipeline_id}' not found")
    pipelines[idx] = data
    config_manager.pipelines_data = pipelines
    config_manager.save_pipelines()
    return {'updated': True, 'pipeline': data}
@router.delete('/v1/pipelines/custom/{pipeline_id}')
async def delete_custom_pipeline(pipeline_id: str):
"""Delete a custom pipeline."""
from codai.admin.routes import config_manager
if config_manager is None:
raise HTTPException(status_code=503, detail='Config manager not available')
before = len(config_manager.pipelines_data)
config_manager.pipelines_data = [p for p in config_manager.pipelines_data if p.get('id') != pipeline_id]
if len(config_manager.pipelines_data) == before:
raise HTTPException(status_code=404, detail=f"Pipeline '{pipeline_id}' not found")
config_manager.save_pipelines()
return {'deleted': True, 'id': pipeline_id}
@router.post('/v1/pipelines/custom/{pipeline_id}/run')
async def run_custom_pipeline(pipeline_id: str, body: PipelineRunRequest, http_request: Request = None):
"""Execute a saved custom pipeline."""
from codai.admin.routes import config_manager
if config_manager is None:
raise HTTPException(status_code=503, detail='Config manager not available')
pipeline_def = next((p for p in config_manager.pipelines_data if p.get('id') == pipeline_id), None)
if not pipeline_def:
raise HTTPException(status_code=404, detail=f"Pipeline '{pipeline_id}' not found")
return await _execute_pipeline(pipeline_def, body.input or '', http_request)
@router.post('/v1/pipelines/run')
async def run_inline_pipeline(pipeline: PipelineDefinition, http_request: Request = None):
"""Execute an inline pipeline definition without saving it."""
return await _execute_pipeline(pipeline.model_dump(), '', http_request)
"""
Face swap endpoint.
POST /v1/images/faceswap — swap face in image or video frames
"""
import asyncio
import base64
import io
import os
import subprocess
import tempfile
import time
from typing import Optional
import cv2
import numpy as np
from fastapi import APIRouter, HTTPException, Request
from PIL import Image
from pydantic import BaseModel, ConfigDict
from codai.api.images import save_image_response
router = APIRouter()
global_args = None
global_file_path = None
_INSWAPPER_MODEL_PATH = os.path.expanduser('~/.insightface/models/inswapper_128.onnx')
_INSWAPPER_HF_REPO = 'deepinsight/inswapper'
_INSWAPPER_HF_FILE = 'inswapper_128.onnx'
_face_app = None # FaceAnalysis singleton
_swapper = None # INSwapper singleton
def set_global_args(args):
global global_args
global_args = args
def set_global_file_path(path):
global global_file_path
global_file_path = path
def _ensure_model():
    """Download inswapper_128.onnx if not present."""
    if os.path.exists(_INSWAPPER_MODEL_PATH):
        return
    os.makedirs(os.path.dirname(_INSWAPPER_MODEL_PATH), exist_ok=True)
    print('Downloading inswapper_128.onnx from HuggingFace…')
    try:
        from huggingface_hub import hf_hub_download
        path = hf_hub_download(
            repo_id=_INSWAPPER_HF_REPO,
            filename=_INSWAPPER_HF_FILE,
            local_dir=os.path.dirname(_INSWAPPER_MODEL_PATH),
        )
        # Compare normalized paths so an equivalent location is not moved onto itself
        if os.path.abspath(path) != os.path.abspath(_INSWAPPER_MODEL_PATH):
            import shutil
            shutil.move(path, _INSWAPPER_MODEL_PATH)
    except Exception as e:
        raise RuntimeError(f'Failed to download inswapper model: {e}') from e
def _get_face_app():
global _face_app
if _face_app is None:
from insightface.app import FaceAnalysis
_face_app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
_face_app.prepare(ctx_id=0, det_size=(640, 640))
return _face_app
def _get_swapper():
global _swapper
if _swapper is None:
_ensure_model()
from insightface.model_zoo import get_model
_swapper = get_model(_INSWAPPER_MODEL_PATH, download=False)
_swapper.prepare(ctx_id=0)
return _swapper
def _decode_image(data: str) -> np.ndarray:
"""Decode base64 or data-URI image to BGR numpy array."""
if data.startswith('data:'):
_, b64 = data.split(',', 1)
data = b64
raw = base64.b64decode(data)
arr = np.frombuffer(raw, np.uint8)
img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
if img is None:
raise ValueError('Could not decode image')
return img
def _swap_faces(source_img: np.ndarray, target_img: np.ndarray) -> np.ndarray:
"""Swap all faces in target_img with the face from source_img."""
app = _get_face_app()
swapper = _get_swapper()
src_faces = app.get(source_img)
if not src_faces:
raise ValueError('No face detected in source image')
src_face = src_faces[0]
tgt_faces = app.get(target_img)
if not tgt_faces:
return target_img # no face to swap in target, return as-is
result = target_img.copy()
for tgt_face in tgt_faces:
result = swapper.get(result, tgt_face, src_face, paste_back=True)
return result
def _decode_b64_or_url(data: str) -> bytes:
if data.startswith('data:'):
_, b64 = data.split(',', 1)
return base64.b64decode(b64)
if data.startswith('http'):
import urllib.request
with urllib.request.urlopen(data, timeout=30) as r:
return r.read()
return base64.b64decode(data)
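# Sketch (hypothetical helper, not used by the endpoints here): the decoders
# above strip a 'data:...;base64,' prefix and discard the MIME type. This
# variant keeps it. It accepts either an RFC 2397 base64 data URI such as
# 'data:image/png;base64,iVBOR...' or a bare base64 string (MIME type None).
# Percent-encoded (non-base64) data URIs are deliberately rejected.
import base64
from typing import Optional, Tuple
def _sketch_parse_data_uri(data: str) -> Tuple[Optional[str], bytes]:
    mime: Optional[str] = None
    payload = data
    if data.startswith('data:'):
        header, sep, payload = data[len('data:'):].partition(',')
        if not sep or not header.endswith(';base64'):
            raise ValueError('only base64-encoded data URIs are supported')
        mime = header[:-len(';base64')] or None
    # validate=True raises binascii.Error (a ValueError subclass) on stray
    # characters instead of silently discarding them
    return mime, base64.b64decode(payload, validate=True)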
# ---------------------------------------------------------------------------
# Request model
# ---------------------------------------------------------------------------
class FaceSwapRequest(BaseModel):
source_face: str # base64/data-URI image containing the source face
target: str # base64/data-URI image OR video to swap into
target_type: Optional[str] = 'image' # 'image' or 'video'
response_format: Optional[str] = 'url'
model_config = ConfigDict(extra='allow')
# ---------------------------------------------------------------------------
# Endpoint
# ---------------------------------------------------------------------------
@router.post('/v1/images/faceswap')
async def faceswap(request: FaceSwapRequest, http_request: Request = None):
"""
Swap the face from source_face into every face found in target.
target_type: 'image' (default) or 'video'.
"""
try:
_ensure_model()
except RuntimeError as e:
raise HTTPException(status_code=503, detail=str(e))
try:
src_img = _decode_image(request.source_face)
except Exception as e:
raise HTTPException(status_code=400, detail=f'Invalid source_face: {e}')
if request.target_type == 'video':
return await _faceswap_video(src_img, request, http_request)
else:
return await _faceswap_image(src_img, request, http_request)
async def _faceswap_image(src_img, request, http_request):
try:
tgt_img = _decode_image(request.target)
except Exception as e:
raise HTTPException(status_code=400, detail=f'Invalid target: {e}')
try:
result = await asyncio.get_event_loop().run_in_executor(
None, _swap_faces, src_img, tgt_img)
except ValueError as e:
raise HTTPException(status_code=422, detail=str(e))
except Exception as e:
raise HTTPException(status_code=500, detail=f'Face swap failed: {e}')
pil_img = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
img_data = save_image_response(pil_img, request.response_format, http_request)
return {'created': int(time.time()), 'data': [img_data]}
async def _faceswap_video(src_img, request, http_request):
    import shutil
    import uuid
    from fractions import Fraction
    try:
        raw = _decode_b64_or_url(request.target)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f'Invalid target: {e}')
    # A single scratch directory holds the input, the extracted frames and the
    # output. It is removed in `finally`, which also avoids tempfile.mktemp().
    work_dir = tempfile.mkdtemp()
    try:
        # Write input video
        in_path = os.path.join(work_dir, 'input.mp4')
        with open(in_path, 'wb') as f:
            f.write(raw)
        # Extract frames
        frames_dir = os.path.join(work_dir, 'frames')
        os.makedirs(frames_dir)
        subprocess.run(
            ['ffmpeg', '-y', '-i', in_path, f'{frames_dir}/%08d.png'],
            capture_output=True, check=True)
        # Get FPS for reassembly; r_frame_rate is a rational such as "30000/1001"
        probe = subprocess.run(
            ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
             '-show_entries', 'stream=r_frame_rate', '-of', 'default=nw=1:nk=1', in_path],
            capture_output=True, text=True)
        try:
            fps = float(Fraction(probe.stdout.strip()))
        except (ValueError, ZeroDivisionError):
            fps = 0.0
        if fps <= 0:
            fps = 25.0  # ffprobe output was empty or unusable (e.g. "0/0")
        # Swap faces in each frame
        frame_files = sorted(os.listdir(frames_dir))
        def _process_frames():
            app = _get_face_app()
            swapper = _get_swapper()
            src_faces = app.get(src_img)
            if not src_faces:
                raise ValueError('No face detected in source image')
            src_face = src_faces[0]
            for fname in frame_files:
                fpath = os.path.join(frames_dir, fname)
                frame = cv2.imread(fpath)
                if frame is None:
                    continue
                for tgt_face in app.get(frame):
                    frame = swapper.get(frame, tgt_face, src_face, paste_back=True)
                cv2.imwrite(fpath, frame)
        await asyncio.get_running_loop().run_in_executor(None, _process_frames)
        # Reassemble video, copying the original audio track if present;
        # yuv420p keeps the H.264 output playable in browsers
        out_path = os.path.join(work_dir, 'swapped.mp4')
        subprocess.run(
            ['ffmpeg', '-y', '-framerate', str(fps), '-i', f'{frames_dir}/%08d.png',
             '-i', in_path, '-map', '0:v', '-map', '1:a?',
             '-c:v', 'libx264', '-pix_fmt', 'yuv420p', '-c:a', 'copy', '-shortest', out_path],
            capture_output=True, check=True)
        with open(out_path, 'rb') as f:
            out_bytes = f.read()
        if global_file_path:
            fname = f'{uuid.uuid4().hex}_swapped.mp4'
            fpath = os.path.join(global_file_path, fname)
            os.makedirs(global_file_path, exist_ok=True)
            with open(fpath, 'wb') as f:
                f.write(out_bytes)
            host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
            if ':' in host:
                parts = host.split(':')
                if len(parts) == 2 and parts[1].isdigit():
                    host = parts[0]
            proto = 'https' if getattr(global_args, 'https', False) else 'http'
            port = getattr(global_args, 'port', 8000) if global_args else 8000
            data = [{'url': f'{proto}://{host}:{port}/v1/files/{fname}'}]
        else:
            data = [{'b64_mp4': base64.b64encode(out_bytes).decode()}]
        return {'created': int(time.time()), 'data': data}
    except subprocess.CalledProcessError as e:
        # ffmpeg prints its banner first; the actual error is at the end of stderr
        stderr = (e.stderr or b'').decode(errors='replace')
        raise HTTPException(status_code=500, detail=f'ffmpeg error: {stderr[-500:]}')
    except ValueError as e:
        raise HTTPException(status_code=422, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=f'Video face swap failed: {e}')
    finally:
        shutil.rmtree(work_dir, ignore_errors=True)
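# Sketch (hypothetical helper, not wired into the endpoints above): extract the
# bare host from an HTTP Host header. The inline logic in _faceswap_video only
# strips a port when the header contains exactly one ':'. A bracketed IPv6
# literal such as '[::1]:8000' therefore keeps its port and produces a broken
# URL. This version handles 'name', 'name:port', '[v6]' and '[v6]:port'. It
# keeps the brackets so the result can go straight back into a URL.
def _sketch_host_without_port(host_header: str, default: str = '127.0.0.1') -> str:
    host = (host_header or '').strip()
    if not host:
        return default
    if host.startswith('['):
        # '[::1]:8000' -> '[::1]'
        end = host.find(']')
        return host[:end + 1] if end != -1 else host
    name, sep, port = host.rpartition(':')
    if sep and port.isdigit() and ':' not in name:
        return name
    return host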
......@@ -21,14 +21,17 @@ Image generation endpoints for the codai API.
import asyncio
import base64
import io
import logging
import os
import time
import uuid
from typing import Optional
from fastapi import APIRouter, HTTPException, Request
_log = logging.getLogger(__name__)
from PIL import Image
from pydantic import BaseModel, ConfigDict
# Import from codai modules
from codai.models.manager import multi_model_manager
......@@ -78,14 +81,12 @@ def get_cfg_scale():
for heap in mem:
                if heap.get('flags', {}).get('deviceLocal', False):
vram_mb = heap.get('size', 0) / (1024 * 1024)
_log.debug("Detected VRAM: %.0f MB", vram_mb)
if vram_mb < 16000: # Less than 16GB
return 1.0
break
except Exception as e:
# Default to 1.0 for Vulkan if detection fails
_log.debug("Could not detect VRAM: %s", e)
return 1.0
return cfg_scale
......@@ -117,7 +118,6 @@ def save_image_response(img, request_format="base64", http_request=None):
# Add URL to response
# Determine base URL based on --url argument
url_setting = getattr(global_args, 'url', 'auto') if global_args else 'auto'
if url_setting == 'auto':
# Use server host from request headers (what client used to connect)
if http_request:
......@@ -146,7 +146,6 @@ def save_image_response(img, request_format="base64", http_request=None):
protocol = "https" if use_https else "http"
port = getattr(global_args, 'port', 8000)
base_url = f"{protocol}://{client_host}:{port}"
else:
base_url = "http://127.0.0.1:8000"
else:
......@@ -460,13 +459,9 @@ def _generate_with_diffusers(pipeline, request, global_args, http_request=None):
raise Exception(f"Could not extract images from diffusers result: {img_err}")
for img in result_images:
if isinstance(img, np.ndarray):
img = np.nan_to_num(img, nan=0.0, posinf=1.0, neginf=0.0)
img = np.clip(img, 0.0, 1.0)
img_data = save_image_response(img, request.response_format, http_request)
images.append(img_data)
......@@ -532,16 +527,27 @@ def _load_sdcpp_model(model_path: str, global_args, model_config: dict = None):
Returns the loaded StableDiffusion model or None.
"""
from stable_diffusion_cpp import StableDiffusion
import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp
import ctypes
# Check for --no-ram mode
no_ram = getattr(global_args, 'no_ram', False) if global_args else False
print(f"Loading sd.cpp model from: {model_path}")
# Intercept sd.cpp log to detect partial-init failures (e.g. unknown SD version)
log_lines = []
@sd_cpp.sd_log_callback
def _log_cb(level, text, data):
if text:
line = text.decode('utf-8', errors='replace').rstrip()
log_lines.append(line)
sd_cpp.sd_set_log_callback(_log_cb, None)
# Build sd.cpp constructor args from config
kwargs = {
'model_path': model_path,
'offload_params_to_cpu': False,
'keep_clip_on_cpu': False,
'keep_control_net_on_cpu': False,
'keep_vae_on_cpu': False,
......@@ -575,6 +581,21 @@ def _load_sdcpp_model(model_path: str, global_args, model_config: dict = None):
sd_model = StableDiffusion(**kwargs)
else:
raise
finally:
        # Detach the capture callback. Passing None installs a NULL callback;
        # it does not reinstate a callback the Python binding may have set.
        sd_cpp.sd_set_log_callback(None, None)
    # Check if sd.cpp failed to identify the model architecture.
    # In this case new_sd_ctx returns a non-null but broken context that
    # will segfault on generate_image, so reject it early.
    failed_version = any('get sd version from file failed' in line for line in log_lines)
if failed_version:
raise ValueError(
f"sd.cpp could not identify the model architecture in '{model_path}'. "
"This model may require a newer version of stable-diffusion-cpp-python, "
"or it may not be a supported Stable Diffusion GGUF format."
)
return sd_model
......@@ -1278,3 +1299,344 @@ async def create_image_segment(request: ImageSegmentRequest, http_request: Reque
raise HTTPException(status_code=500, detail=f"Segmentation failed: {e}")
result = save_image_response(seg_img, request.response_format, http_request)
return {"created": int(time.time()), "data": [result]}
# =============================================================================
# Deblur Endpoint (POST /v1/images/deblur)
# =============================================================================
class ImageDeblurRequest(BaseModel):
image: str # base64 input image
strength: Optional[float] = 0.5 # 0–1, deblur aggressiveness
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
def _run_deblur(image_bytes: bytes, strength: float) -> "PILImage.Image":
    """Approximate blind deblur: adaptive Wiener filtering plus unsharp masking.
    scipy.signal.wiener is a local-statistics denoising filter rather than a
    deconvolution (no blur kernel is estimated), so most of the visible
    sharpening comes from the unsharp-mask pass.
    """
    import cv2
    import numpy as np
    from PIL import Image as PILImage
    from scipy.signal import wiener
    strength = min(max(float(strength), 0.0), 1.0)
    img = PILImage.open(io.BytesIO(image_bytes)).convert("RGB")
    arr = np.array(img, dtype=np.float32) / 255.0
    # Wiener filter per channel; lower strength assumes more noise (more smoothing)
    noise_power = max(0.001, (1.0 - strength) * 0.05)
    deblurred = np.stack([
        wiener(arr[:, :, c], mysize=5, noise=noise_power)
        for c in range(3)
    ], axis=2)
    deblurred = np.clip(deblurred, 0.0, 1.0)
    # Unsharp mask for edge recovery: (1 + s) * img - s * GaussianBlur(img)
    blur_sigma = max(0.5, (1.0 - strength) * 2.0)
    blurred = cv2.GaussianBlur(deblurred, (0, 0), blur_sigma)
    sharpened = cv2.addWeighted(deblurred, 1.0 + strength, blurred, -strength, 0)
    sharpened = np.clip(sharpened, 0.0, 1.0)
    return PILImage.fromarray((sharpened * 255).astype(np.uint8))
@router.post("/v1/images/deblur")
async def create_image_deblur(request: ImageDeblurRequest, http_request: Request = None):
    """Reduce blur using adaptive Wiener filtering and unsharp masking."""
    try:
        raw = base64.b64decode(request.image.split(',', 1)[-1])
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid image: {e}")
    # `request.strength or 0.5` would turn an explicit strength of 0 into 0.5
    strength = 0.5 if request.strength is None else request.strength
    try:
        result_img = await asyncio.get_running_loop().run_in_executor(
            None, _run_deblur, raw, strength)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Deblur failed: {e}")
    result = save_image_response(result_img, request.response_format, http_request)
    return {"created": int(time.time()), "data": [result]}
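# Sketch (for illustration): the strength -> parameter mapping used by
# _run_deblur, extracted as a pure function so the trade-off is easy to see.
# Higher strength lowers the assumed Wiener noise power (less smoothing),
# narrows the unsharp-mask blur and raises the sharpening amount s in
#   out = (1 + s) * img - s * GaussianBlur(img, sigma)
# Clamping s to [0, 1] is an assumption that matches the documented range.
from typing import Tuple
def _sketch_deblur_params(strength: float) -> Tuple[float, float, float]:
    """Return (noise_power, blur_sigma, amount) for a strength in [0, 1]."""
    s = min(max(strength, 0.0), 1.0)
    noise_power = max(0.001, (1.0 - s) * 0.05)
    blur_sigma = max(0.5, (1.0 - s) * 2.0)
    return noise_power, blur_sigma, s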
# =============================================================================
# Unpixelate Endpoint (POST /v1/images/unpixelate)
# Uses Real-ESRGAN, a blind super-resolution model trained on synthetic blur, noise and compression degradations.
# =============================================================================
class ImageUnpixelateRequest(BaseModel):
image: str
scale: Optional[int] = 4 # 2, 4, or 8
model: Optional[str] = None # optional custom Real-ESRGAN model path
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
def _run_unpixelate(image_bytes: bytes, scale: int, model_path: Optional[str]) -> "PILImage.Image":
    """Upscale with Real-ESRGAN x4; RealESRGANer resizes the x4 result to `scale`."""
    import urllib.request
    import cv2
    import numpy as np
    import torch
    from basicsr.archs.rrdbnet_arch import RRDBNet
    from PIL import Image as PILImage
    from realesrgan import RealESRGANer
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if model_path and os.path.exists(model_path):
        # A custom checkpoint must match the x4 RRDBNet architecture built below
        mp = model_path
    else:
        # Download RealESRGAN_x4plus on demand
        mp = os.path.expanduser('~/.cache/realesrgan/RealESRGAN_x4plus.pth')
        if not os.path.exists(mp):
            os.makedirs(os.path.dirname(mp), exist_ok=True)
            url = 'https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth'
            _log.info('Downloading RealESRGAN_x4plus.pth…')
            # Download under a temporary name so an interrupted transfer is never reused
            tmp_path = mp + '.part'
            urllib.request.urlretrieve(url, tmp_path)
            os.replace(tmp_path, mp)
    model_obj = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                        num_block=23, num_grow_ch=32, scale=4)
    upsampler = RealESRGANer(scale=4, model_path=mp, model=model_obj,
                             half=device.type == 'cuda', device=device)
    img = PILImage.open(io.BytesIO(image_bytes)).convert("RGB")
    # RealESRGANer.enhance() follows the cv2 convention: BGR in, BGR out
    bgr = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    out_bgr, _ = upsampler.enhance(bgr, outscale=scale)
    return PILImage.fromarray(cv2.cvtColor(out_bgr, cv2.COLOR_BGR2RGB))
@router.post("/v1/images/unpixelate")
async def create_image_unpixelate(request: ImageUnpixelateRequest, http_request: Request = None):
    """Remove pixelation / upscale with detail recovery using Real-ESRGAN."""
    scale = request.scale or 4
    if scale not in (2, 4, 8):
        raise HTTPException(status_code=400, detail="scale must be 2, 4, or 8")
    try:
        raw = base64.b64decode(request.image.split(',', 1)[-1])
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid image: {e}")
    try:
        result_img = await asyncio.get_running_loop().run_in_executor(
            None, _run_unpixelate, raw, scale, request.model)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Unpixelate failed: {e}")
    result = save_image_response(result_img, request.response_format, http_request)
    return {"created": int(time.time()), "data": [result]}
# =============================================================================
# Outfit Change Endpoint (POST /v1/images/outfit)
# Auto-generates a rough clothing mask via GrabCut foreground segmentation, then inpaints.
# =============================================================================
class ImageOutfitRequest(BaseModel):
model: str # inpaint model id
image: Optional[str] = None # base64 source image (image mode)
video: Optional[str] = None # base64 source video (video mode)
prompt: str # description of the new outfit
negative_prompt: Optional[str] = None
mask: Optional[str] = None # optional manual mask (base64); auto-generated if absent
steps: Optional[int] = 30
guidance_scale: Optional[float] = 7.5
strength: Optional[float] = 0.99
seed: Optional[int] = None
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
def _generate_clothing_mask(img_arr) -> "np.ndarray":
"""
    Generate a rough clothing mask using GrabCut foreground segmentation.
Returns a binary mask (255 = clothing area to replace).
"""
import numpy as np
import cv2
h, w = img_arr.shape[:2]
bgr = cv2.cvtColor(img_arr, cv2.COLOR_RGB2BGR)
# GrabCut with a central rect (assumes person is roughly centered)
mask_gc = np.zeros((h, w), np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
margin_x, margin_y = w // 8, h // 8
rect = (margin_x, margin_y, w - 2 * margin_x, h - 2 * margin_y)
cv2.grabCut(bgr, mask_gc, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
fg_mask = np.where((mask_gc == cv2.GC_FGD) | (mask_gc == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
# Exclude top 25% (head/hair) and bottom 10% (feet)
fg_mask[:h // 4, :] = 0
fg_mask[int(h * 0.9):, :] = 0
# Dilate slightly so inpaint covers clothing edges
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
fg_mask = cv2.dilate(fg_mask, kernel, iterations=2)
return fg_mask
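# Sketch (illustrative, pure Python): the fixed geometry _generate_clothing_mask
# relies on. GrabCut is seeded with a rectangle inset by 1/8 of the width and
# height on every side. Afterwards, rows in the top 25% (head/hair) and the
# bottom 10% (feet) are zeroed. The helper only computes those numbers for a
# given frame size. It makes the "person is roughly centred and upright"
# assumption explicit and adds no segmentation logic of its own.
from typing import Dict, Tuple
def _sketch_clothing_mask_geometry(w: int, h: int) -> Dict[str, Tuple[int, ...]]:
    margin_x, margin_y = w // 8, h // 8
    return {
        # (x, y, width, height) as passed to cv2.grabCut with GC_INIT_WITH_RECT
        'grabcut_rect': (margin_x, margin_y, w - 2 * margin_x, h - 2 * margin_y),
        # half-open row ranges [start, stop) cleared after segmentation
        'cleared_top_rows': (0, h // 4),
        'cleared_bottom_rows': (int(h * 0.9), h),
    }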
@router.post("/v1/images/outfit")
async def create_image_outfit(request: ImageOutfitRequest, http_request: Request = None):
    """Change the outfit/clothing in an image or video using inpainting."""
    if request.video:
        return await _outfit_video(request, http_request)
    if not request.image:
        raise HTTPException(status_code=400, detail="Either 'image' or 'video' must be provided")
    from PIL import Image as PILImage
    import numpy as np
    try:
        raw = base64.b64decode(request.image.split(',', 1)[-1])
        img = PILImage.open(io.BytesIO(raw)).convert("RGB")
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid image: {e}")
    img_arr = np.array(img)
    loop = asyncio.get_running_loop()
    # Generate or decode mask
    if request.mask:
        try:
            mask_raw = base64.b64decode(request.mask.split(',', 1)[-1])
            mask_img = PILImage.open(io.BytesIO(mask_raw)).convert("L")
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Invalid mask: {e}")
    else:
        try:
            mask_arr = await loop.run_in_executor(None, _generate_clothing_mask, img_arr)
            mask_img = PILImage.fromarray(mask_arr)
        except Exception as e:
            raise HTTPException(status_code=500, detail=f"Mask generation failed: {e}")
    # Load inpaint pipeline (cached in multi_model_manager after the first request)
    model_key = f"inpaint:{request.model}"
    pipeline = multi_model_manager.models.get(model_key)
    if pipeline is None:
        try:
            pipeline = await loop.run_in_executor(
                None, _load_inpaint_pipeline, request.model, global_args)
        except Exception as e:
            raise HTTPException(status_code=500, detail=f"Failed to load inpaint model: {e}")
        multi_model_manager.models[model_key] = pipeline
    # Run inpaint
    import torch
    generator = torch.Generator().manual_seed(request.seed) if request.seed is not None else None
    def _run():
        kwargs = dict(
            prompt=request.prompt,
            image=img,
            mask_image=mask_img,
            num_inference_steps=request.steps or 30,
            guidance_scale=request.guidance_scale or 7.5,
            strength=request.strength or 0.99,
        )
        if request.negative_prompt:
            kwargs['negative_prompt'] = request.negative_prompt
        if generator is not None:
            kwargs['generator'] = generator
        if hasattr(pipeline, 'safety_checker'):
            pipeline.safety_checker = None
        return pipeline(**kwargs).images[0]
    try:
        result_img = await loop.run_in_executor(None, _run)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Outfit change failed: {e}")
    result = save_image_response(result_img, request.response_format, http_request)
    return {"created": int(time.time()), "data": [result]}
async def _outfit_video(request: ImageOutfitRequest, http_request):
    """Process outfit change frame-by-frame on a video."""
    import shutil
    import subprocess
    import tempfile
    from fractions import Fraction
    raw = base64.b64decode(request.video.split(',', 1)[-1])
    # A single scratch directory holds the input, the extracted frames and the
    # output. It is removed in `finally`, which also avoids tempfile.mktemp().
    work_dir = tempfile.mkdtemp()
    try:
        in_path = os.path.join(work_dir, 'input.mp4')
        with open(in_path, 'wb') as f:
            f.write(raw)
        frames_dir = os.path.join(work_dir, 'frames')
        os.makedirs(frames_dir)
        subprocess.run(['ffmpeg', '-y', '-i', in_path, f'{frames_dir}/%08d.png'],
                       capture_output=True, check=True)
        # r_frame_rate is a rational such as "30000/1001"; fall back to 25 fps
        probe = subprocess.run(
            ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
             '-show_entries', 'stream=r_frame_rate', '-of', 'default=nw=1:nk=1', in_path],
            capture_output=True, text=True)
        try:
            fps = float(Fraction(probe.stdout.strip()))
        except (ValueError, ZeroDivisionError):
            fps = 0.0
        if fps <= 0:
            fps = 25.0
        # Load pipeline once
        loop = asyncio.get_running_loop()
        model_key = f"inpaint:{request.model}"
        pipeline = multi_model_manager.models.get(model_key)
        if pipeline is None:
            pipeline = await loop.run_in_executor(
                None, _load_inpaint_pipeline, request.model, global_args)
            multi_model_manager.models[model_key] = pipeline
        if hasattr(pipeline, 'safety_checker'):
            pipeline.safety_checker = None
        import torch
        from PIL import Image as PILImage
        import numpy as np
        generator = torch.Generator().manual_seed(request.seed) if request.seed is not None else None
        # A manual mask is the same for every frame, so decode it only once
        manual_mask = None
        if request.mask:
            mask_raw = base64.b64decode(request.mask.split(',', 1)[-1])
            manual_mask = PILImage.open(io.BytesIO(mask_raw)).convert("L")
        def _process_frames():
            for fname in sorted(os.listdir(frames_dir)):
                fpath = os.path.join(frames_dir, fname)
                img = PILImage.open(fpath).convert("RGB")
                if manual_mask is not None:
                    mask_img = manual_mask
                else:
                    mask_img = PILImage.fromarray(_generate_clothing_mask(np.array(img)))
                kwargs = dict(
                    prompt=request.prompt,
                    image=img,
                    mask_image=mask_img,
                    num_inference_steps=request.steps or 30,
                    guidance_scale=request.guidance_scale or 7.5,
                    strength=request.strength or 0.99,
                )
                if request.negative_prompt:
                    kwargs['negative_prompt'] = request.negative_prompt
                if generator is not None:
                    kwargs['generator'] = generator
                pipeline(**kwargs).images[0].save(fpath)
        await loop.run_in_executor(None, _process_frames)
        # Reassemble, copying the original audio track if present;
        # yuv420p keeps the H.264 output playable in browsers
        out_path = os.path.join(work_dir, 'outfit.mp4')
        subprocess.run(
            ['ffmpeg', '-y', '-framerate', str(fps), '-i', f'{frames_dir}/%08d.png',
             '-i', in_path, '-map', '0:v', '-map', '1:a?',
             '-c:v', 'libx264', '-pix_fmt', 'yuv420p', '-c:a', 'copy', '-shortest', out_path],
            capture_output=True, check=True)
        with open(out_path, 'rb') as f:
            out_bytes = f.read()
        if global_file_path:
            fname = f'{uuid.uuid4().hex}_outfit.mp4'
            fpath_out = os.path.join(global_file_path, fname)
            os.makedirs(global_file_path, exist_ok=True)
            with open(fpath_out, 'wb') as f:
                f.write(out_bytes)
            host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
            if ':' in host:
                parts = host.split(':')
                if len(parts) == 2 and parts[1].isdigit():
                    host = parts[0]
            proto = 'https' if getattr(global_args, 'https', False) else 'http'
            port = getattr(global_args, 'port', 8000) if global_args else 8000
            data = [{'url': f'{proto}://{host}:{port}/v1/files/{fname}'}]
        else:
            data = [{'b64_mp4': base64.b64encode(out_bytes).decode()}]
        return {'created': int(time.time()), 'data': data}
    except subprocess.CalledProcessError as e:
        # ffmpeg prints its banner first; the actual error is at the end of stderr
        stderr = (e.stderr or b'').decode(errors='replace')
        raise HTTPException(status_code=500, detail=f'ffmpeg error: {stderr[-500:]}')
    except Exception as e:
        raise HTTPException(status_code=500, detail=f'Video outfit change failed: {e}')
    finally:
        shutil.rmtree(work_dir, ignore_errors=True)
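# Sketch (hypothetical, not used above): the temp-file bookkeeping shared by
# the two video endpoints can be expressed with tempfile.TemporaryDirectory.
# That bookkeeping is a scratch path list plus a best-effort cleanup loop.
# TemporaryDirectory removes everything it contains on exit, even when an
# exception propagates. Paths are created inside that directory rather than
# via tempfile.mktemp(). mktemp() is deprecated because another process can
# claim the returned name before it is used.
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator
@contextmanager
def _sketch_video_workspace(prefix: str = 'codai_') -> Iterator[Dict[str, str]]:
    with tempfile.TemporaryDirectory(prefix=prefix) as root:
        frames = os.path.join(root, 'frames')
        os.mkdir(frames)
        yield {
            'root': root,
            'input': os.path.join(root, 'input.mp4'),
            'frames': frames,
            'output': os.path.join(root, 'output.mp4'),
        }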
"""
Server-side pipeline endpoints — multi-step generation chains.
POST /v1/pipelines/image-to-video — generate image then animate it
POST /v1/pipelines/story — LLM script → images → video → TTS narration
POST /v1/pipelines/video-dub — transcribe → translate → TTS dub → burn subtitles
POST /v1/pipelines/audio-dub — transcribe audio/video → translate → clone voice → replace audio
"""
import asyncio
import time
from typing import List, Optional
from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel, ConfigDict
router = APIRouter()
# ---------------------------------------------------------------------------
# Helpers — thin wrappers that call the existing endpoint logic directly
# ---------------------------------------------------------------------------
async def _post_json(path: str, body: dict, http_request: Request):
"""Call an internal endpoint by importing its handler directly."""
# We avoid HTTP round-trips by calling handlers directly via their routers.
# Import lazily to avoid circular imports.
if path.startswith('/v1/images/generations'):
from codai.api.images import create_image_generation
from codai.pydantic.imagerequest import ImageGenerationRequest
req = ImageGenerationRequest(**body)
return await create_image_generation(req, http_request)
if path.startswith('/v1/video/generations'):
from codai.api.video import create_video_generation
from codai.pydantic.videorequest import VideoGenerationRequest
req = VideoGenerationRequest(**body)
return await create_video_generation(req, http_request)
if path.startswith('/v1/video/dub'):
from codai.api.video import create_video_dub
from codai.pydantic.videorequest import VideoDubRequest
req = VideoDubRequest(**body)
return await create_video_dub(req, http_request)
if path.startswith('/v1/audio/speech'):
from codai.api.tts import create_speech, TTSRequest
req = TTSRequest(**body)
return await create_speech(req)
if path.startswith('/v1/chat/completions'):
from codai.api.text import chat_completions
from codai.pydantic.textrequest import ChatCompletionRequest
req = ChatCompletionRequest(**body)
return await chat_completions(req, http_request)
raise ValueError(f"Unknown internal path: {path}")
def _img_url(result) -> Optional[str]:
"""Extract URL from an image generation result dict."""
data = result.get('data', [{}])
item = data[0] if data else {}
return item.get('url') or ('data:image/png;base64,' + item['b64_json'] if item.get('b64_json') else None)
def _vid_url(result) -> Optional[str]:
    """Extract URL (or base64 data URL) from a video generation result dict."""
data = result.get('data', [{}])
item = data[0] if data else {}
return item.get('url') or ('data:video/mp4;base64,' + item['b64_mp4'] if item.get('b64_mp4') else None)
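def _aud_url(result) -> Optional[str]:
    """Extract URL (or base64 data URL) from a TTS/audio result."""
    if isinstance(result, dict):
        if result.get('audio'):
            # audio/mpeg is the registered MIME type for MP3
            return 'data:audio/mpeg;base64,' + result['audio']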
data = result.get('data', [{}])
item = data[0] if data else {}
if item.get('url'):
return item['url']
for k, v in item.items():
if k.startswith('b64_'):
return f'data:audio/{k[4:]};base64,{v}'
return None
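# Sketch (hypothetical helper, not used by the pipelines below): _img_url,
# _vid_url and _aud_url share one pattern, so they could be collapsed into a
# single function. `_first_media_url` and its `kind` parameter are illustrative
# names. It assumes the {"data": [{"url": ...} | {"b64_<ext>": ...}]} result
# shape the extractors above already rely on; the top-level "audio" key that
# _aud_url also accepts is NOT handled here.
from typing import Optional
def _first_media_url(result: dict, kind: str) -> Optional[str]:
    """Return the first item's URL, or a data: URL built from its b64_* field."""
    data = result.get("data") or [{}]
    item = data[0]
    if item.get("url"):
        return item["url"]
    for key, value in item.items():
        if key.startswith("b64_") and value:
            suffix = key[len("b64_"):]
            # b64_json holds PNG bytes in OpenAI-style image responses;
            # MP3's registered subtype is audio/mpeg.
            subtype = {"json": "png", "mp3": "mpeg"}.get(suffix, suffix)
            return f"data:{kind}/{subtype};base64,{value}"
    return None
# Possible use (sketch only): _first_media_url(img_result, "image")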
# ---------------------------------------------------------------------------
# Pipeline 1: Image → Video
# ---------------------------------------------------------------------------
class ImageToVideoPipelineRequest(BaseModel):
prompt: str
image_model: str
video_model: str
# image params
image_size: Optional[str] = "1024x1024"
image_steps: Optional[int] = None
image_cfg: Optional[float] = None
image_seed: Optional[int] = None
negative_prompt: Optional[str] = None
# video params
num_frames: Optional[int] = 16
fps: Optional[int] = 8
num_inference_steps: Optional[int] = 25
guidance_scale: Optional[float] = 7.5
video_seed: Optional[int] = None
camera_motion: Optional[str] = None
# audio
add_audio: Optional[bool] = False
audio_type: Optional[str] = None
audio_prompt: Optional[str] = None
# post
upscale_output: Optional[bool] = False
upscale_factor: Optional[int] = 2
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
@router.post("/v1/pipelines/image-to-video")
async def pipeline_image_to_video(request: ImageToVideoPipelineRequest, http_request: Request = None):
"""Generate an image then animate it into a video."""
steps = []
# Step 1: generate image
img_body = {
"model": request.image_model,
"prompt": request.prompt,
"size": request.image_size,
"response_format": "url",
}
    if request.image_steps:
        img_body["steps"] = request.image_steps
    if request.image_cfg:
        img_body["guidance_scale"] = request.image_cfg
    if request.image_seed is not None:  # seed 0 is a valid seed
        img_body["seed"] = request.image_seed
    if request.negative_prompt:
        img_body["negative_prompt"] = request.negative_prompt
try:
img_result = await _post_json('/v1/images/generations', img_body, http_request)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Image generation failed: {e}")
img_url = _img_url(img_result if isinstance(img_result, dict) else img_result.__dict__)
if not img_url:
raise HTTPException(status_code=500, detail="Image generation returned no image")
steps.append({"step": "image", "url": img_url})
# Step 2: animate image → video
vid_body = {
"model": request.video_model,
"mode": "i2v",
"prompt": request.prompt,
"init_image": img_url,
"num_frames": request.num_frames,
"fps": request.fps,
"num_inference_steps": request.num_inference_steps,
"guidance_scale": request.guidance_scale,
"response_format": "url",
}
    if request.video_seed is not None:  # seed 0 is a valid seed
        vid_body["seed"] = request.video_seed
    if request.camera_motion:
        vid_body["camera_motion"] = request.camera_motion
if request.add_audio:
vid_body["add_audio"] = True
vid_body["audio_type"] = request.audio_type
vid_body["audio_prompt"] = request.audio_prompt
if request.upscale_output:
vid_body["upscale_output"] = True
vid_body["upscale_factor"] = request.upscale_factor
try:
vid_result = await _post_json('/v1/video/generations', vid_body, http_request)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Video generation failed: {e}")
    vid_url = _vid_url(vid_result if isinstance(vid_result, dict) else vid_result.__dict__)
    if not vid_url:
        raise HTTPException(status_code=500, detail="Video generation returned no video")
    steps.append({"step": "video", "url": vid_url})
return {
"created": int(time.time()),
"pipeline": "image-to-video",
"steps": steps,
"data": [{"url": vid_url, "image_url": img_url}],
}
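# Sketch (hypothetical helper, not part of the original module): the optional
# parameters above are copied into the downstream request bodies one `if` at a
# time, and a plain truthiness test silently drops valid values such as
# seed=0. A helper that drops only None values avoids that; the name
# `_without_none` is illustrative.
from typing import Any, Dict
def _without_none(**fields: Any) -> Dict[str, Any]:
    """Return the keyword arguments whose value is not None."""
    return {key: value for key, value in fields.items() if value is not None}
# Possible use inside pipeline_image_to_video (sketch only):
#     img_body.update(_without_none(seed=request.image_seed,
#                                   negative_prompt=request.negative_prompt))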
# ---------------------------------------------------------------------------
# Pipeline 2: Video Dub
# ---------------------------------------------------------------------------
class VideoDubPipelineRequest(BaseModel):
model: str
video: str # base64 or URL
target_lang: str
source_lang: Optional[str] = None
voice_clone: Optional[bool] = False
burn_subtitles: Optional[bool] = True
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
@router.post("/v1/pipelines/video-dub")
async def pipeline_video_dub(request: VideoDubPipelineRequest, http_request: Request = None):
"""Transcribe → translate → TTS dub → burn subtitles."""
body = {
"model": request.model,
"video": request.video,
"target_lang": request.target_lang,
"source_lang": request.source_lang,
"voice_clone": request.voice_clone,
"burn_subtitles": request.burn_subtitles,
"response_format": request.response_format,
}
try:
result = await _post_json('/v1/video/dub', body, http_request)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Video dub failed: {e}")
vid_url = _vid_url(result if isinstance(result, dict) else result.__dict__)
return {
"created": int(time.time()),
"pipeline": "video-dub",
"data": [{"url": vid_url}],
}
# ---------------------------------------------------------------------------
# Pipeline 3: Full Story (LLM → images → video → TTS narration)
# ---------------------------------------------------------------------------
class StoryPipelineRequest(BaseModel):
story: str
text_model: str
image_model: str
video_model: str
tts_model: Optional[str] = None
tts_voice: Optional[str] = "af_sarah"
num_scenes: Optional[int] = 3
num_frames: Optional[int] = 16
fps: Optional[int] = 8
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
@router.post("/v1/pipelines/story")
async def pipeline_story(request: StoryPipelineRequest, http_request: Request = None):
"""LLM generates script → image per scene → animate first scene → optional TTS narration."""
    # Clamp the requested scene count to the range 1..6
    n = max(1, min(request.num_scenes or 3, 6))
# Step 1: LLM script
try:
script_result = await _post_json('/v1/chat/completions', {
"model": request.text_model,
"messages": [{"role": "user", "content":
f"Write a {n}-scene visual script for this story. "
f"For each scene write exactly: SCENE X: [brief visual description, one sentence]. "
f"Story: {request.story}"}],
"stream": False,
}, http_request)
if hasattr(script_result, 'body'):
import json
script_result = json.loads(script_result.body)
script_text = script_result['choices'][0]['message']['content']
except Exception as e:
raise HTTPException(status_code=500, detail=f"Script generation failed: {e}")
import re
scenes = re.findall(r'SCENE \d+:\s*(.+)', script_text) or [request.story]
scenes = scenes[:n]
steps = [{"step": "script", "text": script_text, "scenes": scenes}]
# Step 2: image per scene (parallel)
async def _gen_image(desc):
try:
r = await _post_json('/v1/images/generations', {
"model": request.image_model,
"prompt": desc,
"response_format": "url",
}, http_request)
return _img_url(r if isinstance(r, dict) else r.__dict__)
except Exception:
return None
img_urls = await asyncio.gather(*[_gen_image(s) for s in scenes])
img_urls = [u for u in img_urls if u]
steps.append({"step": "images", "urls": img_urls})
if not img_urls:
raise HTTPException(status_code=500, detail="All image generations failed")
# Step 3: animate first scene
try:
vid_result = await _post_json('/v1/video/generations', {
"model": request.video_model,
"mode": "i2v",
"prompt": scenes[0],
"init_image": img_urls[0],
"num_frames": request.num_frames,
"fps": request.fps,
"response_format": "url",
}, http_request)
vid_url = _vid_url(vid_result if isinstance(vid_result, dict) else vid_result.__dict__)
except Exception as e:
vid_url = None
steps.append({"step": "video", "error": str(e)})
else:
steps.append({"step": "video", "url": vid_url})
# Step 4: TTS narration (optional)
aud_url = None
if request.tts_model:
narration = " ".join(scenes)
try:
aud_result = await _post_json('/v1/audio/speech', {
"model": request.tts_model,
"input": narration,
"voice": request.tts_voice or "af_sarah",
"response_format": "mp3",
}, http_request)
aud_url = _aud_url(aud_result if isinstance(aud_result, dict) else aud_result.__dict__)
except Exception as e:
steps.append({"step": "tts", "error": str(e)})
else:
steps.append({"step": "tts", "url": aud_url})
return {
"created": int(time.time()),
"pipeline": "story",
"steps": steps,
"data": [{
"video_url": vid_url,
"image_urls": img_urls,
"audio_url": aud_url,
}],
}
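# Sketch (hypothetical, not wired into pipeline_story): the script step above
# only recognises lines written exactly as "SCENE X: ...", but LLMs often vary
# the case or wrap the label in Markdown ("**Scene 1:** ..."). A more tolerant
# parser, assuming case and leading/trailing '*', '_' or '#' around the label
# are the only variations worth handling; `parse_scenes` is an illustrative name.
import re
from typing import List
_SCENE_RE = re.compile(
    r"^[*_#\s]*scene\s+\d+\s*[*_]*\s*:\s*[*_]*\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)
def parse_scenes(script_text: str, limit: int) -> List[str]:
    """Return up to `limit` scene descriptions found in an LLM-written script."""
    return [desc.strip() for desc in _SCENE_RE.findall(script_text)][:limit]
# Possible use (sketch only): scenes = parse_scenes(script_text, n) or [request.story]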
# ---------------------------------------------------------------------------
# Pipeline 4: Audio Dub (transcribe → translate → clone voice → replace audio)
# ---------------------------------------------------------------------------
class AudioDubPipelineRequest(BaseModel):
"""
Dub an audio or video file using a cloned voice.
Steps:
1. Transcribe source audio/video with Whisper
2. Optionally translate the transcript
3. Synthesize dubbed audio with F5-TTS voice cloning
4. If input is video: replace the audio track (ffmpeg)
If input is audio: return the dubbed audio directly
"""
# Input — provide one of:
video: Optional[str] = None # base64/URL video
audio: Optional[str] = None # base64/URL audio-only file
# Voice cloning — provide one of:
voice_name: Optional[str] = None # saved voice profile name
ref_audio: Optional[str] = None # base64 reference audio
ref_text: Optional[str] = None # transcript of ref_audio
# Transcription
source_lang: Optional[str] = None # source language hint (auto-detect if None)
whisper_model: Optional[str] = None # whisper model size (base, small, medium, large)
# Translation
target_lang: Optional[str] = None # translate to this language before dubbing
# if None, dub in original language
# TTS
speed: Optional[float] = 1.0
seed: Optional[int] = None
# Video output options
burn_subtitles: Optional[bool] = False
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
@router.post("/v1/pipelines/audio-dub")
async def pipeline_audio_dub(request: AudioDubPipelineRequest, http_request: Request = None):
"""Transcribe → (translate) → clone voice → replace audio track."""
import os, tempfile, subprocess, base64
if not request.video and not request.audio:
raise HTTPException(status_code=400, detail="Provide video or audio")
if not request.voice_name and not request.ref_audio:
raise HTTPException(status_code=400, detail="Provide voice_name or ref_audio for cloning")
from codai.api.video import _decode_b64_or_url, _tmp_write, _whisper_transcribe, _translate_srt
from codai.api.voice_clone import _load_voice, _decode_audio, _f5tts_clone
temps = []
steps = []
try:
# Decode input
is_video = bool(request.video)
raw = _decode_b64_or_url(request.video or request.audio)
ext = '.mp4' if is_video else '.wav'
in_path = _tmp_write(raw, ext)
temps.append(in_path)
# Step 1: Transcribe
        srt_path = await asyncio.get_running_loop().run_in_executor(
            None, _whisper_transcribe, in_path, request.source_lang,
            request.whisper_model, temps)
if not srt_path:
raise HTTPException(status_code=500, detail="Transcription failed — Whisper not available")
with open(srt_path) as f:
srt_content = f.read()
steps.append({"step": "transcribe", "srt": srt_content})
# Step 2: Translate (optional)
if request.target_lang:
            srt_path = await asyncio.get_running_loop().run_in_executor(
                None, _translate_srt, srt_path, request.target_lang, temps)
with open(srt_path) as f:
srt_content = f.read()
steps.append({"step": "translate", "lang": request.target_lang, "srt": srt_content})
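        # Extract plain text from SRT: drop blank lines, numeric cue indices and
        # timing lines, but keep subtitle text that happens to start with a digit
        plain_text = ' '.join(
            line.strip() for line in srt_content.split('\n')
            if line.strip() and not line.strip().isdigit() and '-->' not in line
        )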
# Step 3: Resolve reference audio for voice cloning
ref_audio_path = None
ref_text = request.ref_text or ''
if request.voice_name:
meta = _load_voice(request.voice_name)
if not meta:
raise HTTPException(status_code=404, detail=f"Voice '{request.voice_name}' not found")
ref_audio_path = meta['audio_file']
ref_text = ref_text or meta.get('transcript', '')
else:
audio_bytes, aext = _decode_audio(request.ref_audio)
tmp = tempfile.NamedTemporaryFile(suffix=aext, delete=False)
tmp.write(audio_bytes)
tmp.close()
ref_audio_path = tmp.name
temps.append(ref_audio_path)
if not ref_text:
raise HTTPException(status_code=400, detail="ref_text required for voice cloning")
# Step 4: Clone voice
try:
            dubbed_bytes = await asyncio.get_running_loop().run_in_executor(
                None, _f5tts_clone,
                ref_audio_path, ref_text, plain_text,
                request.speed or 1.0, request.seed,
            )
except ImportError:
raise HTTPException(status_code=501, detail="f5-tts not installed. Run: pip install f5-tts")
        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp_dub:
            tmp_dub.write(dubbed_bytes)
            dubbed_path = tmp_dub.name
        temps.append(dubbed_path)
steps.append({"step": "clone_voice"})
# Step 5: Replace audio / return
if is_video:
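            # mkstemp creates the file atomically (mktemp is deprecated and racy);
            # ffmpeg -y then overwrites it
            fd, out_path = tempfile.mkstemp(suffix='_dubbed.mp4')
            os.close(fd)
            temps.append(out_path)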
cmd = ['ffmpeg', '-y', '-i', in_path, '-i', dubbed_path,
'-map', '0:v', '-map', '1:a',
'-c:v', 'copy', '-c:a', 'aac', '-shortest', out_path]
r = subprocess.run(cmd, capture_output=True)
if r.returncode != 0:
raise HTTPException(status_code=500, detail=f"Audio merge failed: {r.stderr.decode()}")
if request.burn_subtitles:
                fd, sub_out = tempfile.mkstemp(suffix='_sub.mp4')
                os.close(fd)
                temps.append(sub_out)
r2 = subprocess.run(
['ffmpeg', '-y', '-i', out_path, '-vf', f'subtitles={srt_path}',
'-c:a', 'copy', sub_out], capture_output=True)
if r2.returncode == 0:
out_path = sub_out
with open(out_path, 'rb') as f:
out_bytes = f.read()
out_b64 = base64.b64encode(out_bytes).decode()
steps.append({"step": "merge_video"})
result_data = [{"b64_mp4": out_b64}]
else:
out_b64 = base64.b64encode(dubbed_bytes).decode()
result_data = [{"b64_wav": out_b64}]
        # TODO: persist the output to disk when file saving is configured
        # (e.g. by reusing codai.api.voice_clone._save_audio_response).
return {
"created": int(time.time()),
"pipeline": "audio-dub",
"steps": steps,
"data": result_data,
}
finally:
for t in temps:
try:
os.unlink(t)
except Exception:
pass
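# Sketch (hypothetical helper, not part of the original module): the merge step
# in pipeline_audio_dub builds its ffmpeg command inline. Factored out, the same
# argument list reads as follows (flag meanings per the ffmpeg CLI):
#   -map 0:v   keep the video stream(s) of the first input (original video)
#   -map 1:a   take audio from the second input (the dubbed WAV) instead
#   -c:v copy  pass the video through without re-encoding
#   -c:a aac   encode the new audio track as AAC for MP4 compatibility
#   -shortest  stop writing when the shortest output stream ends
# `build_dub_merge_cmd` is an illustrative name.
from typing import List
def build_dub_merge_cmd(video_path: str, audio_path: str, out_path: str) -> List[str]:
    """Return the ffmpeg argv that swaps a video's audio track for a dubbed one."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v", "-map", "1:a",
        "-c:v", "copy", "-c:a", "aac",
        "-shortest", out_path,
    ]
# Possible use (sketch only): subprocess.run(build_dub_merge_cmd(in_path, dubbed_path, out_path), capture_output=True)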
# CoderAI - OpenAI-compatible API server
# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
"""Simple in-process fixed-window rate limiter middleware.
Each distinct (client-IP, route-prefix) pair gets its own bucket.
Limits are configured via RateLimitConfig. The defaults below are
intentionally generous; tighten them through the config file or CLI.
Endpoints covered:
/v1/chat/completions — expensive LLM inference
/v1/images/ — image generation
/v1/audio/ — TTS / STT / audio generation
/v1/video/ — video generation
/v1/embeddings — embedding
/v1/completions — legacy completions
"""
import time
import threading
from collections import defaultdict
from typing import Dict, Tuple
from fastapi import Request
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware
# Per-route-prefix defaults: (max_requests, window_seconds)
_DEFAULT_LIMITS: Dict[str, Tuple[int, int]] = {
"/v1/chat/completions": (60, 60),
"/v1/completions": (60, 60),
"/v1/images/": (30, 60),
"/v1/audio/": (60, 60),
"/v1/video/": (10, 60),
"/v1/embeddings": (120, 60),
}
# API prefixes that count against the request queue
_QUEUED_PREFIXES = ("/v1/",)
# Global toggle — set to False to disable rate limiting entirely.
RATE_LIMITING_ENABLED = True
class _Bucket:
"""Fixed-window counter."""
__slots__ = ("count", "window_start")
def __init__(self, now: float):
self.count = 0
self.window_start = now
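# Sketch (hypothetical, not used by RateLimitMiddleware below): the middleware
# keys buckets on the first X-Forwarded-For entry, which any client can set to
# dodge its limit. A common hardening is to honour that header only when the
# direct TCP peer is a proxy you trust. TRUSTED_PROXIES and resolve_client_ip
# are assumed names, not existing settings of this project.
from typing import Optional
TRUSTED_PROXIES = frozenset({"127.0.0.1", "::1"})
def resolve_client_ip(peer: Optional[str], forwarded_for: str) -> str:
    """Pick the rate-limit key: the proxy-reported client, else the TCP peer."""
    if peer in TRUSTED_PROXIES and forwarded_for:
        first = forwarded_for.split(",")[0].strip()
        if first:
            return first
    return peer or "unknown"
# Possible use in dispatch() (sketch only):
#     client_ip = resolve_client_ip(request.client.host if request.client else None,
#                                   request.headers.get("x-forwarded-for", ""))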
class RateLimitMiddleware(BaseHTTPMiddleware):
"""Apply per-IP, per-route-prefix rate limiting to API endpoints."""
def __init__(self, app, limits: Dict[str, Tuple[int, int]] = None):
super().__init__(app)
self._limits = limits or _DEFAULT_LIMITS
# (client_ip, prefix) → _Bucket
self._buckets: Dict[Tuple[str, str], _Bucket] = defaultdict(lambda: _Bucket(time.monotonic()))
self._lock = threading.Lock()
def _get_prefix(self, path: str) -> str:
for prefix in self._limits:
if path.startswith(prefix):
return prefix
return ""
async def dispatch(self, request: Request, call_next):
if not RATE_LIMITING_ENABLED:
return await call_next(request)
path = request.url.path
        # Queue-size enforcement: reject every /v1/ request while the global queue is full
if any(path.startswith(p) for p in _QUEUED_PREFIXES):
from codai.queue.manager import queue_manager
if await queue_manager.is_full():
return JSONResponse(
status_code=429,
content={
"error": {
"message": "Server queue is full. Please retry later.",
"type": "rate_limit_error",
"code": 429,
}
},
headers={"Retry-After": "5"},
)
prefix = self._get_prefix(path)
if not prefix:
return await call_next(request)
max_req, window = self._limits[prefix]
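        # NOTE: X-Forwarded-For is client-controlled; it is only a reliable key
        # when the server sits behind a proxy that overwrites this header.
        client_ip = (
            request.headers.get("x-forwarded-for", "").split(",")[0].strip()
            or (request.client.host if request.client else "unknown")
        )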
        key = (client_ip, prefix)
        now = time.monotonic()
        with self._lock:
            bucket = self._buckets[key]
            if now - bucket.window_start >= window:
                # Window expired: start a fresh one
                bucket.count = 0
                bucket.window_start = now
            bucket.count += 1
            count = bucket.count
            # Read window_start under the lock so a concurrent request cannot reset it first
            reset_in = max(0.0, window - (now - bucket.window_start))
        remaining = max(0, max_req - count)
        reset_at = int(time.time() + reset_in)
if count > max_req:
return JSONResponse(
status_code=429,
content={
"error": {
"message": "Rate limit exceeded. Please slow down.",
"type": "rate_limit_error",
"code": 429,
}
},
headers={
"X-RateLimit-Limit": str(max_req),
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": str(reset_at),
"Retry-After": str(window),
},
)
response = await call_next(request)
response.headers["X-RateLimit-Limit"] = str(max_req)
response.headers["X-RateLimit-Remaining"] = str(remaining)
response.headers["X-RateLimit-Reset"] = str(reset_at)
return response
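# Sketch (hypothetical, for illustration/testing only): the fixed-window rule
# applied in dispatch() above, isolated from Starlette so it can be exercised
# with an injected clock. Each key may make `max_requests` calls per window;
# the counter resets once `window` seconds have passed since the window began.
# `fixed_window_allow` and its list-based state are illustrative choices.
from typing import Dict, Tuple
def fixed_window_allow(state: Dict[str, list], key: str, now: float,
                       max_requests: int, window: float) -> Tuple[bool, float]:
    """Count one request for `key`; return (allowed, seconds_until_reset).

    `state` maps key -> [count, window_start] and is updated in place.
    """
    entry = state.setdefault(key, [0, now])
    if now - entry[1] >= window:
        # Window expired: start a fresh one at `now`
        entry[0], entry[1] = 0, now
    entry[0] += 1
    reset_in = window - (now - entry[1])
    return entry[0] <= max_requests, reset_in
# With max_requests=2 and window=60, a third call inside the same 60 s window is
# rejected, and the first call at t >= window_start + 60 opens a new window.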
......@@ -20,12 +20,15 @@ Text generation endpoints for the codai API.
import asyncio
import json
import logging
import time
import uuid
from typing import AsyncGenerator, Dict, List, Optional
from fastapi import APIRouter, HTTPException, Request
logger = logging.getLogger(__name__)
# Import from codai modules
from codai.models.manager import ModelManager, WhisperServerManager, MultiModelManager, model_manager, multi_model_manager
from codai.queue.manager import QueueManager, queue_manager
......@@ -119,68 +122,47 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
if auth_header.startswith('Bearer '):
api_key = auth_header[7:] # Extract token after 'Bearer '
# If still no API key, use a fake key to allow litellm to proceed
# litellm will then fail with the actual provider error if needed
if not api_key:
api_key = "fake-key-for-local-testing"
print("DEBUG: No API key provided, using fake key for litellm")
raise HTTPException(
status_code=401,
detail="An API key is required for the LiteLLM backend. "
"Provide an 'Authorization: Bearer <key>' header.",
)
# Determine the base URL for litellm to connect to
# Use the server's host and port for local connections
api_base = None
# Check if model starts with 'ollama:' - use local Ollama
if request.model and request.model.startswith('ollama:'):
# Get the host from the request headers
client_host = "127.0.0.1"
if http_request:
host_header = http_request.headers.get('host', '')
if host_header:
# Strip port if present
if ':' in host_header:
client_host = host_header.split(':')[0]
if client_host.replace('.', '').isdigit():
# It's an IP, keep it
pass
else:
# It's a hostname, use localhost
client_host = "127.0.0.1"
else:
client_host = host_header
# Get port from global_args or use default
port = getattr(global_args, 'port', 11434) if global_args else 11434
api_base = f"http://{client_host}:{port}/v1"
print(f"DEBUG: Using api_base for Ollama: {api_base}")
else:
# For non-Ollama models, use the server's own URL as base
# This allows LiteLLM to make requests to the local server
if http_request:
# Get the host from the request headers
host_header = http_request.headers.get('host', '')
if host_header:
# Strip port if present to reconstruct clean URL
if ':' in host_header:
client_host = host_header.split(':')[0]
# Keep the port from the request for consistency
server_port = host_header.split(':')[1] if len(host_header.split(':')) > 1 else str(getattr(global_args, 'port', 6745))
parts = host_header.split(':')
client_host = parts[0]
server_port = parts[1] if len(parts) > 1 else str(getattr(global_args, 'port', 6745))
else:
client_host = host_header
server_port = str(getattr(global_args, 'port', 6745))
else:
# Fallback to client host if no Host header
client_host = http_request.client.host if http_request.client else "127.0.0.1"
server_port = str(getattr(global_args, 'port', 6745))
else:
# Fallback if no http_request
client_host = "127.0.0.1"
server_port = str(getattr(global_args, 'port', 6745))
# Determine protocol (http or https)
use_https = getattr(global_args, 'https', False) or getattr(global_args, 'pubkey', None)
protocol = "https" if use_https else "http"
api_base = f"{protocol}://{client_host}:{server_port}/v1"
print(f"DEBUG: Using api_base for local server: {api_base}")
# Get or create litellm backend
litellm_backend = get_litellm_backend(
......@@ -228,33 +210,21 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
stream=True,
tool_parser=tool_parser,
):
# Add rate limit headers
headers = {}
if 'usage' in chunk:
headers = litellm_backend.get_rate_limit_headers(
prompt_tokens=chunk.get('usage', {}).get('prompt_tokens', 0),
completion_tokens=chunk.get('usage', {}).get('completion_tokens', 0)
)
# Handle Qwen tool calls if model is Qwen family
if 'qwen' in request.model.lower():
content = chunk.get('choices', [{}])[0].get('delta', {}).get('content', '')
tool_calls = chunk.get('choices', [{}])[0].get('delta', {}).get('tool_calls', [])
if not tool_calls and content:
# Try to parse tool calls from content
tool_calls = litellm_backend.parse_qwen_tool_calls(content)
if tool_calls:
# Strip tool tags from content
content = litellm_backend.strip_tool_tags(content)
chunk['choices'][0]['delta']['content'] = content
chunk['choices'][0]['delta']['tool_calls'] = tool_calls
yield f"data: {json.dumps(chunk)}\n\n"
yield "data: [DONE]\n\n"
except Exception as e:
# Send error chunk then [DONE] so clients don't hang waiting
yield f"data: {json.dumps({'error': {'message': str(e), 'type': 'internal_error'}})}\n\n"
yield "data: [DONE]\n\n"
from fastapi.responses import StreamingResponse
return StreamingResponse(generate(), media_type="text/event-stream")
......@@ -586,10 +556,6 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
elif not isinstance(m["content"], str):
messages_dict[i]["content"] = str(m["content"])
# Debug: print first few messages to see their structure
print(f"DEBUG: messages_dict[0] keys: {list(messages_dict[0].keys()) if messages_dict else 'empty'}")
if len(messages_dict) > 1:
print(f"DEBUG: messages_dict[1] keys: {list(messages_dict[1].keys()) if len(messages_dict) > 1 else 'empty'}")
# Convert tools to dict format if present
tools_dict = None
......@@ -650,10 +616,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
if get_global_debug():
print(f"RAW: template_manager.format_for_raw_completion not available")
# Get resolved model name for response (with coderai/ prefix and proper formatting)
# Use multi_model_manager to get the actual loaded models, not the individual model manager
response_model_name = get_resolved_model_name(requested_model, multi_model_manager)
print(f"DEBUG: Requested model: {requested_model}, Resolved model for response: {response_model_name}")
# Handle raw mode - two pass: first capture reasoning, then get final answer
if use_raw_mode and raw_prompt_for_generation:
......@@ -813,7 +776,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
)
tools_list.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
except Exception as e:
print(f"DEBUG: Error converting tool in raw stream: {e}")
logger.debug("Error converting tool in raw stream: %s", e)
continue
if tools_list:
......@@ -1014,7 +977,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
)
tools_list.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
except Exception as e:
print(f"DEBUG: Error converting tool in raw mode: {e}, tool type: {type(t)}")
logger.debug("Error converting tool in raw mode: %s (type: %s)", e, type(t))
continue
# Step 1: Use ModelParserAdapter to extract tool calls from final_text (NOT generated_text which includes reasoning)
......@@ -1040,7 +1003,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
validated_calls.append(tc)
if len(validated_calls) != len(extracted_tool_calls):
print(f"DEBUG: Filtered out {len(extracted_tool_calls) - len(validated_calls)} invalid tool calls in non-streaming")
logger.debug("Filtered out %d invalid tool calls in non-streaming", len(extracted_tool_calls) - len(validated_calls))
extracted_tool_calls = validated_calls if validated_calls else None
if extracted_tool_calls:
......@@ -1213,7 +1176,6 @@ async def stream_chat_response(
request_id = f"req-{uuid.uuid4().hex[:8]}"
generated_text = ""
print(f"DEBUG: stream_chat_response started, stream=True, tools={tools is not None}")
# Check if model is loaded - if not, notify waiting clients
# The model manager exists but backend may not be loaded yet in on-demand mode
......@@ -1365,9 +1327,6 @@ async def stream_chat_response(
# Explicitly flush to ensure data is sent immediately
await asyncio.sleep(0)
print(f"DEBUG: stream_chat_response completed, {chunk_count} chunks, generated_text length: {len(generated_text)}")
if not generated_text.strip():
print(f"DEBUG: Warning - no content generated!")
# In debug mode, dump the full generated text
if get_global_debug():
......@@ -1407,7 +1366,7 @@ async def stream_chat_response(
)
tool_objects.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
except Exception as e:
print(f"DEBUG: Error converting tool: {e}, tool type: {type(t)}")
logger.debug("Error converting tool: %s (type: %s)", e, type(t))
continue
try:
tool_calls = tool_parser.extract_tool_calls(generated_text, tool_objects)
......@@ -1423,10 +1382,10 @@ async def stream_chat_response(
elif isinstance(args, dict):
validated_calls.append(tc)
if len(validated_calls) != len(tool_calls):
print(f"DEBUG: Filtered out {len(tool_calls) - len(validated_calls)} invalid tool calls in stream_chat_response")
logger.debug("Filtered out %d invalid tool calls in stream_chat_response", len(tool_calls) - len(validated_calls))
tool_calls = validated_calls if validated_calls else None
except Exception as e:
print(f"DEBUG: Error extracting tool calls: {e}")
logger.debug("Error extracting tool calls: %s", e)
tool_calls = None
if tool_calls:
# In debug mode, dump tool calls
......@@ -1628,7 +1587,7 @@ async def generate_chat_response(
)
tool_objects.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
except Exception as e:
print(f"DEBUG: Error converting tool: {e}, tool type: {type(t)}")
logger.debug("Error converting tool: %s (type: %s)", e, type(t))
continue
try:
tool_calls = tool_parser.extract_tool_calls(generated_text, tool_objects)
......@@ -1644,10 +1603,10 @@ async def generate_chat_response(
elif isinstance(args, dict):
validated_calls.append(tc)
if len(validated_calls) != len(tool_calls):
print(f"DEBUG: Filtered out {len(tool_calls) - len(validated_calls)} invalid tool calls in generate_chat_response")
logger.debug("Filtered out %d invalid tool calls in generate_chat_response", len(tool_calls) - len(validated_calls))
tool_calls = validated_calls if validated_calls else None
except Exception as e:
print(f"DEBUG: Error extracting tool calls: {e}")
logger.debug("Error extracting tool calls: %s", e)
tool_calls = None
if tool_calls:
# Always strip tool call format from content
......
......@@ -23,8 +23,15 @@ import os
import tempfile
from fastapi import APIRouter, HTTPException, UploadFile, File, Form
from fastapi.responses import PlainTextResponse
from typing import Optional
# Maximum upload size: 100 MB
_MAX_AUDIO_BYTES = 100 * 1024 * 1024
# Safe audio extensions (user-supplied extension is NOT trusted for the suffix)
_SAFE_EXTENSIONS = {'.wav', '.mp3', '.ogg', '.flac', '.m4a', '.webm', '.mp4'}
# Import from codai modules
from codai.models.manager import multi_model_manager
......@@ -39,6 +46,71 @@ def set_global_args(args):
global_args = args
# =============================================================================
# Response formatting helpers
# =============================================================================
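def _seconds_to_srt_time(s: float) -> str:
    # Work in whole milliseconds so rounding can never yield a "60,000" seconds field
    ms = int(round(max(s, 0) * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    sec, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{sec:02d},{ms:03d}"
def _seconds_to_vtt_time(s: float) -> str:
    # WebVTT uses the SRT layout with '.' instead of ',' before the milliseconds
    return _seconds_to_srt_time(s).replace(',', '.')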
def _format_response(fmt: str, text: str, segments: list):
"""Format a transcription result according to the requested response_format."""
fmt = (fmt or "json").lower()
if fmt == "text":
return PlainTextResponse(text)
if fmt == "srt":
lines = []
for i, seg in enumerate(segments, 1):
start = _seconds_to_srt_time(seg.get("start", 0))
end = _seconds_to_srt_time(seg.get("end", 0))
            lines.append(f"{i}\n{start} --> {end}\n{seg.get('text', '').strip()}\n")
srt_body = "\n".join(lines) if lines else f"1\n00:00:00,000 --> 00:00:00,000\n{text}\n"
return PlainTextResponse(srt_body, media_type="text/plain")
if fmt == "vtt":
lines = ["WEBVTT\n"]
for seg in segments:
start = _seconds_to_vtt_time(seg.get("start", 0))
end = _seconds_to_vtt_time(seg.get("end", 0))
            lines.append(f"{start} --> {end}\n{seg.get('text', '').strip()}\n")
if not segments:
lines.append(f"00:00:00.000 --> 00:00:00.000\n{text}\n")
return PlainTextResponse("\n".join(lines), media_type="text/vtt")
if fmt == "verbose_json":
return {
"task": "transcribe",
"language": "unknown",
"duration": segments[-1].get("end", 0) if segments else 0,
"text": text,
"segments": [
{
"id": i,
"start": s.get("start", 0),
"end": s.get("end", 0),
"text": s.get("text", "").strip(),
}
for i, s in enumerate(segments)
],
}
# Default: json
return {"text": text}
# =============================================================================
# Router and Endpoints
# =============================================================================
......@@ -58,17 +130,37 @@ async def create_transcription(
"""
Audio transcription endpoint (OpenAI-compatible).
"""
# Check if whisper-server is available FIRST
if multi_model_manager.whisper_server and multi_model_manager.whisper_server.is_running():
file_content = await file.read()
result = multi_model_manager.whisper_server.transcribe(
file_content,
language=language,
prompt=prompt
)
if len(file_content) > _MAX_AUDIO_BYTES:
raise HTTPException(status_code=413, detail="Audio file too large (max 100 MB)")
# Check if the requested model is a whisper-server instance
wsm = multi_model_manager.whisper_servers.get(model)
if wsm is None and multi_model_manager.whisper_server is not None:
# Legacy single-instance fallback: use it if no specific match
if not multi_model_manager.whisper_servers:
wsm = multi_model_manager.whisper_server
if wsm is not None:
ws_key = f"audio:{model}" if model in multi_model_manager.whisper_servers else "audio:whisper-server"
# Let the VRAM manager evict other models if needed
multi_model_manager.request_model(requested_model=model, model_type="audio")
# Start the subprocess if it isn't running (on-demand)
if not wsm.is_running():
wsm.start(getattr(wsm, '_model_path', None), gpu_device=getattr(wsm, '_gpu_device', 0))
if wsm.is_running():
multi_model_manager.models[ws_key] = wsm
multi_model_manager.active_in_vram = ws_key
multi_model_manager.models_in_vram.add(ws_key)
if wsm.is_running():
result = wsm.transcribe(file_content, language=language, prompt=prompt)
if "error" in result:
raise HTTPException(status_code=500, detail=result["error"])
return {"text": result.get("text", "")}
return _format_response(response_format, result.get("text", ""), [])
# Fall through to Python backends if subprocess failed to start
# Use the manager to resolve the model and manage VRAM
model_info = multi_model_manager.request_model(
......@@ -90,11 +182,13 @@ async def create_transcription(
detail="Audio transcription not configured. Use --audio-model or --whisper-server."
)
# Read the uploaded file
file_content = await file.read()
# Determine a safe file extension from the upload's content-type or filename,
# never trusting the raw user-supplied value for arbitrary suffixes.
raw_ext = os.path.splitext(file.filename or '')[1].lower()
safe_ext = raw_ext if raw_ext in _SAFE_EXTENSIONS else '.wav'
# Save to temp file (needed for some backends)
with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.filename)[1]) as tmp:
with tempfile.NamedTemporaryFile(delete=False, suffix=safe_ext) as tmp:
tmp.write(file_content)
tmp_path = tmp.name
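The new temp-file code takes the suffix from an allow-list and ignores the raw upload name. The real `_SAFE_EXTENSIONS` set is defined outside this hunk, so the set below is a hypothetical example of the same pattern. Anything that is not listed, including names with no extension or path-like names, falls back to `.wav`.

```python
import os
from typing import Optional

# Hypothetical allow-list; the module's actual _SAFE_EXTENSIONS is not shown in this diff
SAFE_EXTENSIONS = frozenset({".wav", ".mp3", ".flac", ".ogg", ".m4a", ".webm"})


def safe_suffix(filename: Optional[str], default: str = ".wav") -> str:
    """Return the upload's extension only if it is on the allow-list."""
    ext = os.path.splitext(filename or "")[1].lower()
    return ext if ext in SAFE_EXTENSIONS else default
```

The result can be passed directly as `suffix=` to `tempfile.NamedTemporaryFile`.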
......@@ -104,41 +198,27 @@ async def create_transcription(
from faster_whisper import WhisperModel
if whisper_model is None:
print(f"Loading faster-whisper model: {model_name}")
# Determine compute type - always use int8 for CPU
compute_type = "int8"
# Load the model
whisper_model = WhisperModel(
model_name,
device="cpu", # Always use CPU - faster-whisper CUDA doesn't work with AMD
compute_type=compute_type,
device="cpu",
compute_type="int8",
)
# Cache the model
multi_model_manager.add_model(model_key, whisper_model)
multi_model_manager.current_model_key = model_key
print(f"Loaded faster-whisper model: {model_name}")
# Run transcription
segments, info = whisper_model.transcribe(
raw_segments, _ = whisper_model.transcribe(
tmp_path,
language=language,
initial_prompt=prompt,
temperature=temperature,
)
# Collect all segments
text_parts = []
for segment in segments:
text_parts.append(segment.text)
full_text = "".join(text_parts)
return {
"text": full_text.strip()
}
# Materialise the generator so we have all segment data
segments = [
{"start": s.start, "end": s.end, "text": s.text}
for s in raw_segments
]
full_text = "".join(s["text"] for s in segments)
return _format_response(response_format, full_text.strip(), segments)
except ImportError:
pass
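At the top of `create_transcription`, the whisper-server selection first looks for a server registered under the requested model id. It uses the legacy single instance only when no named servers exist. The sketch below isolates that decision as a pure function so it can be checked on its own. The function name and tuple return are illustrative and not part of the codebase. One consequence: because `register_whisper_server` also sets the legacy pointer, a request for an unregistered model does not silently fall back to the first server. It falls through to the Python backends.

```python
from typing import Any, Dict, Optional, Tuple


def pick_whisper_server(
    model: str, servers: Dict[str, Any], legacy: Optional[Any]
) -> Tuple[Optional[Any], Optional[str]]:
    """Return (server, vram_key), or (None, None) to fall through to Python backends."""
    server = servers.get(model)
    if server is not None:
        return server, f"audio:{model}"
    # Legacy single-instance fallback applies only when no named servers are registered
    if legacy is not None and not servers:
        return legacy, "audio:whisper-server"
    return None, None
```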
......@@ -148,41 +228,26 @@ async def create_transcription(
import whispercpp
if whisper_model is None:
print(f"Loading whispercpp model: {model_name}")
# Check if it's a built-in model name
if model_name in ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large']:
# It's a built-in model name
whisper_model = whispercpp.Whisper.from_pretrained(model_name)
else:
# It's a path to a GGUF file
whisper_model = whispercpp.Whisper.from_pretrained(model_name)
# Cache the model
multi_model_manager.add_model(model_key, whisper_model)
multi_model_manager.current_model_key = model_key
print(f"Loaded whispercpp model: {model_name}")
# Run transcription
result = whisper_model.transcribe(tmp_path)
# Extract text from result
text = ""
if hasattr(result, 'text'):
text = result.text
elif isinstance(result, dict):
text = result.get('text', '')
elif isinstance(result, list):
# Some versions return a list of segments
for segment in result:
if hasattr(segment, 'text'):
text += segment.text
elif isinstance(segment, dict):
text += segment.get('text', '')
return {
"text": text.strip()
}
# whispercpp does not expose per-segment timestamps easily
return _format_response(response_format, text.strip(), [])
except ImportError as e:
raise HTTPException(
......
......@@ -263,7 +263,39 @@ def _apply_camera_motion(kw: dict, camera_motion: str):
kw['camera_motion'] = camera_motion
def _apply_character_refs(kw: dict, character_references: List[str], strength: float):
def _resolve_character_inputs(request) -> tuple[List[str], List[str]]:
"""Return (flat_image_list, name_list) from any combination of request fields."""
images: List[str] = []
names: List[str] = []
# 1. Expand named saved profiles
if request.character_profiles:
try:
from codai.api.characters import resolve_character_profiles
images += resolve_character_profiles(request.character_profiles)
names += list(request.character_profiles)
except Exception:
pass
# 2. Named character slots [{name, images:[...]}, ...]
if request.characters:
for slot in request.characters:
slot_imgs = slot.get('images') or []
images += slot_imgs
if slot.get('name'):
names.append(slot['name'])
# 3. Legacy flat list
if request.character_references:
images += list(request.character_references)
if request.character_names:
names += list(request.character_names)
return images, names
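`_resolve_character_inputs` flattens named slots and the legacy lists into one image list and one name list, and `_generate_video` then prefixes the names to the prompt. The sketch below reproduces the slot and legacy merging without the saved-profile lookup, which needs `codai.api.characters`. `FakeVideoRequest` is a hypothetical stand-in for `VideoGenerationRequest`. Note that the two lists are not index-aligned: a slot without a name still contributes images.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeVideoRequest:
    # Hypothetical stand-in carrying only the character-related fields
    characters: Optional[List[Dict]] = None
    character_references: Optional[List[str]] = None
    character_names: Optional[List[str]] = None


def merge_character_inputs(req: FakeVideoRequest) -> Tuple[List[str], List[str]]:
    images: List[str] = []
    names: List[str] = []
    # Named slots: [{name, images: [...]}, ...]
    for slot in req.characters or []:
        images += slot.get("images") or []
        if slot.get("name"):
            names.append(slot["name"])
    # Legacy flat lists are appended after the slots
    images += list(req.character_references or [])
    names += list(req.character_names or [])
    return images, names


def prompt_with_names(prompt: str, names: List[str]) -> str:
    """Prefix character names to the prompt when both are present."""
    return f"{', '.join(names)}. {prompt}" if names and prompt else prompt
```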
def _apply_character_refs(kw: dict, character_references: List[str], strength: float,
names: Optional[List[str]] = None):
"""Apply character reference images to pipeline kwargs."""
if not character_references:
return
......@@ -291,8 +323,13 @@ def _generate_video(pipe, request: VideoGenerationRequest):
_apply_camera_motion(kw, request.camera_motion)
if request.character_references:
_apply_character_refs(kw, request.character_references, request.character_strength or 0.8)
char_images, char_names = _resolve_character_inputs(request)
if char_images:
_apply_character_refs(kw, char_images, request.character_strength or 0.8, char_names)
# Prepend character names to prompt for better conditioning
if char_names and kw.get('prompt'):
names_hint = ', '.join(char_names)
kw['prompt'] = f"{names_hint}. {kw['prompt']}"
init_src = request.init_image or request.image
......@@ -359,35 +396,49 @@ def _ffmpeg_upscale(path: str, factor: int, temps: list) -> str:
scale = f"scale=iw*{factor}:ih*{factor}:flags=lanczos"
cmd = ['ffmpeg', '-y', '-i', path, '-vf', scale, '-c:a', 'copy', out]
r = subprocess.run(cmd, capture_output=True)
if r.returncode == 0:
return out
if r.returncode != 0:
import logging
logging.getLogger(__name__).warning(
"ffmpeg upscale failed (rc=%d): %s", r.returncode, r.stderr.decode(errors='replace')
)
return path # fallback to original if ffmpeg fails
return out
def _rife_interpolate(path: str, multiplier: int, temps: list) -> str:
out = tempfile.mktemp(suffix='_rife.mp4')
temps.append(out)
# Try rife-ncnn-vulkan binary if available
import shutil
import logging, shutil
_log = logging.getLogger(__name__)
if shutil.which('rife-ncnn-vulkan'):
frames_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
temps += [frames_dir, out_dir]
subprocess.run(['ffmpeg', '-y', '-i', path, f'{frames_dir}/%08d.png'],
r = subprocess.run(['ffmpeg', '-y', '-i', path, f'{frames_dir}/%08d.png'],
capture_output=True)
subprocess.run(['rife-ncnn-vulkan', '-i', frames_dir, '-o', out_dir,
'-m', f'rife-v4'], capture_output=True)
subprocess.run(['ffmpeg', '-y', '-r', str(multiplier * 8), '-i',
if r.returncode != 0:
_log.warning("ffmpeg frame extraction failed: %s", r.stderr.decode(errors='replace'))
else:
r = subprocess.run(['rife-ncnn-vulkan', '-i', frames_dir, '-o', out_dir,
'-m', 'rife-v4'], capture_output=True)
if r.returncode != 0:
_log.warning("rife-ncnn-vulkan failed: %s", r.stderr.decode(errors='replace'))
else:
r = subprocess.run(['ffmpeg', '-y', '-r', str(multiplier * 8), '-i',
f'{out_dir}/%08d.png', '-c:v', 'libx264', out],
capture_output=True)
if os.path.exists(out):
if r.returncode != 0:
_log.warning("ffmpeg reassembly failed: %s", r.stderr.decode(errors='replace'))
elif os.path.exists(out):
return out
# Simple ffmpeg minterpolate fallback
fps_expr = f"fps=fps={multiplier}*source_fps"
cmd = ['ffmpeg', '-y', '-i', path, '-filter:v',
f'minterpolate=fps={multiplier * 8}', '-c:a', 'copy', out]
r = subprocess.run(cmd, capture_output=True)
return out if r.returncode == 0 else path
if r.returncode != 0:
_log.warning("ffmpeg minterpolate failed: %s", r.stderr.decode(errors='replace'))
return path
return out
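Both `_ffmpeg_upscale` and `_rife_interpolate` now follow one pattern: run an external tool, log its stderr on failure, and return the original path so the pipeline continues. The helper below factors that pattern into one function. It is a sketch and not code from the project. The `FileNotFoundError` branch is an addition that covers a missing binary, which the original code checks only for `rife-ncnn-vulkan`, via `shutil.which`.

```python
import logging
import subprocess
from typing import List

_log = logging.getLogger(__name__)


def run_or_fallback(cmd: List[str], out: str, fallback: str) -> str:
    """Run cmd; return `out` on success, otherwise log the error and return `fallback`."""
    try:
        r = subprocess.run(cmd, capture_output=True)
    except FileNotFoundError:
        _log.warning("%s not found on PATH", cmd[0])
        return fallback
    if r.returncode != 0:
        _log.warning("%s failed (rc=%d): %s", cmd[0], r.returncode,
                     r.stderr.decode(errors="replace"))
        return fallback
    return out
```

For example, `run_or_fallback(['ffmpeg', '-y', '-i', src, '-vf', scale, '-c:a', 'copy', dst], dst, src)` mirrors `_ffmpeg_upscale`.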
def _add_audio_to_video(path: str, request: VideoGenerationRequest,
......
"""
Voice cloning endpoints.
POST /v1/audio/clone — synthesize speech in a cloned voice
GET /v1/audio/voices — list saved voice profiles
POST /v1/audio/voices — save a named voice profile (ref audio + transcript)
DELETE /v1/audio/voices/{name} — delete a voice profile
"""
import asyncio
import base64
import io
import json
import os
import tempfile
import time
from typing import Optional
from fastapi import APIRouter, HTTPException, Request, UploadFile, File, Form
from pydantic import BaseModel, ConfigDict
router = APIRouter()
global_args = None
global_file_path = None
# Directory where voice profiles are stored
_VOICES_DIR: Optional[str] = None
def set_global_args(args):
global global_args, _VOICES_DIR
global_args = args
# Store voice profiles in a 'voices' subdirectory next to output files,
# or in ~/.coderai/voices when no output path is configured
base = getattr(args, 'file_path', None)
_VOICES_DIR = os.path.join((base if os.path.isdir(base) else os.path.dirname(base)) if base else os.path.expanduser('~/.coderai'), 'voices')
os.makedirs(_VOICES_DIR, exist_ok=True)
def set_global_file_path(path):
global global_file_path
global_file_path = path
def _voices_dir() -> str:
if _VOICES_DIR:
return _VOICES_DIR
d = os.path.expanduser('~/.coderai/voices')
os.makedirs(d, exist_ok=True)
return d
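def _voice_path(name: str) -> str:
# Reject names such as '..' that would resolve outside the voices directory
if not name.replace('-', '').replace('_', '').isalnum():
raise HTTPException(status_code=400, detail=f"Invalid voice name: {name!r}")
return os.path.join(_voices_dir(), name)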
def _list_voices() -> list:
d = _voices_dir()
voices = []
for entry in os.scandir(d):
if entry.is_dir():
meta_path = os.path.join(entry.path, 'meta.json')
if os.path.exists(meta_path):
with open(meta_path) as f:
meta = json.load(f)
voices.append(meta)
return sorted(voices, key=lambda v: v.get('created_at', 0))
def _save_voice(name: str, audio_bytes: bytes, audio_ext: str, transcript: str, description: str = '') -> dict:
vdir = _voice_path(name)
os.makedirs(vdir, exist_ok=True)
audio_file = os.path.join(vdir, f'ref{audio_ext}')
with open(audio_file, 'wb') as f:
f.write(audio_bytes)
meta = {
'name': name,
'description': description,
'transcript': transcript,
'audio_file': audio_file,
'audio_ext': audio_ext,
'created_at': int(time.time()),
}
with open(os.path.join(vdir, 'meta.json'), 'w') as f:
json.dump(meta, f)
return meta
def _load_voice(name: str) -> Optional[dict]:
meta_path = os.path.join(_voice_path(name), 'meta.json')
if not os.path.exists(meta_path):
return None
with open(meta_path) as f:
return json.load(f)
def _decode_audio(data: str) -> tuple[bytes, str]:
"""Decode base64 audio data, return (bytes, ext)."""
if data.startswith('data:'):
mime, b64 = data.split(',', 1)
ext = '.' + mime.split('/')[1].split(';')[0]
return base64.b64decode(b64), ext
return base64.b64decode(data), '.wav'
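`_decode_audio` derives the file extension from a data-URI MIME type, and that extension later becomes a temp-file suffix. As written, it raises `IndexError` on a URI such as `data:;base64,...`, where the MIME type has no `/`. The sketch below is a slightly hardened variant, not the project's code. It keeps the subtype only when it is purely alphanumeric and otherwise falls back to `.wav`, so `audio/x-wav` also maps to `.wav`.

```python
import base64
from typing import Tuple


def decode_audio(data: str, default_ext: str = ".wav") -> Tuple[bytes, str]:
    """Decode plain base64 or a data: URI; return (bytes, file extension)."""
    if data.startswith("data:"):
        header, b64 = data.split(",", 1)
        mime = header[len("data:"):].split(";")[0]      # e.g. "audio/mpeg"
        subtype = mime.split("/", 1)[1] if "/" in mime else ""
        ext = "." + subtype if subtype.isalnum() else default_ext
        return base64.b64decode(b64), ext
    return base64.b64decode(data), default_ext
```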
def _f5tts_clone(ref_audio_path: str, ref_text: str, gen_text: str,
speed: float = 1.0, seed: Optional[int] = None) -> bytes:
"""Run F5-TTS voice cloning, return WAV bytes."""
from f5_tts.api import F5TTS
import soundfile as sf
import numpy as np
device = None
if global_args:
import torch
if torch.cuda.is_available():
device = 'cuda'
tts = F5TTS(device=device)
wav, sr, _ = tts.infer(
ref_file=ref_audio_path,
ref_text=ref_text,
gen_text=gen_text,
speed=speed,
seed=seed,
show_info=lambda x: None,
progress=lambda x, **kw: x,
)
buf = io.BytesIO()
sf.write(buf, wav, sr, format='WAV')
return buf.getvalue()
def _save_audio_response(audio_bytes: bytes, http_request: Request) -> dict:
import uuid
filename = f"{uuid.uuid4().hex}.wav"
if global_file_path:
os.makedirs(global_file_path, exist_ok=True)
fpath = os.path.join(global_file_path, filename)
with open(fpath, 'wb') as f:
f.write(audio_bytes)
host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
if ':' in host:
parts = host.split(':')
if len(parts) == 2 and parts[1].isdigit():
host = parts[0]
use_https = getattr(global_args, 'https', False) if global_args else False
proto = 'https' if use_https else 'http'
port = getattr(global_args, 'port', 8000) if global_args else 8000
return {"url": f"{proto}://{host}:{port}/v1/files/{filename}"}
return {"b64_wav": base64.b64encode(audio_bytes).decode()}
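`_save_audio_response` removes the port from the `Host` header only when splitting on `:` yields exactly two parts, and then appends the configured port. For a bracketed IPv6 host such as `[::1]:8000`, the port is left in place and a second one is appended. The sketch below shows one way to strip the port that also handles bracketed IPv6 literals. Both function names are hypothetical.

```python
from typing import Optional


def host_without_port(host: str) -> str:
    """Strip a trailing :port from a Host header value, keeping IPv6 literals intact."""
    if host.startswith("["):            # bracketed IPv6, e.g. "[::1]:8000"
        end = host.find("]")
        return host[: end + 1] if end != -1 else host
    if host.count(":") > 1:             # bare IPv6 without brackets: leave unchanged
        return host
    name, sep, port = host.rpartition(":")
    return name if sep and port.isdigit() else host


def file_url(host_header: Optional[str], filename: str,
             https: bool = False, port: int = 8000) -> str:
    proto = "https" if https else "http"
    host = host_without_port(host_header or "127.0.0.1")
    return f"{proto}://{host}:{port}/v1/files/{filename}"
```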
# ---------------------------------------------------------------------------
# Voice profile management
# ---------------------------------------------------------------------------
@router.get("/v1/audio/voices")
async def list_voices():
"""List all saved voice profiles."""
return {"voices": _list_voices()}
@router.post("/v1/audio/voices")
async def create_voice(
name: str = Form(...),
transcript: str = Form(...),
description: str = Form(''),
audio: UploadFile = File(...),
):
"""Save a named voice profile from a reference audio file + transcript."""
if not name.replace('-', '').replace('_', '').isalnum():
raise HTTPException(status_code=400, detail="Voice name must be alphanumeric (hyphens/underscores allowed)")
audio_bytes = await audio.read()
ext = os.path.splitext(audio.filename or '')[1].lower() or '.wav'
# Validate audio is readable
try:
import soundfile as sf, io as _io
sf.info(_io.BytesIO(audio_bytes))
except Exception as e:
raise HTTPException(status_code=400, detail=f"Invalid audio file: {e}")
meta = _save_voice(name, audio_bytes, ext, transcript, description)
return {"created": True, "voice": meta}
@router.delete("/v1/audio/voices/{name}")
async def delete_voice(name: str):
"""Delete a saved voice profile."""
import shutil
vdir = _voice_path(name)
if not os.path.exists(vdir):
raise HTTPException(status_code=404, detail=f"Voice '{name}' not found")
shutil.rmtree(vdir)
return {"deleted": True, "name": name}
# ---------------------------------------------------------------------------
# Voice cloning TTS
# ---------------------------------------------------------------------------
class VoiceCloneRequest(BaseModel):
text: str # text to synthesize
voice_name: Optional[str] = None # use a saved voice profile
ref_audio: Optional[str] = None # base64 reference audio (if not using saved voice)
ref_text: Optional[str] = None # transcript of ref_audio
speed: Optional[float] = 1.0
seed: Optional[int] = None
response_format: Optional[str] = "url"
model_config = ConfigDict(extra="allow")
@router.post("/v1/audio/clone")
async def clone_voice(request: VoiceCloneRequest, http_request: Request = None):
"""
Synthesize speech in a cloned voice using F5-TTS.
Provide either:
- voice_name: name of a saved voice profile
- ref_audio (base64) + ref_text: inline reference audio
"""
# Resolve reference audio
ref_audio_path = None
ref_text = request.ref_text or ''
temps = []
try:
if request.voice_name:
meta = _load_voice(request.voice_name)
if not meta:
raise HTTPException(status_code=404, detail=f"Voice '{request.voice_name}' not found")
ref_audio_path = meta['audio_file']
ref_text = ref_text or meta.get('transcript', '')
elif request.ref_audio:
audio_bytes, ext = _decode_audio(request.ref_audio)
tmp = tempfile.NamedTemporaryFile(suffix=ext, delete=False)
tmp.write(audio_bytes)
tmp.close()
ref_audio_path = tmp.name
temps.append(ref_audio_path)
else:
raise HTTPException(status_code=400, detail="Provide voice_name or ref_audio")
if not ref_text:
raise HTTPException(status_code=400, detail="ref_text (transcript of reference audio) is required for voice cloning")
try:
audio_bytes = await asyncio.get_running_loop().run_in_executor(
None, _f5tts_clone,
ref_audio_path, ref_text, request.text,
request.speed or 1.0, request.seed,
)
except ImportError:
raise HTTPException(status_code=501, detail="f5-tts not installed. Run: pip install f5-tts")
except Exception as e:
raise HTTPException(status_code=500, detail=f"Voice cloning failed: {e}")
result = _save_audio_response(audio_bytes, http_request)
return {"created": int(time.time()), "data": [result]}
finally:
for t in temps:
try:
os.unlink(t)
except Exception:
pass
"""
Voice conversion endpoint — converts timbre while preserving pitch, melody and expression.
Unlike TTS-based dubbing, this works correctly for singing and music.
POST /v1/audio/convert — convert voice timbre in audio (speech or singing)
"""
import asyncio
import base64
import io
import os
import tempfile
import time
from typing import Optional
import numpy as np
import soundfile as sf
from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel, ConfigDict
router = APIRouter()
global_args = None
global_file_path = None
_wrapper = None # SeedVCWrapper singleton
def set_global_args(args):
global global_args
global_args = args
def set_global_file_path(path):
global global_file_path
global_file_path = path
def _get_wrapper():
global _wrapper
if _wrapper is None:
from seed_vc.seed_vc_wrapper import SeedVCWrapper
_wrapper = SeedVCWrapper()
return _wrapper
def _decode_audio_to_file(data: str, suffix: str = '.wav') -> str:
if data.startswith('data:'):
_, b64 = data.split(',', 1)
raw = base64.b64decode(b64)
else:
raw = base64.b64decode(data)
tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
tmp.write(raw)
tmp.close()
return tmp.name
def _save_response(audio_np: np.ndarray, sr: int, http_request) -> dict:
buf = io.BytesIO()
sf.write(buf, audio_np, sr, format='WAV')
wav_bytes = buf.getvalue()
import uuid
filename = f'{uuid.uuid4().hex}_converted.wav'
if global_file_path:
os.makedirs(global_file_path, exist_ok=True)
fpath = os.path.join(global_file_path, filename)
with open(fpath, 'wb') as f:
f.write(wav_bytes)
host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
if ':' in host:
parts = host.split(':')
if len(parts) == 2 and parts[1].isdigit():
host = parts[0]
proto = 'https' if getattr(global_args, 'https', False) else 'http'
port = getattr(global_args, 'port', 8000) if global_args else 8000
return {'url': f'{proto}://{host}:{port}/v1/files/{filename}'}
return {'b64_wav': base64.b64encode(wav_bytes).decode()}
class VoiceConvertRequest(BaseModel):
"""
Convert the timbre of source_audio to match target_voice,
while preserving pitch, melody, rhythm and expression.
Use f0_condition=True for singing/music (slower but pitch-accurate).
Use f0_condition=False for speech (faster).
"""
source_audio: str # base64 audio to convert (the performance)
target_voice: Optional[str] = None # base64 reference audio for target timbre
voice_name: Optional[str] = None # saved voice profile name
f0_condition: Optional[bool] = False # True = singing/music mode (preserves pitch)
pitch_shift: Optional[int] = 0 # semitones to shift after conversion
diffusion_steps: Optional[int] = 10 # quality vs speed (10–30)
length_adjust: Optional[float] = 1.0
inference_cfg_rate: Optional[float] = 0.7
response_format: Optional[str] = 'url'
model_config = ConfigDict(extra='allow')
@router.post('/v1/audio/convert')
async def convert_voice(request: VoiceConvertRequest, http_request: Request = None):
"""
Voice conversion: preserves pitch/melody/expression, changes only timbre.
Set f0_condition=True for singing and music.
"""
target_path = None
temps = []
try:
if request.voice_name:
from codai.api.voice_clone import _load_voice
meta = _load_voice(request.voice_name)
if not meta:
raise HTTPException(status_code=404, detail=f"Voice '{request.voice_name}' not found")
target_path = meta['audio_file']
elif request.target_voice:
target_path = _decode_audio_to_file(request.target_voice)
temps.append(target_path)
else:
raise HTTPException(status_code=400, detail='Provide voice_name or target_voice')
source_path = _decode_audio_to_file(request.source_audio)
temps.append(source_path)
try:
wrapper = _get_wrapper()
except ImportError:
raise HTTPException(status_code=501,
detail='seed-vc not installed. Run: pip install seed-vc')
def _run():
return wrapper.convert_voice(
source=source_path,
target=target_path,
diffusion_steps=request.diffusion_steps or 10,
length_adjust=request.length_adjust or 1.0,
inference_cfg_rate=request.inference_cfg_rate or 0.7,
f0_condition=bool(request.f0_condition),
pitch_shift=request.pitch_shift or 0,
stream_output=False,
)
try:
audio_out = await asyncio.get_running_loop().run_in_executor(None, _run)
except Exception as e:
raise HTTPException(status_code=500, detail=f'Voice conversion failed: {e}')
sr = 44100 if request.f0_condition else 22050
if isinstance(audio_out, tuple):
audio_out = audio_out[0]
result = _save_response(np.array(audio_out).flatten(), sr, http_request)
return {'created': int(time.time()), 'data': [result]}
finally:
for t in temps:
try:
os.unlink(t)
except Exception:
pass
......@@ -30,6 +30,7 @@ class ServerConfig:
https: bool = False
https_key_path: Optional[str] = None
https_cert_path: Optional[str] = None
queue_max_size: int = 6
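`ServerConfig` gains `queue_max_size` with a default of 6. The loader for `config.json` is outside this diff. The sketch below shows one way a loader could keep older config files working: build the dataclass from only the keys it knows, so a missing `queue_max_size` takes its default and unknown keys are ignored. `ServerConfigSketch` and the `host` default are hypothetical, and the `port` default of 8000 mirrors the fallbacks used elsewhere in this diff.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ServerConfigSketch:
    host: str = "0.0.0.0"              # hypothetical default
    port: int = 8000
    https: bool = False
    https_key_path: Optional[str] = None
    https_cert_path: Optional[str] = None
    queue_max_size: int = 6


def server_config_from_dict(data: Dict[str, Any]) -> ServerConfigSketch:
    """Build the config from known keys only; missing keys keep their defaults."""
    known = {f.name for f in fields(ServerConfigSketch)}
    return ServerConfigSketch(**{k: v for k, v in data.items() if k in known})
```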
@dataclass
......@@ -128,10 +129,12 @@ class ConfigManager:
self.config_path = self.config_dir / "config.json"
self.models_path = self.config_dir / "models.json"
self.auth_path = self.config_dir / "auth.json"
self.pipelines_path = self.config_dir / "pipelines.json"
self.config: Optional[Config] = None
self.models_data: Dict[str, Any] = {}
self.auth_data: Dict[str, Any] = {}
self.pipelines_data: list = []
def ensure_config_dir(self):
"""Create configuration directory if it doesn't exist."""
......@@ -196,19 +199,12 @@ class ConfigManager:
# Create default auth.json
if not self.auth_path.exists():
try:
from argon2 import PasswordHasher
ph = PasswordHasher()
default_admin_hash = ph.hash("admin")
except ImportError:
from codai.admin.auth import hash_password
default_admin_hash = hash_password("admin")
default_auth = {
"users": [{
"id": 1,
"username": "admin",
"password_hash": default_admin_hash,
"password_hash": hash_password("admin"),
"role": "admin",
"created_at": "2026-05-03T00:00:00Z",
"must_change_password": True
......@@ -219,8 +215,8 @@ class ConfigManager:
with open(self.auth_path, 'w') as f:
json.dump(default_auth, f, indent=2)
print(f"Created default auth config: {self.auth_path}")
print("\nDefault credentials: admin / admin")
print("You will be prompted to change the password on first login.\n")
print("\nDefault credentials: admin / admin")
print("IMPORTANT: Change this password immediately after first login.\n")
def load(self) -> Config:
"""Load configuration from files.
......@@ -229,7 +225,6 @@ class ConfigManager:
Config object with loaded settings
"""
# Create defaults if config directory is empty or doesn't exist
if not self.config_dir.exists() or not any(self.config_dir.iterdir()):
self.create_default_configs()
# Load config.json
......@@ -287,6 +282,13 @@ class ConfigManager:
"sessions": {}
}
# Load pipelines.json
if self.pipelines_path.exists():
with open(self.pipelines_path, 'r') as f:
self.pipelines_data = json.load(f)
else:
self.pipelines_data = []
return self.config
def save_config(self):
......@@ -298,7 +300,8 @@ class ConfigManager:
"port": self.config.server.port,
"https": self.config.server.https,
"https_key_path": self.config.server.https_key_path,
"https_cert_path": self.config.server.https_cert_path
"https_cert_path": self.config.server.https_cert_path,
"queue_max_size": self.config.server.queue_max_size,
},
"backend": {
"type": self.config.backend.type,
......@@ -367,6 +370,11 @@ class ConfigManager:
with open(self.auth_path, 'w') as f:
json.dump(self.auth_data, f, indent=2)
def save_pipelines(self):
"""Save pipelines.json to disk."""
with open(self.pipelines_path, 'w') as f:
json.dump(self.pipelines_data, f, indent=2)
def reload(self):
"""Reload all configuration files."""
return self.load()
\ No newline at end of file
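`ConfigManager` now loads `pipelines.json` as a list, defaulting to empty, and writes it back with `save_pipelines`. The sketch below shows the same round trip as standalone functions. One deliberate difference: it writes to a temporary file and renames it over the target, so a crash mid-write cannot leave a truncated `pipelines.json`. That is an added suggestion and not what `save_pipelines` currently does.

```python
import json
from pathlib import Path
from typing import Any, List


def load_pipelines(path: Path) -> List[Any]:
    """Return the saved pipelines, or an empty list if the file does not exist."""
    if not path.exists():
        return []
    with open(path, "r") as f:
        return json.load(f)


def save_pipelines(path: Path, pipelines: List[Any]) -> None:
    """Write pipelines to a temp file, then atomically replace the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "w") as f:
        json.dump(pipelines, f, indent=2)
    tmp.replace(path)  # rename over the destination
```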
......@@ -368,7 +368,21 @@ def main():
audio_models = models_config.get("audio_models", [])
for m in audio_models:
mid = _model_id(m)
if mid:
if not mid:
continue
backend = m.get("backend", "") if isinstance(m, dict) else ""
if backend == "whisper-server":
# Register as a whisper-server instance
cfg = _model_cfg(m, "audio")
multi_model_manager.register_whisper_server(
model_id=mid,
server_path=m.get("server_path", config.whisper.server_path or ""),
model_path=m.get("model_path") or None,
port=int(m.get("port", config.whisper.server_port)),
gpu_device=int(m.get("gpu_device", config.vulkan.device_id)),
config=cfg,
)
else:
multi_model_manager.set_audio_model(mid, config=_model_cfg(m, "audio"))
# Image models
......@@ -446,7 +460,18 @@ def main():
print(f" Loaded: {mid}")
else:
print(f" Warning: {mid} failed to load")
# image/audio/vision/tts pre-loading is handled by their respective
elif mtype == "audio" and mid in multi_model_manager.whisper_servers:
wsm = multi_model_manager.whisper_servers[mid]
wsm.start(wsm._model_path, gpu_device=wsm._gpu_device)
if wsm.is_running():
ws_key = f"audio:{mid}"
multi_model_manager.models[ws_key] = wsm
multi_model_manager.active_in_vram = ws_key
multi_model_manager.models_in_vram.add(ws_key)
print(f" whisper-server started: {mid}")
else:
print(f" Warning: whisper-server '{mid}' failed to start")
# image/vision/tts pre-loading is handled by their respective
# API modules on first request; we just log intent here.
else:
print(f" Note: pre-loading for {mtype} models happens on first request")
......@@ -550,6 +575,27 @@ def main():
if global_file_path:
set_audiogen_file_path(global_file_path)
# Set voice clone module global args
from codai.api.voice_clone import set_global_args as set_vc_global_args, set_global_file_path as set_vc_file_path
set_vc_global_args(global_args)
if global_file_path:
set_vc_file_path(global_file_path)
from codai.api.voice_convert import set_global_args as set_vconv_global_args, set_global_file_path as set_vconv_file_path
set_vconv_global_args(global_args)
if global_file_path:
set_vconv_file_path(global_file_path)
# Set faceswap module global args
from codai.api.faceswap import set_global_args as set_fs_global_args, set_global_file_path as set_fs_file_path
set_fs_global_args(global_args)
if global_file_path:
set_fs_file_path(global_file_path)
# Set character profiles module global args
from codai.api.characters import set_global_args as set_chars_global_args
set_chars_global_args(global_args)
# Set embeddings module global args
from codai.api.embeddings import set_global_args as set_embed_global_args
set_embed_global_args(global_args)
......@@ -585,6 +631,10 @@ def main():
# Apply queue max size from config
from codai.queue.manager import queue_manager
queue_manager.max_size = config.server.queue_max_size
# Start the server
import uvicorn
print(f"\nStarting server on http://{config.server.host}:{config.server.port}")
......
......@@ -389,6 +389,11 @@ class WhisperServerManager:
"url": self.base_url
}
def cleanup(self):
"""Stop the subprocess — called by the VRAM eviction/unload machinery."""
print("whisper-server: evicted from VRAM, stopping subprocess")
self.stop()
class MultiModelManager:
"""
......@@ -412,9 +417,11 @@ class MultiModelManager:
self.active_in_vram: Optional[str] = None # most-recently-used model key
self.models_in_vram: set = set() # all models currently in VRAM
self.model_aliases: Dict[str, str] = {}
self.whisper_server: Optional[WhisperServerManager] = None
self.whisper_server: Optional[WhisperServerManager] = None # legacy single-instance compat
self.whisper_servers: Dict[str, WhisperServerManager] = {} # id -> manager
self.model_backend_types: Dict[str, str] = {}
self.tool_breaker = FuzzyToolBreaker(threshold=3) # Circuit breaker for repetitive tool calls
self._load_lock = threading.Lock() # Prevents duplicate on-demand model loads
@property
def image_model(self) -> Optional[str]:
......@@ -432,12 +439,17 @@ class MultiModelManager:
print(f"Warning: Error cleaning up model {key}: {e}")
self.models.clear()
# Cleanup whisper server
if self.whisper_server:
# Cleanup whisper server(s)
for wsm in self.whisper_servers.values():
try:
self.whisper_server.stop()
wsm.stop()
except Exception as e:
print(f"Warning: Error cleaning up whisper server: {e}")
if self.whisper_server and self.whisper_server not in self.whisper_servers.values():
try:
self.whisper_server.stop()
except Exception:
pass
# Clear all model lists
self.default_model = None
......@@ -520,30 +532,30 @@ class MultiModelManager:
print(f"Model '{model_name}' cached as: {resolved_model}")
def _load_default_model(self):
"""Load the default model on demand."""
"""Load the default model on demand (thread-safe)."""
if not self.default_model:
return None
# Check if already loaded
# Fast path: already loaded (checked without lock for performance)
if self.default_model in self.models:
return self.models[self.default_model]
with self._load_lock:
# Re-check inside the lock to avoid duplicate loads from concurrent requests
if self.default_model in self.models:
return self.models[self.default_model]
# Get config and backend type
config = self.config.get(self.default_model, {})
backend_type = self.model_backend_types.get(self.default_model, "auto")
# Get global args for additional parameters
try:
from codai.api.state import get_global_args
global_args = get_global_args()
except:
except Exception:
global_args = None
# Create new model manager and load the model
model_manager = ModelManager()
try:
# Build kwargs from config
kwargs = {}
if 'ctx' in config:
kwargs['ctx'] = config['ctx']
......@@ -569,40 +581,35 @@ class MultiModelManager:
print(f"Loading default model on demand: {self.default_model}")
model_manager.load_model(self.default_model, backend_type=backend_type, **kwargs)
# Add to models dict
self.models[self.default_model] = model_manager
self.current_model_key = self.default_model
print(f"Model loaded successfully: {self.default_model}")
return model_manager
except Exception as e:
print(f"Error loading model {self.default_model}: {e}")
return None
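`_load_default_model` and `_load_model_by_name` now use double-checked locking. The first dictionary lookup is done without the lock for speed, and a second lookup inside `_load_lock` ensures that concurrent first requests trigger only one load. The class below is a self-contained sketch of that pattern with a pluggable loader. `LazyLoader` is a hypothetical name. Like the source, it uses one lock for all models. This keeps the code simple, but a slow first load of one model blocks the first load of any other model until it finishes.

```python
import threading
from typing import Any, Callable, Dict, Optional


class LazyLoader:
    """Load each named model at most once, even under concurrent requests."""

    def __init__(self, load_fn: Callable[[str], Any]):
        self._load_fn = load_fn
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, name: str) -> Optional[Any]:
        if name in self._models:            # fast path without taking the lock
            return self._models[name]
        with self._lock:
            if name in self._models:        # re-check: another thread may have finished the load
                return self._models[name]
            try:
                model = self._load_fn(name)
            except Exception as exc:
                print(f"Error loading model {name}: {exc}")
                return None
            self._models[name] = model
            return model
```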
def _load_model_by_name(self, model_name: str):
"""Load a model by name on demand."""
# Check if already loaded
"""Load a model by name on demand (thread-safe)."""
if model_name in self.models:
return self.models[model_name]
with self._load_lock:
# Re-check inside lock to prevent duplicate loads
if model_name in self.models:
return self.models[model_name]
# Check if it's registered in config
config = self.config.get(model_name, {})
backend_type = self.model_backend_types.get(model_name, "auto")
# Get global args for additional parameters
try:
from codai.api.state import get_global_args
global_args = get_global_args()
except:
except Exception:
global_args = None
# Create new model manager and load the model
model_manager = ModelManager()
try:
# Build kwargs from config
kwargs = {}
if 'ctx' in config:
kwargs['ctx'] = config['ctx']
......@@ -628,14 +635,10 @@ class MultiModelManager:
print(f"Loading model on demand: {model_name}")
model_manager.load_model(model_name, backend_type=backend_type, **kwargs)
# Add to models dict
self.models[model_name] = model_manager
self.current_model_key = model_name
print(f"Model loaded successfully: {model_name}")
return model_manager
except Exception as e:
print(f"Error loading model {model_name}: {e}")
return None
......@@ -655,6 +658,25 @@ class MultiModelManager:
self.config[f"audio:{resolved_model}"] = self.config.pop(f"audio:{model_name}")
print(f"Audio model '{model_name}' cached as: {resolved_model}")
def register_whisper_server(self, model_id: str, server_path: str, model_path: str = None,
port: int = 8744, gpu_device: int = 0, config: Dict = None):
"""Register a whisper-server instance as an audio model."""
wsm = WhisperServerManager(server_path=server_path, port=port)
wsm._model_path = model_path
wsm._gpu_device = gpu_device
self.whisper_servers[model_id] = wsm
# Keep legacy single-instance reference pointing to the first one registered
if self.whisper_server is None:
self.whisper_server = wsm
# Register as allowed audio model with its config
cfg = dict(config or {})  # copy so setdefault does not mutate the caller's dict
cfg.setdefault("load_mode", "on-request")
if model_id not in self.audio_models:
self.audio_models.append(model_id)
self.config[f"audio:{model_id}"] = cfg
print(f"Registered whisper-server audio model: {model_id} (server: {server_path})")
return wsm
def set_tts_model(self, model_name: str, config: Dict = None):
"""Set the text-to-speech model and download/cache it if needed."""
self.tts_model = model_name
......@@ -805,6 +827,43 @@ class MultiModelManager:
return allowed
def get_registered_model_type(self, name: str) -> Optional[str]:
"""
Return the type a model is registered under ("text", "image", "audio",
"tts", "vision", "video", "audio_gen", "embedding"), or None if unknown.
Short-name (filename) matching is used so full paths resolve correctly.
"""
def _matches(registered: str) -> bool:
if name == registered:
return True
n_short = name.split("/")[-1] if "/" in name else name
r_short = registered.split("/")[-1] if "/" in registered else registered
return n_short == r_short
if self.default_model and _matches(self.default_model):
return "text"
for m in self.image_models:
if _matches(m):
return "image"
for m in self.audio_models:
if _matches(m):
return "audio"
if self.tts_model and _matches(self.tts_model):
return "tts"
for m in self.vision_models:
if _matches(m):
return "vision"
for m in self.video_models:
if _matches(m):
return "video"
for m in self.audio_gen_models:
if _matches(m):
return "audio_gen"
for m in self.embedding_models:
if _matches(m):
return "embedding"
return None
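`get_registered_model_type` treats two names as equal when they are identical or share their last path component. This lets a request for a full path resolve to a model registered by filename. The sketch below isolates that rule. Types live in a plain `dict` here, which is an assumption: the real method walks separate attributes in a fixed order, and dict order plays that role. The model names are hypothetical. One side effect to be aware of is that two different paths with the same filename match each other.

```python
from typing import Dict, List, Optional


def short_name(name: str) -> str:
    # Last path component; str.split returns [name] when there is no "/".
    return name.split("/")[-1]


def matches(requested: str, registered: str) -> bool:
    return requested == registered or short_name(requested) == short_name(registered)


def registered_type(requested: str, registry: Dict[str, List[str]]) -> Optional[str]:
    # Dict insertion order decides precedence, like the fixed check order in the method.
    for model_type, names in registry.items():
        if any(matches(requested, n) for n in names):
            return model_type
    return None


# Hypothetical registrations for illustration only.
registry = {
    "text": ["qwen2.5-7b-instruct.gguf"],
    "image": ["models/sd/flux1-dev-Q4_0.gguf"],
    "audio": ["ggml-large-v3.bin"],
}
print(registered_type("/data/models/flux1-dev-Q4_0.gguf", registry))  # image
print(registered_type("unknown.gguf", registry))                      # None
```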
def is_allowed_model(self, requested_or_resolved: str, model_type: str = None) -> bool:
"""
Check if a model name (raw request value *or* resolved name) is one of
......@@ -823,6 +882,15 @@ class MultiModelManager:
if not requested_or_resolved:
return False
# If a model_type is specified, reject models registered under a
# different type (e.g. an image GGUF requested via /v1/chat/completions).
if model_type:
registered_type = self.get_registered_model_type(requested_or_resolved)
if registered_type is not None and registered_type != model_type:
# "vision" models are acceptable for "text" endpoints (multimodal)
if not (model_type == "text" and registered_type == "vision"):
return False
# Quick check against the full set of allowed identifiers
allowed = self.get_all_allowed_identifiers()
if requested_or_resolved in allowed:
......@@ -1365,9 +1433,26 @@ class MultiModelManager:
# This prevents API callers from requesting arbitrary models that were not
# specified on the command line (or registered as aliases).
if not self.is_allowed_model(resolved_name, model_type):
# Check if the model exists but is registered under a different type
registered_type = self.get_registered_model_type(resolved_name)
if registered_type is not None and registered_type != model_type:
endpoint_hint = {
"image": "POST /v1/images/generations",
"audio": "POST /v1/audio/transcriptions",
"tts": "POST /v1/audio/speech",
"video": "POST /v1/videos/generations",
}.get(registered_type, f"the {registered_type} endpoint")
print(f"Model type mismatch: '{resolved_name}' is a {registered_type} model, "
f"requested via {model_type} endpoint")
return {
'model_key': None,
'model_name': None,
'model_object': None,
'config': {},
'already_loaded': False,
'error': (f"Model '{resolved_name}' is a {registered_type} model and cannot be used "
f"for {model_type} generation. Use {endpoint_hint} instead."),
}
allowed_ids = sorted(self.get_all_allowed_identifiers())
print(f"Model validation failed: '{resolved_name}' is not an allowed model. "
f"Allowed models: {allowed_ids}")
......
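The type checks above follow two rules. First, `is_allowed_model` rejects a model registered under a different type, but lets `vision` models through on `text` endpoints. Second, when the resolver hits a mismatch, it returns an error that names the correct endpoint. A condensed sketch of both rules follows. The function names are hypothetical, and the endpoint table is copied from the hunk.

```python
from typing import Optional

ENDPOINT_HINTS = {
    "image": "POST /v1/images/generations",
    "audio": "POST /v1/audio/transcriptions",
    "tts": "POST /v1/audio/speech",
    "video": "POST /v1/videos/generations",
}


def type_compatible(requested_type: str, registered_type: Optional[str]) -> bool:
    # Unregistered names are left to the allow-list check.
    if registered_type is None or registered_type == requested_type:
        return True
    # Multimodal vision models may serve text endpoints.
    return requested_type == "text" and registered_type == "vision"


def mismatch_error(name: str, registered_type: str, requested_type: str) -> str:
    # Fall back to a generic hint for types without a dedicated entry.
    hint = ENDPOINT_HINTS.get(registered_type, f"the {registered_type} endpoint")
    return (f"Model '{name}' is a {registered_type} model and cannot be used "
            f"for {requested_type} generation. Use {hint} instead.")
```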
# codai.openai — optional LiteLLM integration layer
# CoderAI - OpenAI-compatible API server
# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""LiteLLM backend wrapper for codai.
This module wraps the litellm library so that the text endpoint can forward
requests to any model provider supported by LiteLLM (Ollama, OpenAI, Anthropic,
etc.) while still returning responses in the standard OpenAI format.
"""
import re
import time
import uuid
from typing import Any, AsyncGenerator, Dict, List, Optional
try:
import litellm
litellm.drop_params = True # silently drop unsupported params
LITELLM_AVAILABLE = True
except ImportError:
LITELLM_AVAILABLE = False
class LiteLLMBackend:
"""Wraps litellm.acompletion with the interface expected by codai's text endpoint."""
def __init__(
self,
model: str,
api_key: str,
api_base: Optional[str],
context_window: int = 8192,
model_manager=None,
):
self.model = model
self.api_key = api_key
self.api_base = api_base
self.context_window = context_window
self.model_manager = model_manager
def _litellm_model(self, model: str) -> str:
"""Return the model string in the format litellm expects."""
if model.startswith('ollama:'):
# litellm expects "ollama/<name>", so rewrite the "ollama:" prefix
return 'ollama/' + model[len('ollama:'):]
return model
async def chat_completion(
self,
messages: List[Dict],
model: str,
temperature: Optional[float] = 1.0,
top_p: Optional[float] = 1.0,
max_tokens: Optional[int] = None,
stop=None,
tools=None,
tool_choice=None,
stream: bool = False,
tool_parser=None,
):
"""Call litellm.acompletion and return either a full response dict or
an async generator of chunk dicts (when stream=True).
"""
if not LITELLM_AVAILABLE:
raise RuntimeError("litellm is not installed. Run: pip install litellm")
kwargs: Dict[str, Any] = {
"model": self._litellm_model(model),
"messages": messages,
"api_key": self.api_key,
"stream": stream,
}
if self.api_base:
kwargs["api_base"] = self.api_base
if temperature is not None:
kwargs["temperature"] = temperature
if top_p is not None:
kwargs["top_p"] = top_p
if max_tokens is not None:
kwargs["max_tokens"] = max_tokens
if stop:
kwargs["stop"] = stop if isinstance(stop, list) else [stop]
if tools:
kwargs["tools"] = tools
if tool_choice:
kwargs["tool_choice"] = tool_choice
if stream:
return self._stream(kwargs)
else:
response = await litellm.acompletion(**kwargs)
return response.model_dump() if hasattr(response, 'model_dump') else dict(response)
async def _stream(self, kwargs: Dict) -> AsyncGenerator[Dict, None]:
response = await litellm.acompletion(**kwargs)
async for chunk in response:
yield chunk.model_dump() if hasattr(chunk, 'model_dump') else dict(chunk)
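`chat_completion` forwards only the optional sampling parameters the caller supplied, and it normalises `stop` to a list before calling `litellm.acompletion`. The pure function below mirrors that kwargs assembly, so it can be checked without `litellm` installed. `build_completion_kwargs` and `to_litellm_model` are hypothetical names. The `ollama:` to `ollama/` rewrite assumes LiteLLM's `ollama/<name>` model-string convention. `tools`/`tool_choice` are omitted for brevity.

```python
from typing import Any, Dict, List, Optional, Union


def to_litellm_model(model: str) -> str:
    # LiteLLM routes Ollama models by the "ollama/<name>" prefix (assumed convention).
    if model.startswith("ollama:"):
        return "ollama/" + model[len("ollama:"):]
    return model


def build_completion_kwargs(
    model: str,
    messages: List[Dict[str, Any]],
    api_key: str,
    api_base: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stop: Union[str, List[str], None] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    kwargs: Dict[str, Any] = {
        "model": to_litellm_model(model),
        "messages": messages,
        "api_key": api_key,
        "stream": stream,
    }
    if api_base:
        kwargs["api_base"] = api_base
    # Include sampling parameters only when set; 0.0 still counts as set.
    for key, value in (("temperature", temperature), ("top_p", top_p),
                       ("max_tokens", max_tokens)):
        if value is not None:
            kwargs[key] = value
    if stop:
        kwargs["stop"] = stop if isinstance(stop, list) else [stop]
    return kwargs
```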
def get_rate_limit_headers(self, prompt_tokens: int, completion_tokens: int) -> Dict:
return {
"x-ratelimit-limit-requests": "1000",
"x-ratelimit-remaining-requests": "999",
"x-ratelimit-limit-tokens": str(self.context_window),
"x-ratelimit-remaining-tokens": str(
max(0, self.context_window - prompt_tokens - completion_tokens)
),
}
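These rate-limit headers are synthetic. The request counts are fixed strings, and the remaining-token value is the context window minus the tokens used, clamped at zero. For example, with the default `context_window` of 8192, a 7000-token prompt plus a 1500-token completion gives `max(0, 8192 - 8500) = 0`. A standalone version for checking the arithmetic:

```python
from typing import Dict


def rate_limit_headers(context_window: int, prompt_tokens: int,
                       completion_tokens: int) -> Dict[str, str]:
    # Clamp at zero so the header never reports a negative budget.
    remaining = max(0, context_window - prompt_tokens - completion_tokens)
    return {
        "x-ratelimit-limit-requests": "1000",      # fixed placeholder values
        "x-ratelimit-remaining-requests": "999",
        "x-ratelimit-limit-tokens": str(context_window),
        "x-ratelimit-remaining-tokens": str(remaining),
    }
```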
# ------------------------------------------------------------------
# Qwen-specific helpers (tool calls embedded in <tool_call>…</tool_call>)
# ------------------------------------------------------------------
_QWEN_TOOL_PATTERN = re.compile(
r'<tool_call>\s*(\{.*?\})\s*</tool_call>', re.DOTALL
)
_QWEN_TAG_PATTERN = re.compile(
r'<tool_call>.*?</tool_call>', re.DOTALL
)
def parse_qwen_tool_calls(self, content: str) -> List[Dict]:
"""Extract tool calls embedded as <tool_call>{…}</tool_call> tags."""
import json
calls = []
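for m in self._QWEN_TOOL_PATTERN.finditer(content):
try:
data = json.loads(m.group(1))
except json.JSONDecodeError:
continue
# Skip payloads that are valid JSON but not an object (e.g. a bare list)
if not isinstance(data, dict):
continue
args = data.get("arguments", {})
calls.append({
"id": f"call_{uuid.uuid4().hex[:8]}",
"type": "function",
"function": {
"name": data.get("name", ""),
# Pass pre-serialised argument strings through to avoid double-encoding
"arguments": args if isinstance(args, str) else json.dumps(args),
},
})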
return calls
def strip_tool_tags(self, content: str) -> str:
"""Remove <tool_call>…</tool_call> blocks from content."""
return self._QWEN_TAG_PATTERN.sub('', content).strip()
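Qwen models may emit tool calls inline as `<tool_call>{...}</tool_call>` blocks instead of structured fields. The sketch below applies the same two regexes as free functions and turns each block into an OpenAI-style `tool_calls` entry. It uses sequential `call_N` ids so the output is deterministic, while the class uses random `uuid4` suffixes. Nested argument objects still parse: the lazy `.*?` keeps extending until a `}` is directly followed by `</tool_call>`.

```python
import json
import re
from typing import Dict, List

TOOL_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
TAG_PATTERN = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def parse_tool_calls(content: str) -> List[Dict]:
    calls = []
    for m in TOOL_PATTERN.finditer(content):
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the response
        if not isinstance(data, dict):
            continue
        args = data.get("arguments", {})
        calls.append({
            "id": f"call_{len(calls)}",
            "type": "function",
            "function": {
                "name": data.get("name", ""),
                # OpenAI clients expect arguments as a JSON-encoded string.
                "arguments": args if isinstance(args, str) else json.dumps(args),
            },
        })
    return calls


def strip_tool_tags(content: str) -> str:
    return TAG_PATTERN.sub("", content).strip()


text = ('Checking the weather.\n<tool_call>{"name": "get_weather", '
        '"arguments": {"city": "Rome"}}</tool_call>')
print(parse_tool_calls(text)[0]["function"]["name"])  # get_weather
print(strip_tool_tags(text))                          # Checking the weather.
```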
def get_litellm_backend(
model: str,
api_key: str,
api_base: Optional[str] = None,
context_window: int = 8192,
model_manager=None,
) -> LiteLLMBackend:
"""Return a LiteLLMBackend instance for the given model."""
return LiteLLMBackend(
model=model,
api_key=api_key,
api_base=api_base,
context_window=context_window,
model_manager=model_manager,
)
......@@ -57,9 +57,14 @@ class VideoGenerationRequest(BaseModel):
camera_motion: Optional[str] = None # zoom-in | zoom-out | pan-left | pan-right | tilt-up | tilt-down | rotate
# ── Character consistency ─────────────────────────────────────────────
# Each entry: {"name": "Alice", "images": ["b64...", ...]}
characters: Optional[List[dict]] = None
# Legacy flat list of base64/URL reference images (still accepted)
character_references: Optional[List[str]] = None
character_strength: Optional[float] = 0.8
character_names: Optional[List[str]] = None # optional names per reference
# Named saved profiles to load (resolved server-side)
character_profiles: Optional[List[str]] = None
# ── Audio generation / manipulation ──────────────────────────────────
add_audio: Optional[bool] = False
......
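The request model accepts structured `characters` entries (`{"name": ..., "images": [...]}`) alongside the legacy flat `character_references` list and its optional `character_names`. A hypothetical server-side step could fold both forms into one list of `{"name", "images"}` dicts, as sketched below. The `normalize_characters` function, the fallback names (`character_1`, ...), and the one-image-per-legacy-reference pairing are all assumptions. Resolving `character_profiles` is left out because that happens server-side in code not shown here.

```python
from typing import Dict, List, Optional


def normalize_characters(
    characters: Optional[List[dict]] = None,
    character_references: Optional[List[str]] = None,
    character_names: Optional[List[str]] = None,
) -> List[Dict[str, object]]:
    result: List[Dict[str, object]] = []
    # Structured form: keep entries that carry at least one non-empty image.
    for entry in characters or []:
        images = [img for img in entry.get("images", []) if img]
        if images:
            result.append({"name": entry.get("name") or f"character_{len(result) + 1}",
                           "images": images})
    names = character_names or []
    # Legacy form: one reference image per character, names matched by position.
    for i, ref in enumerate(character_references or []):
        name = names[i] if i < len(names) and names[i] else f"character_{len(result) + 1}"
        result.append({"name": name, "images": [ref]})
    return result
```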
......@@ -33,6 +33,12 @@ class QueueManager:
self.model_loading: bool = False
self.model_name: Optional[str] = None
self.lock = asyncio.Lock()
self.max_size: int = 6
async def is_full(self) -> bool:
"""Return True if the queue has reached max_size."""
async with self.lock:
return len(self.waiting_requests) >= self.max_size
async def add_waiting(self, request_id: str) -> None:
"""Add a request to the waiting queue."""
......
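`QueueManager` now caps the waiting queue at `max_size` (6) and exposes an async `is_full()` that reads the queue length under its `asyncio.Lock`. If a caller awaits `is_full()` and then adds the request in a separate call, another coroutine can take the last slot in between. The hypothetical `try_enqueue` below avoids this by checking and appending under a single lock acquisition. `BoundedQueue` is an illustrative stand-in for `QueueManager`, and returning `False` stands in for whatever error response the real endpoint sends.

```python
import asyncio
from typing import List


class BoundedQueue:
    def __init__(self, max_size: int = 6):
        self.waiting_requests: List[str] = []
        self.max_size = max_size
        self.lock = asyncio.Lock()

    async def is_full(self) -> bool:
        async with self.lock:
            return len(self.waiting_requests) >= self.max_size

    async def try_enqueue(self, request_id: str) -> bool:
        # Check and append under one lock acquisition so two coroutines
        # cannot both pass the check when only one slot is left.
        async with self.lock:
            if len(self.waiting_requests) >= self.max_size:
                return False
            self.waiting_requests.append(request_id)
            return True


async def demo() -> List[bool]:
    q = BoundedQueue(max_size=2)
    return await asyncio.gather(*(q.try_enqueue(f"req-{i}") for i in range(3)))
```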
videogen @ 04778e17
Subproject commit 04778e172a9a83d0778f566045f995828c6c3556
......@@ -32,6 +32,16 @@ realesrgan>=0.3.0
basicsr>=1.4.2
timm>=0.9.0
# Voice cloning (F5-TTS zero-shot voice cloning)
f5-tts>=1.1.0
# Voice conversion / singing voice conversion (Seed-VC — preserves pitch/melody)
seed-vc>=0.4.0
# Face swap (insightface INSwapper — downloads inswapper_128.onnx on first use)
insightface>=0.7.3
onnxruntime-gpu>=1.20.0 # GPU-accelerated ONNX runtime for insightface
# Optional: for better performance with NVIDIA GPUs
bitsandbytes>=0.41.0
sentencepiece>=0.1.99
......
......@@ -18,3 +18,10 @@ huggingface-hub>=0.19.0
# Optional: Audio transcription without PyTorch (whispercpp)
# Note: faster-whisper requires PyTorch, but whispercpp works without it
whispercpp>=0.0.17 # For GGUF-based Whisper transcription without PyTorch
# Voice cloning (F5-TTS zero-shot voice cloning)
f5-tts>=1.1.0
# Face swap (insightface INSwapper — downloads inswapper_128.onnx on first use)
insightface>=0.7.3
onnxruntime>=1.20.0 # CPU ONNX runtime (use onnxruntime-gpu for GPU acceleration)
......@@ -75,6 +75,16 @@ timm>=0.9.0 # vision model backbones (depth/segment endpoints
# pip install audiocraft
# AudioLDM2 is available via diffusers (already listed above)
# Voice cloning (F5-TTS zero-shot voice cloning)
f5-tts>=1.1.0
# Voice conversion / singing voice conversion (Seed-VC — preserves pitch/melody)
seed-vc>=0.4.0
# Face swap (insightface INSwapper — downloads inswapper_128.onnx on first use)
insightface>=0.7.3
onnxruntime-gpu>=1.20.0 # GPU-accelerated ONNX runtime for insightface
# Optional: for better performance
# bitsandbytes>=0.41.0 # for 4-bit/8-bit quantization
# sentencepiece>=0.1.99 # for some tokenizers
......
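The requirements add F5-TTS, Seed-VC, insightface and an ONNX runtime as optional extras. The CPU-only file uses `onnxruntime` in place of `onnxruntime-gpu`, and both distributions install the same `onnxruntime` module. A hypothetical startup probe can report which extras are importable without importing them, using `importlib.util.find_spec`. It can also check for the CUDA execution provider through `onnxruntime.get_available_providers()`. The import names `f5_tts` and `insightface` are assumptions about each package's top-level module. Seed-VC is omitted because its import name is not shown here.

```python
import importlib.util
from typing import Dict

# Assumed top-level module names for the optional extras.
OPTIONAL_MODULES = {
    "f5-tts": "f5_tts",
    "insightface": "insightface",
    "onnxruntime": "onnxruntime",  # installed by both onnxruntime and onnxruntime-gpu
}


def probe_optional_deps() -> Dict[str, bool]:
    # find_spec locates a top-level module without executing it.
    return {pkg: importlib.util.find_spec(mod) is not None
            for pkg, mod in OPTIONAL_MODULES.items()}


def onnx_has_cuda() -> bool:
    if importlib.util.find_spec("onnxruntime") is None:
        return False
    ort = importlib.import_module("onnxruntime")
    # The GPU build lists CUDAExecutionProvider when CUDA is usable.
    return "CUDAExecutionProvider" in ort.get_available_providers()


print(probe_optional_deps())
print(onnx_has_cuda())
```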