wip: snapshot in-progress platform updates

Commit b17e45a5 authored May 06, 2026 by Stefy Lanza (nextime / spora )
32 changed files
--- a/README.md
+++ b/README.md
 # CoderAI

-An OpenAI-compatible API server with web administration dashboard, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and multi-modal support (text, image, audio, TTS).
+An OpenAI-compatible API server with web administration dashboard, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and full multi-modal support.

 ## Features

 ### Core Capabilities
 - **OpenAI-Compatible API**: Drop-in replacement for OpenAI's API endpoints
-- **Web Admin Dashboard**: Modern UI for model management, user authentication, and API tokens
-- **Configuration-Based**: JSON config files for all settings - no complex CLI arguments
-- **Multi-Modal Support**: Text generation, image generation, audio transcription, text-to-speech
+- **Web Studio**: Modern UI for all generation tasks — chat, image, video, audio, pipelines
+- **Configuration-Based**: JSON config files for all settings — no complex CLI arguments
+- **Multi-Modal**: Text, image, video, audio, TTS, STT, embeddings
 - **Per-Model Configuration**: Individual settings for each model (GPU layers, quantization, context size)
 - **On-Demand Loading**: Models load automatically when requested, unload when idle

@@ -19,6 +19,48 @@ An OpenAI-compatible API server with web administration dashboard, supporting mu
 - **Auto-Detection**: Automatically selects best available backend
 - **Multi-GPU**: Automatic distribution across multiple devices

+### Image Generation
+- **Text-to-Image**: Stable Diffusion, SDXL, Flux, and GGUF image models (via stable-diffusion.cpp)
+- **Image-to-Image**: Style transfer and image editing
+- **Inpainting**: Fill masked regions with AI-generated content
+- **Upscaling**: Real-ESRGAN super-resolution (2×/4×/8×)
+- **Deblur**: Wiener deconvolution + unsharp masking
+- **Unpixelate**: Real-ESRGAN restoration of pixelated/compressed images
+- **Outfit Change**: Auto-generated clothing mask + inpainting for wardrobe changes
+- **Face Swap**: InsightFace INSwapper — swap faces in images and videos
+- **Depth Estimation**: Monocular depth maps
+- **Segmentation**: SAM-based object segmentation
+
+### Video Generation
+- **Text-to-Video**: Generate video from text prompts
+- **Image-to-Video**: Animate a still image
+- **Video-to-Video**: Transform existing video
+- **Ti2V**: Text + image → video with camera motion control
+- **Frame Interpolation**: Increase FPS via RIFE or ffmpeg minterpolate
+- **Upscaling**: Real-ESRGAN video upscaling
+- **Subtitles**: Whisper transcription + optional translation + burn-in
+- **Dubbing**: Transcribe → translate → TTS → replace audio track
+
+### Audio
+- **Text-to-Speech**: Kokoro TTS with voice selection and speed control
+- **Speech-to-Text**: Whisper transcription (faster-whisper / whispercpp)
+- **Music/SFX Generation**: MusicGen, AudioGen, AudioLDM2
+- **Voice Cloning**: F5-TTS zero-shot voice cloning from a reference audio clip
+- **Voice Conversion (SVC)**: Seed-VC — converts timbre while preserving pitch, melody and expression; **singing mode** for music
+- **Voice Profiles**: Save named voice profiles (reference audio + transcript) for reuse
+
+### Pipelines
+Built-in multi-step pipelines callable from the API or web UI:
+
+| Endpoint | Description |
+|---|---|
+| `POST /v1/pipelines/image-to-video` | Generate image → animate → optional audio |
+| `POST /v1/pipelines/video-dub` | Transcribe → translate → TTS dub → burn subtitles |
+| `POST /v1/pipelines/story` | LLM script → images per scene → video → TTS narration |
+| `POST /v1/pipelines/audio-dub` | Transcribe audio/video → translate → clone voice → replace audio |
+
+**Custom Pipeline Builder**: Create, save and run your own multi-step pipelines from the web UI or API. Chain any combination of 18 step types with `{{input}}` and `{{stepN.output}}` template variables.
+
 ### Advanced Features
 - **Memory Management**: Smart VRAM → RAM → Disk offloading (NVIDIA)
 - **Quantization**: 4-bit/8-bit via bitsandbytes (NVIDIA) or GGUF quantization (Vulkan)
@@ -26,6 +68,9 @@ An OpenAI-compatible API server with web administration dashboard, supporting mu
 - **Streaming**: Server-sent events for real-time token generation
 - **Tool Calling**: Function calling and tool use support
 - **Authentication**: Session-based auth with API token support
+- **Webcam/Microphone**: Capture directly from browser for face swap and voice cloning
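
The streaming bullet above can be illustrated with a small client-side parser. This is a minimal sketch, assuming the server emits OpenAI-style chunks (`data: <json>` lines, terminated by `data: [DONE]`); the diff does not show CoderAI's actual wire format, and `iter_sse_tokens` is a hypothetical helper name.

```python
import json


def iter_sse_tokens(lines):
    """Yield text fragments from OpenAI-style server-sent event lines.

    Assumption: each event is a `data: <json>` line whose payload carries
    `choices[0].delta.content`, and the stream ends with `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

Any iterable of decoded lines works as input, for example the lines of a streaming HTTP response body.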
+
+---

 ## Installation

@@ -36,974 +81,322 @@ An OpenAI-compatible API server with web administration dashboard, supporting mu
 - For AMD/Intel GPUs (Vulkan): Vulkan drivers and SDK
 - For CPU-only: No additional requirements

-**Note**: The Vulkan backend works with:
-- AMD GPUs (RX 400 series and newer) - **Recommended**
-- Intel integrated GPUs (HD 600 series and newer) and Intel Arc GPUs
-- NVIDIA GPUs (GTX 900 series and newer) - *CUDA backend preferred*
-
-Any GPU with Vulkan 1.2+ driver support should work with the Vulkan backend.
-
 ### Quick Install with Build Script

-The easiest way to install is using the provided build script:
-
 ```bash
-# Clone the repository
 git clone git@git.nexlab.net:nexlab/coderai.git
 cd coderai

-# Install all backends (recommended)
-./build.sh all
-
-# Or install specific backend:
-./build.sh nvidia   # NVIDIA GPUs only
-./build.sh vulkan   # AMD/Intel GPUs only
+./build.sh all      # All backends (recommended)
+./build.sh nvidia   # NVIDIA only
+./build.sh vulkan   # AMD/Intel only
 ```

-**Note**: The `all` option installs support for all backends, allowing you to switch between them via configuration. The `vulkan` option works for both AMD and Intel GPUs.
-
-The build script will:
-- Create a virtual environment
-- Install the appropriate dependencies for your GPU
-- Set up the correct backend(s)
+The build script creates a virtual environment, installs dependencies, and builds GPU-accelerated backends including `stable-diffusion-cpp-python` with CUDA+Vulkan support.

 ### Manual Installation

-If you prefer manual installation:
-
 ```bash
-# Create virtual environment
 python -m venv venv
 source venv/bin/activate

-# For NVIDIA GPUs
+# NVIDIA
 pip install torch torchvision torchaudio
 pip install -r requirements-nvidia.txt

-# For AMD GPUs with Vulkan
+# AMD/Intel (Vulkan)
 CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --no-cache-dir
 pip install -r requirements-vulkan.txt
 ```

-### Platform-Specific Requirements
-
-#### NVIDIA (CUDA)
-
-Requires:
-- NVIDIA GPU with CUDA support
-- CUDA toolkit (11.8+ or 12.1+)
-- PyTorch with CUDA
-
-Models: HuggingFace format (safetensors/pytorch)
-
-#### AMD and Intel (Vulkan)
-
-Requires:
-- GPU with Vulkan 1.2+ support:
-  - AMD: RX 400 series and newer (recommended)
-  - Intel: HD 600 series integrated graphics or newer, Intel Arc GPUs
-  - NVIDIA: GTX 900 series and newer (but CUDA backend preferred)
-- Vulkan drivers and SDK
-
-**Install Vulkan drivers and tools:**
-```bash
-# Debian/Ubuntu
-sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers glslc glslang-tools glslang-dev
-
-# Fedora
-sudo dnf install vulkan-loader-devel vulkan-tools mesa-vulkan-drivers glslang
-
-# Arch Linux
-sudo pacman -S vulkan-headers vulkan-icd-loader vulkan-radeon glslang
-```
-
-**Note:** The shader compiler `glslc` is required to build llama-cpp-python with Vulkan support. On Debian/Ubuntu, it's provided by the `glslc` package. If `glslc` is not found after installing, try:
+### Stable Diffusion GGUF (CUDA + Vulkan)

 ```bash
-# Check if glslc exists somewhere
-find /usr -name "glslc" 2>/dev/null
-
-# If found in a non-standard location, add to PATH
-export PATH=$PATH:/usr/lib/shaderc/bin
-
-# Or create a symlink if glslangValidator exists
-sudo ln -s $(which glslangValidator) /usr/local/bin/glslc
+CMAKE_ARGS="-DSD_WEBM=OFF -DSD_CUDA=ON -DSD_VULKAN=ON" \
+  pip install stable-diffusion-cpp-python --no-cache-dir --force-reinstall
 ```

-Models: GGUF format (from HuggingFace or local files)
-
-**Note**: The Vulkan backend uses llama-cpp-python with GGUF models, which provides excellent performance on AMD and Intel GPUs without requiring vendor-specific SDKs (ROCm/OneAPI).
-
-### Optional Dependencies
-
-#### bitsandbytes (Quantization)
-
-For 4-bit and 8-bit quantization support (reduces VRAM requirements):
+### Voice Cloning and Voice Conversion

 ```bash
-# CUDA
-pip install "bitsandbytes>=0.41.0"
-
-# ROCm support may require building from source
-# See: https://github.com/TimDettmers/bitsandbytes
+pip install f5-tts    # Voice cloning (F5-TTS)
+pip install seed-vc   # Voice conversion / singing SVC
 ```

-#### Flash Attention 2
-
-For significantly faster inference on supported GPUs (requires specific CUDA/ROCm versions):
+### Face Swap

 ```bash
-# Requires CUDA 11.6+ or ROCm 5.4+
-pip install flash-attn --no-build-isolation
+pip install insightface onnxruntime-gpu
+# inswapper_128.onnx downloads automatically on first use
 ```

-**Note**: Flash Attention 2 requires:
-- CUDA 11.6+ or ROCm 5.4+
-- Linux OS (Windows support is experimental)
-- Specific GPU architectures (Ampere, Ada Lovelace, Hopper for NVIDIA)
+---

 ## Usage

-### Quick Start
-
 ```bash
-# Activate the virtual environment
-source venv_all/bin/activate  # or venv/bin/activate
-
-# Start the server (uses default config at ~/.coderai/)
-python coderai
+source venv_all/bin/activate   # or venv/bin/activate after a manual install

-# Or specify a custom config directory
-python coderai --config /path/to/config
-
-# Enable debug mode for troubleshooting
-python coderai --debug
+python coderai                         # Default config at ~/.coderai/
+python coderai --config /path/to/cfg   # Custom config directory
+python coderai --debug                 # Debug mode
 ```

-The server will start on `http://0.0.0.0:8000` by default.
+The server listens on `http://0.0.0.0:8000` by default.

 ### Access Points

-- **Admin Dashboard**: http://localhost:8000/admin
-- **Chat Interface**: http://localhost:8000/chat
-- **API Endpoints**: http://localhost:8000/v1/*
-- **API Documentation**: http://localhost:8000/docs
+| URL | Description |
+|---|---|
+| `http://localhost:8000/admin` | Admin dashboard |
+| `http://localhost:8000/chat` | Web Studio (generation UI) |
+| `http://localhost:8000/v1/*` | OpenAI-compatible API |
+| `http://localhost:8000/docs` | Interactive API docs |

-### First Login
+Default credentials: `admin` / `admin` (you are prompted to change the password on first login).

-Default credentials (you'll be prompted to change the password):
-- **Username**: `admin`
-- **Password**: `admin`
+---

-### Configuration Files
+## Configuration

-CoderAI uses JSON configuration files stored in `~/.coderai/` (or custom directory via `--config`):
+Config files live in `~/.coderai/` (or `--config` path):

 ```
 ~/.coderai/
-├── config.json       # Server, backend, and global settings
-├── models.json       # Model registry and per-model configurations
-├── auth.json         # Users, API tokens, and sessions
+├── config.json      # Server, backend, global settings
+├── models.json      # Model registry and per-model config
+├── auth.json        # Users, API tokens, sessions
+├── pipelines.json   # Custom pipeline definitions
 └── secret_key       # Session signing key (auto-generated)
 ```

-These files are automatically created with sensible defaults on first run.
-
-### Command-Line Options
-
-```
-usage: coderai [-h] [--config CONFIG] [--debug] [--dump]
-               [--list-cached-models] [--remove-all-models]
-               [--remove-model REMOVE_MODEL] [--download-model DOWNLOAD_MODEL]
-               [--download-file-pattern DOWNLOAD_FILE_PATTERN]
-               [--vulkan-list-devices]
-
-OpenAI-compatible API server supporting NVIDIA (CUDA) and Vulkan backends
-
-options:
-  -h, --help            show this help message and exit
-  --config CONFIG       Configuration directory (default: ~/.coderai/)
-  --debug               Enable debug mode - dumps full request/response to stdout
-  --dump                Dump model output: raw output, parsed output, and debug info
-  --list-cached-models  List all cached models in the model cache directory
-  --remove-all-models   Remove all cached models from the model cache directory
-  --remove-model NAME   Remove a specific cached model by name or hash
-  --download-model ID   Download a model to cache (URL or HuggingFace model ID)
-  --download-file-pattern PATTERN
-                        File pattern for HuggingFace downloads (e.g., .gguf, .safetensors)
-  --vulkan-list-devices List available Vulkan GPU devices and exit
-```
-
-## API Documentation
-
-The API is compatible with OpenAI's REST API. Interactive documentation is available at `http://localhost:8000/docs` when the server is running.
-
-### Endpoints
-
-| Endpoint | Description |
-|----------|-------------|
-| `GET /v1/models` | List available models |
-| `POST /v1/chat/completions` | Chat completions (ChatGPT-style) |
-| `POST /v1/completions` | Text completions (GPT-style) |
-
-### Example curl Commands
-
-#### List Models
-
-```bash
-curl http://localhost:8000/v1/models
-```
-
-#### Chat Completion (Non-Streaming)
-
-```bash
-curl -X POST http://localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "microsoft/DialoGPT-medium",
-    "messages": [
-      {"role": "system", "content": "You are a helpful assistant."},
-      {"role": "user", "content": "Hello, how are you?"}
-    ],
-    "temperature": 0.7,
-    "max_tokens": 150
-  }'
-```
-
-#### Chat Completion (Streaming)
-
-```bash
-curl -X POST http://localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "microsoft/DialoGPT-medium",
-    "messages": [
-      {"role": "user", "content": "Tell me a story"}
-    ],
-    "stream": true,
-    "max_tokens": 200
-  }'
-```
-
-#### Text Completion
-
-```bash
-curl -X POST http://localhost:8000/v1/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "microsoft/DialoGPT-medium",
-    "prompt": "Once upon a time",
-    "max_tokens": 100,
-    "temperature": 0.8
-  }'
-```
-
-#### Chat Completion with Tools
-
-```bash
-curl -X POST http://localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "microsoft/DialoGPT-medium",
-    "messages": [
-      {"role": "user", "content": "What is the weather in Paris?"}
-    ],
-    "tools": [
-      {
-        "type": "function",
-        "function": {
-          "name": "get_weather",
-          "description": "Get the weather for a location",
-          "parameters": {
-            "type": "object",
-            "properties": {
-              "location": {"type": "string"}
-            },
-            "required": ["location"]
-          }
-        }
-      }
-    ]
-  }'
-```
-
-## Configuration
-
-### Configuration Files
-
-All settings are managed through JSON files in the configuration directory (`~/.coderai/` by default):
-
-#### config.json - Server and Backend Settings
+### config.json

 ```json
 {
-  "server": {
-    "host": "0.0.0.0",
-    "port": 8000,
-    "https": false,
-    "https_key_path": null,
-    "https_cert_path": null
-  },
-  "backend": {
-    "type": "auto",
-    "image_backend": "auto",
-    "audio_backend": "auto",
-    "tts_backend": "auto"
-  },
-  "models": {
-    "default_load_mode": "ondemand",
-    "hf_cache_dir": null,
-    "gguf_cache_dir": null
-  },
-  "offload": {
-    "directory": "./offload",
-    "strategy": "auto",
-    "max_gpu_percent": null,
-    "no_ram": false,
-    "load_in_4bit": false,
-    "load_in_8bit": false,
-    "manual_ram_gb": null,
-    "flash_attention": false
-  },
-  "vulkan": {
-    "n_gpu_layers": -1,
-    "n_ctx": 2048,
-    "device_id": 0,
-    "single_gpu": false
-  },
-  "image": {
-    "steps": 4,
-    "width": 512,
-    "height": 512,
-    "cfg_scale": 1.0,
-    "precision": "f32",
-    "cpu_offload": false
-  },
-  "whisper": {
-    "server_path": null,
-    "server_port": 8744
-  }
+  "server": { "host": "0.0.0.0", "port": 8000 },
+  "backend": { "type": "auto" },
+  "models": { "default_load_mode": "ondemand" },
+  "offload": { "load_in_4bit": false, "flash_attention": false },
+  "vulkan": { "n_gpu_layers": -1, "n_ctx": 2048, "device_id": 0 }
 }
 ```

-#### models.json - Model Registry
+### models.json

 ```json
 {
-  "text_models": [
-    {
-      "id": "microsoft/DialoGPT-medium",
-      "backend": "nvidia",
-      "context_size": 2048,
-      "n_gpu_layers": -1,
-      "load_in_4bit": false,
-      "load_in_8bit": false,
-      "flash_attention": false,
-      "enabled": true
-    },
-    {
-      "id": "phi-3-mini-4k-instruct-q4_k_m.gguf",
-      "backend": "vulkan",
-      "context_size": 4096,
-      "n_gpu_layers": -1,
-      "enabled": true
-    }
-  ],
-  "image_models": [
-    {
-      "id": "stable-diffusion-xl-base-1.0",
-      "backend": "nvidia",
-      "steps": 4,
-      "width": 512,
-      "height": 512,
-      "cfg_scale": 1.0,
-      "enabled": true
-    }
-  ],
+  "text_models":  [{ "id": "Qwen/Qwen3.5-9B", "backend": "nvidia", "enabled": true }],
+  "image_models": [{ "id": "z_image_turbo-Q2_K.gguf", "backend": "auto", "enabled": true }],
+  "tts_models":   [{ "id": "kokoro-v1.0.onnx", "enabled": true }],
  "audio_models": [],
-  "vision_models": [],
-  "tts_models": [],
-  "loaded": [],
-  "preload": [],
-  "aliases": {
-    "default": "microsoft/DialoGPT-medium"
-  }
-}
-```
-
-#### auth.json - Users and API Tokens
-
-```json
-{
-  "users": [
-    {
-      "id": "admin",
-      "username": "admin",
-      "password_hash": "$argon2id$...",
-      "role": "admin",
-      "created_at": "2026-05-05T00:00:00Z"
-    }
-  ],
-  "tokens": [
-    {
-      "id": "tok_abc123",
-      "token": "sk-coderai-abc123...",
-      "name": "Production API",
-      "created_at": "2026-05-05T00:00:00Z",
-      "last_used": null
-    }
-  ],
-  "sessions": {}
+  "video_models": []
 }
 ```

-### Managing Configuration
-
-#### Via Web Dashboard
-
-The easiest way to manage configuration is through the web dashboard at `http://localhost:8000/admin`:
-
-- **Models**: Add, remove, enable/disable models; configure per-model settings
-- **Users**: Create users, change passwords, manage roles
-- **Tokens**: Generate API tokens for programmatic access
-- **Settings**: Adjust server, backend, and global settings
-
-#### Via Configuration Files
-
-You can also edit the JSON files directly. Changes take effect after restarting the server or using the reload endpoint:
-
-```bash
-curl -X POST http://localhost:8000/admin/api/system/reload
-```
-
-### Per-Model Configuration
-
-Each model can have its own settings that override global defaults:
-
-**Text Models (NVIDIA backend):**
-- `backend`: "nvidia" or "vulkan"
-- `context_size`: Context window size
-- `n_gpu_layers`: Number of layers on GPU (-1 = all)
-- `load_in_4bit`: Enable 4-bit quantization
-- `load_in_8bit`: Enable 8-bit quantization
-- `flash_attention`: Enable Flash Attention 2
-
-**Text Models (Vulkan backend):**
-- `backend`: "vulkan"
-- `context_size`: Context window size
-- `n_gpu_layers`: Number of layers on GPU (-1 = all)
-
-**Image Models:**
-- `backend`: "nvidia" or "vulkan"
-- `steps`: Number of diffusion steps
-- `width`: Image width
-- `height`: Image height
-- `cfg_scale`: Classifier-free guidance scale
-- `precision`: "f32" or "f16"
-
-### Backend Selection
+---

-Backends can be configured globally in `config.json` or per-model in `models.json`:
+## API Reference

-- **`auto`**: Automatically detect and use best available backend
-- **`nvidia`**: Use CUDA backend (PyTorch + Transformers)
-- **`vulkan`**: Use Vulkan backend (llama-cpp-python)
+### Text

-### Model Loading Modes
+| Endpoint | Description |
+|---|---|
+| `GET /v1/models` | List available models |
+| `POST /v1/chat/completions` | Chat completions (streaming supported) |
+| `POST /v1/completions` | Text completions |
+| `POST /v1/embeddings` | Text embeddings |
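
As an illustration of the text endpoints listed above, the following sketch builds a chat completion request with the Python standard library. It assumes the OpenAI-style request body and a `Authorization: Bearer <token>` header for API tokens; neither is confirmed by this diff, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url, api_token, model, user_message, stream=False):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: API tokens are passed as an OpenAI-style bearer token.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

With a server running, the request could be sent with `urllib.request.urlopen(request)` and the JSON response decoded with `json.load`.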

-Configure in `config.json` under `models.default_load_mode`:
+### Image

-- **`ondemand`** (default): Load models when first requested, unload when idle
-- **`preload`**: Load models listed in `models.json` → `preload` array at startup
-- **`lazy`**: Never preload, always load on-demand
+| Endpoint | Description |
+|---|---|
+| `POST /v1/images/generations` | Text-to-image |
+| `POST /v1/images/edits` | Image-to-image |
+| `POST /v1/images/inpaint` | Inpainting |
+| `POST /v1/images/upscale` | Real-ESRGAN upscaling |
+| `POST /v1/images/deblur` | Deblur / sharpen |
+| `POST /v1/images/unpixelate` | Remove pixelation |
+| `POST /v1/images/outfit` | Change clothing/outfit |
+| `POST /v1/images/faceswap` | Face swap (image or video) |
+| `POST /v1/images/depth` | Depth estimation |
+| `POST /v1/images/segment` | Object segmentation |
+
+### Video

-## Backend-Specific Setup
+| Endpoint | Description |
+|---|---|
+| `POST /v1/video/generations` | Generate video (t2v/i2v/v2v/ti2v/interp) |
+| `POST /v1/video/upscale` | Upscale video |
+| `POST /v1/video/subtitle` | Generate/burn subtitles |
+| `POST /v1/video/interpolate` | Frame interpolation |
+| `POST /v1/video/dub` | Dub video to another language |

-### NVIDIA (CUDA)
+### Audio

-```bash
-# Using build script
-./build.sh nvidia
+| Endpoint | Description |
+|---|---|
+| `POST /v1/audio/speech` | Text-to-speech |
+| `POST /v1/audio/transcriptions` | Speech-to-text (Whisper) |
+| `POST /v1/audio/generate` | Music/SFX generation |
+| `POST /v1/audio/clone` | Voice cloning TTS (F5-TTS) |
+| `POST /v1/audio/convert` | Voice conversion / SVC (Seed-VC) |
+| `GET /v1/audio/voices` | List saved voice profiles |
+| `POST /v1/audio/voices` | Save a voice profile |
+| `DELETE /v1/audio/voices/{name}` | Delete a voice profile |
+
+### Pipelines

-# Or manually install CUDA-enabled PyTorch
-pip install "torch>=2.0.0" "torchvision>=0.15.0" "torchaudio>=2.0.0"
-pip install -r requirements-nvidia.txt
-```
+| Endpoint | Description |
+|---|---|
+| `POST /v1/pipelines/image-to-video` | Image gen → video animation |
+| `POST /v1/pipelines/video-dub` | Full video dubbing pipeline |
+| `POST /v1/pipelines/story` | LLM → images → video → TTS |
+| `POST /v1/pipelines/audio-dub` | Audio/video dub with voice cloning |
+| `GET /v1/pipelines/custom` | List custom pipelines |
+| `POST /v1/pipelines/custom` | Create custom pipeline |
+| `PUT /v1/pipelines/custom/{id}` | Update custom pipeline |
+| `DELETE /v1/pipelines/custom/{id}` | Delete custom pipeline |
+| `POST /v1/pipelines/custom/{id}/run` | Run a saved custom pipeline |
+| `POST /v1/pipelines/run` | Run an inline pipeline definition |
+| `GET /v1/pipelines/step-types` | List available step types |
+
+### Custom Pipeline Definition

-**Configuration in models.json:**
 ```json
 {
-  "text_models": [
+  "name": "My Pipeline",
+  "steps": [
    {
-      "id": "meta-llama/Llama-2-7b-chat-hf",
-      "backend": "nvidia",
-      "context_size": 4096,
-      "n_gpu_layers": -1,
-      "load_in_4bit": false,
-      "load_in_8bit": false,
-      "flash_attention": false,
-      "enabled": true
+      "type": "text_gen",
+      "label": "Write scene description",
+      "params": {
+        "model": "Qwen/Qwen3.5-9B",
+        "prompt": "Describe a visual scene for: {{input}}"
      }
-  ]
-}
-```
-
-### AMD and Intel (Vulkan)
-
-```bash
-# Install Vulkan drivers first
-# Debian/Ubuntu (AMD and Intel):
-sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers intel-media-va-driver
-
-# Fedora:
-sudo dnf install vulkan-loader-devel vulkan-tools mesa-vulkan-drivers intel-gpu-tools
-
-# Using build script
-./build.sh vulkan
-
-# List available Vulkan GPU devices
-python coderai --vulkan-list-devices
-```
-
-**Vulkan Backend Notes:**
-- Uses GGUF format models (much smaller than full HuggingFace models)
-- Q4_K_M quantization recommended for 4GB+ VRAM GPUs
-- Q5_K_M or Q6_K for higher quality
-- Works on:
-  - AMD RX 400 series and newer (**recommended**)
-  - Intel integrated graphics (HD 600 series+) and Intel Arc GPUs
-  - NVIDIA GTX 900 series and newer (but CUDA backend is preferred)
-- Any GPU with Vulkan 1.2+ driver support should work
-- **Update llama-cpp-python** for newer model support: `pip install --upgrade llama-cpp-python --no-cache-dir`
-
-**Intel GPU Specific Notes:**
-- Intel integrated GPUs have limited VRAM (shared with system RAM), so use smaller models
-- Recommended for Intel iGPUs: `Q4_K_M` quantized models under 2GB file size
-- Intel Arc GPUs work well with the same settings as AMD GPUs
-
-**Configuration in models.json:**
-```json
-{
-  "text_models": [
+    },
+    {
+      "type": "image_gen",
+      "params": {
+        "model": "z_image_turbo-Q2_K.gguf",
+        "prompt": "{{step0.output}}"
+      }
+    },
    {
-      "id": "phi-3-mini-4k-instruct-q4_k_m.gguf",
-      "backend": "vulkan",
-      "context_size": 4096,
-      "n_gpu_layers": -1,
-      "enabled": true
+      "type": "video_gen",
+      "params": {
+        "model": "wan-model",
+        "mode": "i2v",
+        "init_image": "{{step1.url}}"
+      }
    }
  ]
 }
 ```

-**Vulkan Configuration in config.json:**
-```json
-{
-  "vulkan": {
-    "n_gpu_layers": -1,
-    "n_ctx": 2048,
-    "device_id": 0,
-    "single_gpu": false
-  }
-}
-```
+Template variables: `{{input}}`, `{{stepN.output}}`, `{{stepN.url}}`.
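
To make the template variables concrete, here is a minimal sketch of how such placeholders could be resolved. It is not taken from the CoderAI source: `render` is a hypothetical helper, and it assumes each finished step exposes its results as a dictionary with fields such as `output` or `url`, indexed from zero as in the example definition.

```python
import re

# Matches {{input}} or {{stepN.field}}, tolerating spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(?:(input)|step(\d+)\.(\w+))\s*\}\}")


def render(text, pipeline_input, step_results):
    """Replace template variables in `text`.

    pipeline_input: the value substituted for {{input}}.
    step_results:   list of dicts, one per completed step (assumed layout).
    """
    def substitute(match):
        if match.group(1):  # the {{input}} form
            return str(pipeline_input)
        step_index = int(match.group(2))
        field = match.group(3)
        return str(step_results[step_index][field])

    return _PLACEHOLDER.sub(substitute, text)
```

A reference to a step that has not produced a result yet raises `IndexError` or `KeyError` in this sketch, which surfaces ordering mistakes early.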

-### CPU-Only
+Available step types: `text_gen`, `image_gen`, `image_edit`, `image_inpaint`, `image_upscale`, `image_deblur`, `image_unpix`, `image_outfit`, `image_faceswap`, `video_gen`, `video_upscale`, `video_sub`, `video_interp`, `video_dub`, `tts`, `audio_gen`, `voice_clone`, `voice_convert`.
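
A definition can be sanity-checked before it is submitted. The sketch below is an illustration under stated assumptions, not the project's validator: `validate_pipeline` is a hypothetical name, steps are assumed to run in order with zero-based indices, and the set of step types is copied from the list above.

```python
import re

STEP_TYPES = {
    "text_gen", "image_gen", "image_edit", "image_inpaint", "image_upscale",
    "image_deblur", "image_unpix", "image_outfit", "image_faceswap",
    "video_gen", "video_upscale", "video_sub", "video_interp", "video_dub",
    "tts", "audio_gen", "voice_clone", "voice_convert",
}

_STEP_REF = re.compile(r"\{\{\s*step(\d+)\.\w+\s*\}\}")


def validate_pipeline(definition):
    """Return a list of human-readable problems (empty if none were found)."""
    errors = []
    for index, step in enumerate(definition.get("steps", [])):
        step_type = step.get("type")
        if step_type not in STEP_TYPES:
            errors.append(f"step {index}: unknown type {step_type!r}")
        for key, value in step.get("params", {}).items():
            if not isinstance(value, str):
                continue  # only string parameters can hold template variables
            for match in _STEP_REF.finditer(value):
                referenced = int(match.group(1))
                if referenced >= index:
                    errors.append(
                        f"step {index}: param {key!r} references step "
                        f"{referenced}, which has not run yet"
                    )
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue in a definition at once.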

-While not recommended for performance, you can run on CPU:
+---

-```bash
-# NVIDIA backend on CPU
-pip install "torch>=2.0.0" --index-url https://download.pytorch.org/whl/cpu
-pip install -r requirements-nvidia.txt
+## Backend-Specific Notes

-# Or Vulkan backend on CPU (llama-cpp supports CPU fallback)
-CMAKE_ARGS="-DGGML_VULKAN=OFF" pip install llama-cpp-python
-```
+### NVIDIA (CUDA)

-Configure in `config.json`:
-```json
-{
-  "backend": {
-    "type": "nvidia"
-  },
-  "vulkan": {
-    "n_gpu_layers": 0
-  }
-}
-```
+- HuggingFace format models (safetensors/pytorch)
+- GGUF text models via llama-cpp-python with CUDA
+- Stable Diffusion GGUF via stable-diffusion.cpp with CUDA
+- Optional: bitsandbytes (4-bit/8-bit quantization), Flash Attention 2

-### ROCm Alternative (deprecated)
+### AMD / Intel (Vulkan)

-While the Vulkan backend is now recommended for AMD GPUs, ROCm support is still available through the NVIDIA backend if you have ROCm-enabled PyTorch installed.
+- GGUF format models via llama-cpp-python with Vulkan
+- Stable Diffusion GGUF via stable-diffusion.cpp with Vulkan
+- No ROCm/OneAPI required
+- Intel iGPUs: VRAM is shared with system RAM, so prefer Q4_K_M-quantized models under 2 GB

-### Low VRAM Configuration
+### Multi-GPU (NVIDIA + AMD)

-For GPUs with limited VRAM (4-8GB), configure in `config.json` or per-model in `models.json`:
+To force Vulkan to use only the AMD GPU, set `device_id` to that GPU's Vulkan device index (1 in this example) and enable `single_gpu`:

-**Global configuration (config.json):**
 ```json
-{
-  "offload": {
-    "load_in_4bit": true,
-    "directory": "/path/to/fast/storage"
-  }
-}
+{ "vulkan": { "device_id": 1, "single_gpu": true } }
 ```

-**Per-model configuration (models.json):**
+### Low VRAM
+
 ```json
-{
-  "text_models": [
-    {
-      "id": "meta-llama/Llama-2-7b-chat-hf",
-      "backend": "nvidia",
-      "load_in_4bit": true,
-      "enabled": true
-    }
-  ]
-}
+{ "offload": { "load_in_4bit": true } }
 ```

-### Using Vulkan with Multiple GPUs (NVIDIA + AMD)
+---

-If your system has both NVIDIA and AMD GPUs, llama.cpp's Vulkan backend will automatically distribute layers across all visible GPUs for performance. To force Vulkan to use **only** the AMD GPU and prevent VRAM allocation on the NVIDIA GPU, configure in `config.json`:
+## Troubleshooting

-**Configuration in config.json:**
-```json
-{
-  "vulkan": {
-    "device_id": 1,
-    "single_gpu": true
-  }
-}
-```
+### numpy ABI mismatch after installing new packages

-**Alternative: Environment variables**
 ```bash
-# List available Vulkan devices first
-python coderai --vulkan-list-devices
-
-# Then use VK_DEVICE_SELECT_DEVICE to force a specific device
-# For example, if device 1 is your AMD GPU:
-VK_DEVICE_SELECT_DEVICE=1 python coderai
-
-# Or hide NVIDIA GPU from CUDA (prevents any CUDA usage)
-CUDA_VISIBLE_DEVICES="" python coderai
+pip install --force-reinstall --no-cache-dir --no-deps realesrgan insightface
 ```

-**Understanding the Issue:**
-When you have multiple Vulkan-compatible GPUs, llama.cpp automatically distributes model layers across them (shown in logs as "layer X assigned to device VulkanY"). The `single_gpu: true` setting prevents this by using the `tensor_split` parameter with a value of `[0.0, 1.0]` (or similar depending on device count), which tells llama.cpp to put 0% of layers on some GPUs and 100% on the selected GPU.
+### stable-diffusion.cpp: "get sd version from file failed"

-**Notes:**
-- The `device_id` setting maps to `main_gpu` in llama-cpp-python
-- The `single_gpu` flag builds a `tensor_split` array to force single GPU usage
-- Vulkan enumerates all GPUs in your system, so device IDs may differ from CUDA device IDs
-- The `vulkaninfo` command shows all GPUs visible to Vulkan
-
-### Multi-GPU Setup
-
-Multiple GPUs are automatically detected and utilized. The model will be distributed across available devices based on memory availability.
+The model architecture is not recognized. Update stable-diffusion-cpp-python:

 ```bash
-# Set visible GPUs (optional)
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-
-# Run - model will be distributed across all visible GPUs
-python coderai
+CMAKE_ARGS="-DSD_WEBM=OFF -DSD_CUDA=ON -DSD_VULKAN=ON" \
+  pip install stable-diffusion-cpp-python --upgrade --no-cache-dir
 ```

-## Model Recommendations
-
-### NVIDIA Backend (HuggingFace Models)
-
-#### Small Models (For Testing)
-
-- `microsoft/DialoGPT-medium` (~345M parameters)
-- `TinyLlama/TinyLlama-1.1B-Chat-v1.0` (~1.1B parameters)
-- `facebook/blenderbot-400M-distill` (~400M parameters)
-
-#### Medium Models (4-8GB VRAM with 4-bit)
-
-- `meta-llama/Llama-2-7b-chat-hf` (~7B parameters)
-- `mistralai/Mistral-7B-Instruct-v0.2` (~7B parameters)
-- `HuggingFaceH4/zephyr-7b-beta` (~7B parameters)
-
-#### Large Models (Multiple GPUs or High VRAM)
-
-- `meta-llama/Llama-2-13b-chat-hf` (~13B parameters)
-- `meta-llama/Llama-2-70b-chat-hf` (~70B parameters) - requires multiple GPUs or disk offload
-- `bigscience/bloom-7b1` (~7B parameters)
-
-### Vulkan Backend (GGUF Models)
-
-#### Small Models (2-4GB VRAM)
+### stable-diffusion.cpp using CPU instead of GPU

-- `TheBloke/phi-2-GGUF` - phi-2.Q4_K_M.gguf (~1.6B parameters, ~1GB file)
-- `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` - tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+Reinstall with GPU flags:

-#### Medium Models (4-8GB VRAM)
-
-- `TheBloke/Llama-2-7B-GGUF` - llama-2-7b.Q4_K_M.gguf (~4GB file)
-- `TheBloke/Mistral-7B-Instruct-v0.2-GGUF` - mistral-7b-instruct-v0.2.Q4_K_M.gguf
-- `microsoft/Phi-3-mini-4k-instruct-gguf` - Phi-3-mini-4k-instruct-q4.gguf
-
-#### Large Models (8GB+ VRAM)
-
-- `TheBloke/Llama-2-13B-GGUF` - llama-2-13b.Q4_K_M.gguf (~7.5GB file)
-- `TheBloke/deepseek-coder-6.7B-base-GGUF` - deepseek-coder-6.7b-base.Q4_K_M.gguf
-
-**GGUF Quantization Guide:**
-- `Q4_K_M` - Best balance of speed/quality (recommended)
-- `Q5_K_M` - Higher quality, slightly slower
-- `Q6_K` - Near-unquantized quality
-- `Q8_0` - Maximum quality, largest size
-
-**Download Example:**
 ```bash
-# Using huggingface-cli
-huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_M.gguf --local-dir ./models
-
-# Or let coderai download automatically
-python coderai --model TheBloke/Llama-2-7B-GGUF --backend vulkan
+CMAKE_ARGS="-DSD_WEBM=OFF -DSD_CUDA=ON -DSD_VULKAN=ON" \
+  pip install stable-diffusion-cpp-python --no-cache-dir --force-reinstall
 ```

-## Troubleshooting
-
-### Shell Redirection Error: "No such file or directory: '0.0'"
-
-**Problem**: Running `pip install torch>=2.0.0` fails with an error about file "0.0" or "=2.0.0" not found.
-
-**Cause**: The shell interprets `>` as output redirection. The command creates a file named "=2.0.0" and installs an unversioned torch package.
-
-**Solutions**:
-1. **Use quotes** (recommended): `pip install "torch>=2.0.0"`
-2. **Use exact versions**: `pip install torch==2.0.0`
-3. **Use requirements.txt**: Add exact versions to requirements.txt and run `pip install -r requirements.txt`
-
-### Out of Memory Errors
-
-**Problem**: `CUDA out of memory` or system RAM exhausted
-
-**Solutions**:
-1. Use quantization: `--load-in-4bit` or `--load-in-8bit`
-2. Enable disk offload: `--offload-dir /path/to/storage`
-3. Use a smaller model
-4. Reduce batch size in client requests
-
-### Flash Attention Installation Fails
-
-**Problem**: `pip install flash-attn` fails to build
-
-**Solutions**:
-1. Ensure CUDA/ROCm is properly installed
-2. Install build dependencies: `pip install packaging ninja`
-3. Try without build isolation: `pip install flash-attn --no-build-isolation`
-4. Check GPU compatibility (Ampere, Ada Lovelace, Hopper for NVIDIA)
-5. Skip Flash Attention - the server works without it
-
-### Flash Attention: No module named 'torch' during build
-
-**Problem**: Flash Attention build fails with `ModuleNotFoundError: No module named 'torch'` even though PyTorch is installed (e.g., PyTorch 2.9.1+rocm6.4).
-
-**Cause**: pip uses isolated build environments by default, which prevents flash-attention from seeing the installed torch package during compilation.
-
-**Solutions**:
-1. **Use --no-build-isolation flag** (recommended):
-   ```bash
-   pip install flash-attn --no-build-isolation
-   ```
-
-2. **For ROCm systems**, you may also need to limit parallel jobs to avoid resource exhaustion:
-   ```bash
-   MAX_JOBS=4 pip install flash-attn --no-build-isolation
-   ```
-
-3. **Use pre-built wheels** if available for your platform (check https://github.com/Dao-AILab/flash-attention/releases)
-
-4. **ROCm 6.4 compatibility note**: Flash Attention may not officially support ROCm 6.4 yet (it was primarily built for ROCm 6.0). If build fails on ROCm 6.4, you can run without Flash Attention:
-   ```bash
-   python coderai --model meta-llama/Llama-2-7b-chat-hf
-   # (omit the --flash-attn flag)
-   ```
-
-5. **Fallback**: The server works perfectly without Flash Attention - simply omit the `--flash-attn` flag when starting the server.
-
-### bitsandbytes Not Working on ROCm
-
-**Problem**: Quantization fails on AMD GPUs
-
-**Solutions**:
-1. bitsandbytes has limited ROCm support
-2. Use disk offload instead: `--offload-dir /path/to/storage`
-3. Build bitsandbytes from source with ROCm support
-
-### Model Download Stuck or Slow
-
-**Problem**: HuggingFace model download is slow or fails
-
-**Solutions**:
-1. Set HuggingFace cache directory: `export HF_HOME=/path/to/cache`
-2. Use mirror: `export HF_ENDPOINT=https://hf-mirror.com` (for China)
-3. Download model manually with `git-lfs` and use local path
-
-### Auto-Detection Issues in Containers
+### Vulkan backend not available

-**Problem**: Wrong memory detection in Docker/Podman containers
-
-**Solutions**:
-1. Specify RAM manually: `--ram 16`
-2. Pass through GPU devices properly
-3. For Docker: `--gpus all` flag for NVIDIA, or proper device mapping for ROCm
-
-### API Returns 503 Errors
-
-**Problem**: `Model not loaded` error
-
-**Solutions**:
-1. Ensure model name is correct and accessible
-2. Check model requires authentication: `huggingface-cli login`
-3. Verify internet connection for first-time model download
-
-### ROCm Not Detected
-
-**Problem**: ROCm GPU not detected, falling back to CPU
-
-**Solutions**:
-1. Verify ROCm installation: `rocminfo`
-2. Check PyTorch ROCm build: `python -c "import torch; print(torch.version.hip)"`
-3. Set HIP visible devices: `export HIP_VISIBLE_DEVICES=0`
-
-### Import Errors
-
-**Problem**: `ModuleNotFoundError` for various packages
-
-**Solutions**:
-1. Reinstall requirements: `pip install -r requirements.txt --force-reinstall`
-2. Check Python version: `python --version` (should be 3.8+)
-3. Verify virtual environment is activated
-
-### Vulkan-Specific Issues
-
-**Problem**: "Vulkan backend not available" or llama-cpp fails to load
-
-**Solutions**:
-1. **Verify Vulkan drivers and shader compiler are installed:**
-   ```bash
-   # Check Vulkan installation
-   vulkaninfo | grep "deviceName"
-   
-   # Check glslc (shader compiler) - REQUIRED for building
-   glslc --version
-   
-   # Or install if missing
-   # Debian/Ubuntu:
-   sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers glslang-tools
-   
-   # Fedora:
-   sudo dnf install vulkan-loader-devel vulkan-tools mesa-vulkan-drivers glslang
-   ```
-   
-   **Note:** `glslc` is required to compile llama-cpp-python with Vulkan support. If you see "Could NOT find Vulkan (missing: glslc)", install the `glslc` package:
-   ```bash
-   sudo apt install glslc glslang-tools glslang-dev
-   
-   # If glslc still not found, check location and symlink:
-   find /usr -name "glslc" 2>/dev/null
-   sudo ln -s /usr/lib/shaderc/bin/glslc /usr/local/bin/glslc 2>/dev/null || sudo ln -s $(which glslangValidator) /usr/local/bin/glslc 2>/dev/null || echo "glslc not found, please install glslc package"
-   ```
-
-2. **Reinstall llama-cpp-python with Vulkan:**
-   ```bash
-   pip uninstall llama-cpp-python -y
-   CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --no-cache-dir
-   ```
-
-3. **Check GPU compatibility:**
-    - **AMD**: RX 400 series and newer (best experience)
-    - **Intel**: HD 600 series integrated graphics or newer, all Intel Arc GPUs
-    - **NVIDIA**: GTX 900 series and newer (but CUDA backend preferred for NVIDIA)
-    - Any GPU with Vulkan 1.2+ driver support should work
-
-**Performance expectations by GPU:**
- AMD dedicated GPUs: Full performance, all layer offloading supported
- Intel Arc GPUs: Good performance, similar to AMD
- Intel integrated GPUs: Limited by shared system RAM, use smaller models (Q4_K_M under 2GB)
-
-**Problem**: GGUF model fails to load or produces garbled output
-
-**Solutions**:
-1. **Verify model format**: Must be GGUF format, not regular HuggingFace format
-   ```bash
-   # Check file extension
-   ls -la model.gguf  # Should end in .gguf
-   ```
+```bash
+# Install Vulkan drivers and shader compiler
+sudo apt install libvulkan-dev vulkan-tools mesa-vulkan-drivers glslc glslang-tools

-2. **Try different quantization**: Some GGUF files may be incompatible
-   - Q4_K_M is most compatible (recommended)
-   - Q5_K_M or Q6_K for higher quality
-   - Avoid IQ quants if having issues
+# Rebuild llama-cpp-python
+CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --no-cache-dir --force-reinstall
+```

-3. **Check model architecture**: Some very new models may need updated llama-cpp
-   ```bash
-   pip install --upgrade llama-cpp-python
-   ```
+### Flash Attention build fails

-**Problem**: Vulkan backend runs on CPU instead of GPU
+```bash
+MAX_JOBS=4 pip install flash-attn --no-build-isolation
+```

-**Solutions**:
-1. **Check layer offloading**: Verify layers are being offloaded
-   ```bash
-   # Check GPU layers parameter (default -1 = all layers)
-   python coderai --model model.gguf --backend vulkan --n-gpu-layers 35
-   ```
+### Model not loading (503 errors)

-2. **Check verbose output**: Look for Vulkan device initialization in logs
-   ```bash
-   # Run with verbose logging
-   python coderai --model model.gguf --backend vulkan 2>&1 | grep -i vulkan
-   ```
+- Verify model name matches exactly what's in `models.json`
+- Check HuggingFace authentication: `huggingface-cli login`
+- Ensure the model type matches the endpoint (image models cannot be used via `/v1/chat/completions`)

-3. **Verify GPU visibility**: Check that Vulkan sees your GPU
-   ```bash
-   vulkaninfo | grep -A 5 "GPU0\|GPU1"
-   ```
-
-### Backend Not Detected
-
-**Problem**: "No suitable backend found" error
-
-**Solutions**:
-1. **Check which backends are available:**
-   ```bash
-   python -c "import coderai; print(coderai.detect_available_backends())"
-   ```
-
-2. **For NVIDIA**: Ensure PyTorch with CUDA is installed
-   ```bash
-   python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
-   ```
-
-3. **For Vulkan**: Ensure llama-cpp-python is installed with Vulkan support
-   ```bash
-   python -c "from llama_cpp import Llama; print('llama-cpp available')"
-   ```
+---

 ## License

-This project is licensed under the GNU General Public License v3.0 - see the [LICENSE.md](LICENSE.md) file for details.
+GNU General Public License v3.0 — see [LICENSE.md](LICENSE.md).

 ## Contributing

-Contributions are welcome! Please feel free to submit a merge request.
+Merge requests welcome.

 ## Acknowledgments

- Built with [FastAPI](https://fastapi.tiangolo.com/)
- Powered by [HuggingFace Transformers](https://huggingface.co/docs/transformers/) (NVIDIA backend)
- Powered by [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) with Vulkan support (AMD/Intel backend)
- Inspired by the OpenAI API specification
-
---
-
-**Note on AI.PROMPT**: This project was enhanced following instructions to add Vulkan support for AMD and Intel GPUs alongside the existing NVIDIA/CUDA support. The implementation uses llama-cpp-python for Vulkan/GGUF model support while maintaining full compatibility with the existing HuggingFace/Transformers backend for NVIDIA GPUs.
+- [FastAPI](https://fastapi.tiangolo.com/)
+- [HuggingFace Transformers](https://huggingface.co/docs/transformers/) — NVIDIA text backend
+- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) — Vulkan/CUDA GGUF text backend
+- [stable-diffusion-cpp-python](https://github.com/william-murray1204/stable-diffusion-cpp-python) — GGUF image backend
+- [InsightFace](https://github.com/deepinsight/insightface) — face swap
+- [F5-TTS](https://github.com/SWivid/F5-TTS) — voice cloning
+- [Seed-VC](https://github.com/Plachta/Seed-VC) — singing voice conversion
+- [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) — image/video upscaling
--- a/build.sh
+++ b/build.sh
@@ -522,7 +522,14 @@ elif [ "$BACKEND" = "all" ]; then
        pip install setproctitle || echo -e "${YELLOW}Warning: setproctitle failed (optional)${NC}"

        # Try stable-diffusion-cpp-python (disable WebM to avoid missing libwebm cmake submodule)
+        # Prefer a CUDA build; CUDA detection runs later in this block, so check for nvcc (or /usr/local/cuda) directly here
+        if command -v nvcc &> /dev/null || [ -d "/usr/local/cuda" ]; then
+            CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
+            CMAKE_ARGS="$SD_CMAKE_ARGS" pip install stable-diffusion-cpp-python || \
+            echo -e "${YELLOW}Warning: stable-diffusion-cpp-python failed (optional)${NC}"
+        else
            CMAKE_ARGS="$SD_CMAKE_ARGS" pip install stable-diffusion-cpp-python || echo -e "${YELLOW}Warning: stable-diffusion-cpp-python failed (optional)${NC}"
+        fi
    }
    
    # Install PyTorch with CUDA support (for nvidia backend)
@@ -622,14 +629,28 @@ elif [ "$BACKEND" = "all" ]; then
        echo -e "${YELLOW}Warning: Some Vulkan packages failed to install${NC}"
    }
    
-    # Try to install stable-diffusion-cpp-python with OpenCL
-    if [ "$OPENCL_AVAILABLE" = true ]; then
-        echo -e "${YELLOW}Installing stable-diffusion-cpp-python with OpenCL support...${NC}"
-        CMAKE_ARGS="$SD_CMAKE_ARGS" pip install stable-diffusion-cpp-python || {
-            echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available (requires CMake and build tools)${NC}"
+    # Try to install stable-diffusion-cpp-python with CUDA+Vulkan (preferred) or fallbacks
+    if [ "$CUDA_AVAILABLE" = true ] && [ "$VULKAN_AVAILABLE" = true ]; then
+        echo -e "${YELLOW}Installing stable-diffusion-cpp-python with CUDA+Vulkan support...${NC}"
+        CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON -DSD_VULKAN=ON" pip install stable-diffusion-cpp-python --no-cache-dir || {
+            echo -e "${YELLOW}CUDA+Vulkan build failed, trying CUDA only...${NC}"
+            CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
+                echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
        }
+    elif [ "$CUDA_AVAILABLE" = true ]; then
+        echo -e "${YELLOW}Installing stable-diffusion-cpp-python with CUDA support...${NC}"
+        CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_CUDA=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
+            echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
+    elif [ "$VULKAN_AVAILABLE" = true ]; then
+        echo -e "${YELLOW}Installing stable-diffusion-cpp-python with Vulkan support...${NC}"
+        CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_VULKAN=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
+            echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
+    elif [ "$OPENCL_AVAILABLE" = true ]; then
+        echo -e "${YELLOW}Installing stable-diffusion-cpp-python with OpenCL support...${NC}"
+        CMAKE_ARGS="$SD_CMAKE_ARGS -DSD_OPENCL=ON" pip install stable-diffusion-cpp-python --no-cache-dir || \
+            echo -e "${YELLOW}Warning: stable-diffusion-cpp-python not available${NC}"
    else
-        echo -e "${YELLOW}Skipping OpenCL (stable-diffusion-cpp-python) - OpenCL not available${NC}"
+        echo -e "${YELLOW}Skipping GPU-accelerated stable-diffusion-cpp-python - no GPU backend available${NC}"
    fi

    # Install additional requirements
@@ -667,8 +688,11 @@ elif [ "$BACKEND" = "all" ]; then
    echo "Available backends:"
    [ "$CUDA_AVAILABLE" = true ] && echo "  ✓ NVIDIA/CUDA (PyTorch)"
    [ "$CUDA_AVAILABLE" = true ] && echo "  ✓ CUDA (llama-cpp-python)"
+    [ "$CUDA_AVAILABLE" = true ] && [ "$VULKAN_AVAILABLE" = true ] && echo "  ✓ CUDA+Vulkan (stable-diffusion-cpp-python)"
+    [ "$CUDA_AVAILABLE" = true ] && [ "$VULKAN_AVAILABLE" != true ] && echo "  ✓ CUDA (stable-diffusion-cpp-python)"
+    [ "$CUDA_AVAILABLE" != true ] && [ "$VULKAN_AVAILABLE" = true ] && echo "  ✓ Vulkan (stable-diffusion-cpp-python)"
    [ "$VULKAN_AVAILABLE" = true ] && echo "  ✓ Vulkan (llama-cpp-python)"
-    [ "$OPENCL_AVAILABLE" = true ] && echo "  ✓ OpenCL (stable-diffusion-cpp-python)"
+    [ "$OPENCL_AVAILABLE" = true ] && [ "$CUDA_AVAILABLE" != true ] && [ "$VULKAN_AVAILABLE" != true ] && echo "  ✓ OpenCL (stable-diffusion-cpp-python)"
    echo "  ✓ CPU (fallback for all)"
    if [ "$FLASH" = true ] && [ "$CUDA_AVAILABLE" = true ]; then
        echo ""

--- a/codai/admin/auth.py
+++ b/codai/admin/auth.py
@@ -15,10 +15,13 @@
 # along with this program. If not, see <https://www.gnu.org/licenses/>.

 """Authentication and session management for admin dashboard."""
+import base64
 import hashlib
 import hmac
 import json
+import os
 import secrets
+import threading
 import time
 from pathlib import Path
 from typing import Any, Dict, Optional
@@ -43,35 +46,62 @@ def get_or_create_secret(config_dir: Path) -> bytes:


 def hash_password(password: str) -> str:
-    """Hash a password using SHA-256 with salt.
+    """Hash a password using argon2 (preferred) or scrypt as fallback.

-    In production, use argon2 or bcrypt. This is a minimal implementation
-    for environments where those libraries aren't available.
+    New hashes are always produced with a proper key-derivation function and
+    a per-password random salt.  The legacy SHA-256/static-salt format is
+    only retained for *verification* of pre-existing hashes.
    """
-    # Use SHA-256 with a pepper-like secret for basic hashing
-    # Real implementation should use argon2 from main.py
-    salt = b'static_salt_'  # In production, use per-user random salt
-    return hashlib.sha256(salt + password.encode()).hexdigest()
+    try:
+        from argon2 import PasswordHasher
+        ph = PasswordHasher()
+        return ph.hash(password)
+    except ImportError:
+        pass
+    # scrypt fallback: encode as "scrypt:<b64salt>:<b64key>"
+    salt = os.urandom(16)
+    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
+    return "scrypt:" + base64.b64encode(salt).decode() + ":" + base64.b64encode(key).decode()


 def verify_password(password: str, password_hash: str) -> bool:
-    """Verify a password against its hash."""
-    # Try argon2 first
+    """Verify a password against its hash.
+
+    Supports argon2, scrypt (new format), and the legacy SHA-256/static-salt
+    format so that old stored hashes continue to work.
+    """
+    # --- argon2 ---
    try:
        from argon2 import PasswordHasher
-        from argon2.exceptions import VerifyMismatchError
+        from argon2.exceptions import VerifyMismatchError, InvalidHashError
        ph = PasswordHasher()
        try:
            return ph.verify(password_hash, password)
        except VerifyMismatchError:
            return False
+        except InvalidHashError:
+            pass  # not an argon2 hash; fall through
        except Exception:
            pass
    except ImportError:
        pass

-    # Fallback to simple hash
-    return hash_password(password) == password_hash
+    # --- scrypt ---
+    if password_hash.startswith("scrypt:"):
+        try:
+            parts = password_hash.split(":")
+            if len(parts) == 3:
+                salt = base64.b64decode(parts[1])
+                stored_key = base64.b64decode(parts[2])
+                new_key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
+                return hmac.compare_digest(new_key, stored_key)
+        except Exception:
+            pass
+        return False
+
+    # --- legacy SHA-256 with static salt (read-only; never written for new passwords) ---
+    legacy = hashlib.sha256(b'static_salt_' + password.encode()).hexdigest()
+    return hmac.compare_digest(legacy, password_hash)
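
The scrypt fallback above stores hashes as `scrypt:<b64salt>:<b64key>`. A minimal standalone sketch of that round trip follows, using the same `n`, `r`, and `p` values as the diff. The function names `scrypt_hash` and `scrypt_verify` are hypothetical and are not part of `codai/admin/auth.py`.

```python
import base64
import hashlib
import hmac
import os


def scrypt_hash(password: str) -> str:
    """Return 'scrypt:<b64salt>:<b64key>' using a fresh 16-byte random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return "scrypt:" + base64.b64encode(salt).decode() + ":" + base64.b64encode(key).decode()


def scrypt_verify(password: str, stored: str) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    parts = stored.split(":")
    if len(parts) != 3 or parts[0] != "scrypt":
        return False
    try:
        salt = base64.b64decode(parts[1])
        expected = base64.b64decode(parts[2])
    except ValueError:
        # binascii.Error is a ValueError subclass: malformed base64
        return False
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, expected)
```

Because the salt is random per call, hashing the same password twice gives different strings, and verification has to re-derive the key from the stored salt rather than compare hash strings directly.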


 class SessionManager:
@@ -81,7 +111,7 @@ class SessionManager:
        self.config_dir = config_dir
        self.secret = get_or_create_secret(config_dir)
        self.session_timeout = timedelta(minutes=session_timeout_minutes)
-        self._lock = __import__('threading').Lock()
+        self._lock = threading.Lock()
    
    def _load_auth_data(self) -> Dict[str, Any]:
        """Load auth.json data."""

--- a/codai/admin/routes.py
+++ b/codai/admin/routes.py
@@ -236,15 +236,51 @@ async def api_status(username: str = Depends(require_auth)):

    # VRAM info
    vram = None
+    is_cuda = False
    try:
        import torch
        if torch.cuda.is_available():
+            is_cuda = True
            free, total = torch.cuda.mem_get_info()
            used = total - free
-            vram = {"used": round(used / 1e9, 2), "total": round(total / 1e9, 2)}
+            vram = {"used": round(used / 1e9, 2), "free": round(free / 1e9, 2), "total": round(total / 1e9, 2),
+                    "gpu": torch.cuda.get_device_name(0)}
    except Exception:
        pass

+    # Non-CUDA: read from sysfs (AMD amdgpu / Intel i915 / Arc)
+    if not is_cuda:
+        import os, glob as _glob
+        for card in sorted(_glob.glob("/sys/class/drm/card[0-9]")):
+            dev = card + "/device"
+            vram_total_path = dev + "/mem_info_vram_total"
+            if not os.path.exists(vram_total_path):
+                continue
+            try:
+                total_b = int(open(vram_total_path).read())
+                used_b  = int(open(dev + "/mem_info_vram_used").read())
+                free_b  = total_b - used_b
+                # GPU name from lspci
+                gpu_name = ""
+                try:
+                    pci_addr = os.path.basename(os.path.realpath(dev))
+                    import subprocess
+                    r = subprocess.run(["lspci", "-s", pci_addr], capture_output=True, text=True, timeout=3)
+                    if r.returncode == 0 and r.stdout:
+                        # "05:00.0 VGA compatible controller: AMD Radeon RX 580"
+                        gpu_name = r.stdout.split(":", 2)[-1].strip()
+                except Exception:
+                    pass
+                vram = {
+                    "gpu": gpu_name,
+                    "used": round(used_b / 1e9, 2),
+                    "free": round(free_b / 1e9, 2),
+                    "total": round(total_b / 1e9, 2),
+                }
+                break
+            except Exception:
+                continue
+
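
The sysfs fallback above can be exercised in isolation. This is a sketch under assumptions: `read_sysfs_vram` is a hypothetical helper, the `drm_root` parameter exists only so the function can be pointed at a test directory, and the `mem_info_vram_*` files are the ones the diff reads (other drivers may not expose them). The `lspci` name lookup is left out.

```python
import glob
import os
from typing import Optional


def read_sysfs_vram(drm_root: str = "/sys/class/drm") -> Optional[dict]:
    """Return used/free/total VRAM in GB for the first card exposing the counters."""
    for card in sorted(glob.glob(os.path.join(drm_root, "card[0-9]"))):
        dev = os.path.join(card, "device")
        try:
            # Context managers close the file handles promptly
            with open(os.path.join(dev, "mem_info_vram_total")) as f:
                total_b = int(f.read())
            with open(os.path.join(dev, "mem_info_vram_used")) as f:
                used_b = int(f.read())
        except (OSError, ValueError):
            continue  # counters missing or unreadable on this card; try the next
        return {
            "used": round(used_b / 1e9, 2),
            "free": round((total_b - used_b) / 1e9, 2),
            "total": round(total_b / 1e9, 2),
        }
    return None
```

Returning `None` when no card qualifies matches the route's behaviour of leaving `vram` unset, which the dashboard uses to hide the VRAM card.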
    # Request stats from queue manager
    req_total = 0
    req_active = 0
@@ -285,6 +321,17 @@ async def api_status(username: str = Depends(require_auth)):
    except Exception:
        pass

+    # Whisper-server status
+    whisper_status = None
+    try:
+        from codai.models.manager import multi_model_manager as _mmm
+        if _mmm.whisper_servers:
+            whisper_status = {mid: wsm.get_status() for mid, wsm in _mmm.whisper_servers.items()}
+        elif _mmm.whisper_server:
+            whisper_status = {"whisper-server": _mmm.whisper_server.get_status()}
+    except Exception:
+        pass
+
    return {
        "status": "ok",
        "backend": backend,
@@ -293,8 +340,10 @@ async def api_status(username: str = Depends(require_auth)):
        "loaded_models": loaded_keys,
        "enabled_models": enabled_models,
        "vram": vram,
+        "cuda": is_cuda,
        "requests": {"total": req_total, "active": req_active},
        "recent_activity": recent_activity,
+        "whisper_server": whisper_status,
    }


@@ -1195,16 +1244,23 @@ async def api_model_configure(request: Request, username: str = Depends(require_
        raise HTTPException(status_code=503, detail="Config manager not initialized")
    data = await request.json()
    path = data.get("path") or data.get("model_id", "")
-    model_type = data.get("model_type", "text_models")
-    # Treat legacy gguf_models as text_models (GGUF is a format, not a type)
-    if model_type == "gguf_models":
-        model_type = "text_models"
    valid = {"text_models", "image_models", "audio_models", "tts_models", "vision_models", "video_models",
             "audio_gen_models", "embedding_models"}
    if not path:
        raise HTTPException(status_code=400, detail="path is required")
-    if model_type not in valid:
-        raise HTTPException(status_code=400, detail=f"model_type must be one of {valid}")
+
+    # Accept model_types (list) or fall back to single model_type
+    raw_types = data.get("model_types") or []
+    if not raw_types:
+        raw_types = [data.get("model_type", "text_models")]
+    # Normalize: gguf_models → text_models, deduplicate, filter valid
+    model_types = list(dict.fromkeys(
+        ("text_models" if t == "gguf_models" else t)
+        for t in raw_types if t
+    ))
+    model_types = [t for t in model_types if t in valid]
+    if not model_types:
+        model_types = ["text_models"]

    # Remove from all categories (handles type changes)
    for cat in valid | {"gguf_models"}:
@@ -1220,14 +1276,16 @@ async def api_model_configure(request: Request, username: str = Depends(require_
        import os
        if os.path.isfile(path):
            size_bytes = os.path.getsize(path)
-            # GGUF: ~1.1x file size; HF safetensors: ~1.2x
            multiplier = 1.1 if path.endswith(".gguf") else 1.2
            used_vram_gb = round(size_bytes / 1e9 * multiplier, 2)

-    # Build settings entry (drop None-valued optional keys to keep JSON tidy)
-    entry: dict = {"path": path, "model_type": model_type}
+    # Build settings entry
+    entry: dict = {"path": path, "model_type": model_types[0], "model_types": model_types}
    if used_vram_gb is not None:
        entry["used_vram_gb"] = used_vram_gb
+    # Store video sub-types (t2v / i2v / v2v) when present
+    if data.get("video_subtypes"):
+        entry["video_subtypes"] = data["video_subtypes"]
    for key in ("alias", "backend", "load_mode", "n_gpu_layers", "n_ctx",
                "max_gpu_percent", "manual_ram_gb", "load_in_4bit", "load_in_8bit",
                "flash_attention", "no_ram", "offload_strategy", "offload_dir",
@@ -1235,7 +1293,9 @@ async def api_model_configure(request: Request, username: str = Depends(require_
        if key in data:
            entry[key] = data[key]

-    config_manager.models_data.setdefault(model_type, []).append(entry)
+    # Add entry to each selected category
+    for mtype in model_types:
+        config_manager.models_data.setdefault(mtype, []).append(entry)
    config_manager.save_models()
    return {"success": True}
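
The `model_types` normalization in this hunk can be read as a small pure function. The sketch below mirrors the steps in the diff (map `gguf_models` to `text_models`, deduplicate while keeping order, drop unknown types, default to `text_models`). The names `normalize_model_types` and `VALID_TYPES` are hypothetical and are used only for illustration.

```python
from typing import Iterable, List

# Same category names as the `valid` set in api_model_configure
VALID_TYPES = {
    "text_models", "image_models", "audio_models", "tts_models",
    "vision_models", "video_models", "audio_gen_models", "embedding_models",
}


def normalize_model_types(raw_types: Iterable[str]) -> List[str]:
    """Normalize a requested list of model types as the route does."""
    # Legacy alias: GGUF is a file format, so it maps to text_models
    mapped = ("text_models" if t == "gguf_models" else t for t in raw_types if t)
    # dict.fromkeys deduplicates while preserving first-seen order
    deduped = list(dict.fromkeys(mapped))
    kept = [t for t in deduped if t in VALID_TYPES]
    # An empty result falls back to the default category
    return kept or ["text_models"]
```

One consequence worth noting: unknown types are now dropped silently and replaced by the default, whereas the removed code returned a 400 for an invalid `model_type`.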

@@ -1286,6 +1346,7 @@ async def api_get_settings(username: str = Depends(require_admin)):
            "https": c.server.https,
            "https_key_path": c.server.https_key_path,
            "https_cert_path": c.server.https_cert_path,
+            "queue_max_size": c.server.queue_max_size,
        },
        "backend": {
            "type": c.backend.type,
@@ -1341,6 +1402,10 @@ async def api_save_settings(request: Request, username: str = Depends(require_ad
        c.server.https = bool(srv.get("https", c.server.https))
        c.server.https_key_path = srv.get("https_key_path") or None
        c.server.https_cert_path = srv.get("https_cert_path") or None
+        if "queue_max_size" in srv:
+            c.server.queue_max_size = max(1, int(srv["queue_max_size"]))
+            from codai.queue.manager import queue_manager
+            queue_manager.max_size = c.server.queue_max_size

    if "backend" in data:
        bk = data["backend"]
@@ -1395,6 +1460,81 @@ async def api_save_settings(request: Request, username: str = Depends(require_ad
    return {"success": True}


+
+# --- Whisper-server management ---
+
+@router.get("/admin/api/whisper-server/status")
+async def api_whisper_server_status(username: str = Depends(require_admin)):
+    """Return status of all registered whisper-server instances."""
+    from codai.models.manager import multi_model_manager
+    if multi_model_manager.whisper_servers:
+        return {
+            mid: wsm.get_status()
+            for mid, wsm in multi_model_manager.whisper_servers.items()
+        }
+    # Legacy single-instance fallback
+    if multi_model_manager.whisper_server:
+        return {"whisper-server": multi_model_manager.whisper_server.get_status()}
+    return {}
+
+
+@router.post("/admin/api/whisper-server/start")
+async def api_whisper_server_start(request: Request, username: str = Depends(require_admin)):
+    """Start (or restart) a whisper-server instance by model_id."""
+    from codai.models.manager import multi_model_manager
+    data = await request.json()
+    model_id   = data.get("model_id", "whisper-server")
+    server_path = data.get("server_path", "")
+    model_path  = data.get("model_path") or None
+    port        = int(data.get("port", 8744))
+    gpu_device  = int(data.get("gpu_device", 0))
+
+    if not server_path:
+        raise HTTPException(status_code=400, detail="server_path required")
+
+    wsm = multi_model_manager.whisper_servers.get(model_id)
+    if wsm is None:
+        wsm = multi_model_manager.register_whisper_server(
+            model_id=model_id, server_path=server_path,
+            model_path=model_path, port=port, gpu_device=gpu_device,
+        )
+    else:
+        wsm.server_path = server_path
+        wsm.port = port
+        wsm.base_url = f"http://127.0.0.1:{port}"
+        wsm._model_path = model_path
+        wsm._gpu_device = gpu_device
+
+    result = wsm.start(model_path, gpu_device=gpu_device)
+    running = wsm.is_running()
+
+    if running:
+        ws_key = f"audio:{model_id}"
+        multi_model_manager.models[ws_key] = wsm
+        multi_model_manager.active_in_vram = ws_key
+        multi_model_manager.models_in_vram.add(ws_key)
+
+    return {"success": running, "running": running, "started_model": result}
+
+
+@router.post("/admin/api/whisper-server/stop")
+async def api_whisper_server_stop(request: Request, username: str = Depends(require_admin)):
+    """Stop a whisper-server instance by model_id."""
+    from codai.models.manager import multi_model_manager
+    data = await request.json() if request.headers.get("content-type", "").startswith("application/json") else {}
+    model_id = data.get("model_id", "whisper-server")
+
+    wsm = multi_model_manager.whisper_servers.get(model_id) or multi_model_manager.whisper_server
+    if wsm:
+        wsm.stop()
+        ws_key = f"audio:{model_id}"
+        multi_model_manager.models.pop(ws_key, None)
+        multi_model_manager.models_in_vram.discard(ws_key)
+        if multi_model_manager.active_in_vram == ws_key:
+            multi_model_manager.active_in_vram = None
+    return {"success": True, "running": False}
+
+
 # --- HuggingFace model search proxy ---

 import re as _re

--- a/codai/admin/static/style.css
+++ b/codai/admin/static/style.css
@@ -8,8 +8,8 @@
  --border:   #1A1D28;
  --border-2: #252836;
  --text:     #DDE1F0;
-  --text-2:   #636880;
-  --text-3:   #2E3145;
+  --text-2:   #8B90A8;
+  --text-3:   #555A72;
  --accent:   #6366F1;
  --accent-s: rgba(99,102,241,.12);
  --green:    #34D399;

--- a/codai/admin/templates/chat.html
+++ b/codai/admin/templates/chat.html
--- a/codai/admin/templates/dashboard.html
+++ b/codai/admin/templates/dashboard.html
@@ -28,15 +28,17 @@
    <div class="stat-value" id="req-total">0</div>
    <div class="stat-sub"><span id="req-active">0</span> active</div>
  </div>
-  <div class="stat">
+  <div class="stat" id="vram-card" style="display:none">
    <div class="stat-label">VRAM</div>
-    <div class="stat-value" id="vram-pct">—</div>
+    <div class="stat-value" id="vram-pct" style="font-size:2rem">—</div>
    <div class="progress" style="margin-top:.625rem">
      <div class="progress-fill" id="vram-bar" style="width:0%"></div>
    </div>
-    <div class="progress-labels">
-      <span id="vram-used">—</span><span id="vram-total">—</span>
+    <div class="progress-labels" style="color:var(--text-1);font-size:12px;margin-top:.4rem">
+      <span id="vram-used">—</span><span id="vram-free">—</span>
    </div>
+    <div style="font-size:11.5px;color:var(--text-2);margin-top:.2rem;font-family:var(--mono)" id="vram-total-line"></div>
+    <div class="stat-sub" id="vram-gpu" style="margin-top:.25rem"></div>
  </div>
 </div>

@@ -85,13 +87,25 @@ async function poll() {
    document.getElementById('active-models').innerHTML = html || '<span class="muted small">No models loaded</span>';

    if (d.vram) {
-      const pct = Math.round(d.vram.used / d.vram.total * 100);
-      document.getElementById('vram-pct').textContent = pct + '%';
-      document.getElementById('vram-bar').style.width = pct + '%';
-      document.getElementById('vram-used').textContent = d.vram.used.toFixed(1) + ' GB';
-      document.getElementById('vram-total').textContent = d.vram.total.toFixed(1) + ' GB';
+      document.getElementById('vram-card').style.display = '';
+      if (d.vram.free != null && d.vram.total) {
+        const usedPct = Math.round(d.vram.used / d.vram.total * 100);
+        document.getElementById('vram-pct').textContent = usedPct + '%';
+        document.getElementById('vram-bar').style.width = usedPct + '%';
+        document.getElementById('vram-used').textContent = d.vram.used.toFixed(1) + ' GB used';
+        document.getElementById('vram-free').textContent = d.vram.free.toFixed(1) + ' GB free';
+        document.getElementById('vram-total-line').textContent = d.vram.total.toFixed(1) + ' GB total';
+      } else {
+        document.getElementById('vram-pct').textContent = d.vram.total ? d.vram.total.toFixed(1) + ' GB' : '—';
+        document.getElementById('vram-bar').style.width = '0%';
+        document.getElementById('vram-used').textContent = '';
+        document.getElementById('vram-free').textContent = '';
+        document.getElementById('vram-total-line').textContent = '';
+      }
+      const gpuName = d.vram.gpu || '';
+      document.getElementById('vram-gpu').textContent = gpuName.length > 32 ? gpuName.slice(0, 32) + '…' : gpuName;
    } else {
-      document.getElementById('vram-pct').textContent = 'N/A';
+      document.getElementById('vram-card').style.display = 'none';
    }

    if (d.requests) {

--- a/codai/admin/templates/models.html
+++ b/codai/admin/templates/models.html
@@ -94,6 +94,20 @@
    <div class="card-title">GGUF files <span id="gguf-file-badge" class="muted small"></span></div>
    <div id="gguf-models-list"><span class="muted small">Loading…</span></div>
  </div>
+
+  <!-- Whisper Server -->
+  <div class="card mb-0" style="margin-top:1rem" id="ws-card">
+    <div style="display:flex;align-items:center;justify-content:space-between;flex-wrap:wrap;gap:.5rem">
+      <div>
+        <div class="card-title" style="margin:0">whisper-server <span class="muted" style="font-size:11px;font-weight:400">— native subprocess (AMD/Vulkan)</span></div>
+        <div id="ws-model-status" class="muted small" style="margin-top:.25rem">—</div>
+      </div>
+      <div style="display:flex;align-items:center;gap:.5rem">
+        <span id="ws-running-badge" style="font-size:12px;font-weight:500">—</span>
+        <a href="/admin/settings" class="btn btn-sm btn-ghost">Configure</a>
+      </div>
+    </div>
+  </div>
 </div>

 <!-- SEARCH -->
@@ -315,25 +329,27 @@
        <label class="form-label">Model ID / path</label>
        <div id="cfg-id-label" style="font-size:12px;font-family:monospace;color:var(--text-2);word-break:break-all;padding:.3rem 0"></div>
      </div>
-      <div style="display:grid;grid-template-columns:1fr 1fr;gap:.75rem">
-        <div class="form-row" style="margin:0">
-          <label class="form-label">Type</label>
-          <select id="cfg-type" class="form-input">
-            <option value="text_models">Text (LLM)</option>
-            <option value="image_models">Image generation</option>
-            <option value="video_models">Video generation</option>
-            <option value="audio_models">Audio transcription (STT)</option>
-            <option value="tts_models">Text-to-speech (TTS)</option>
-            <option value="vision_models">Vision / VLM</option>
-            <option value="audio_gen_models">Audio generation (Music/SFX)</option>
-            <option value="embedding_models">Embeddings</option>
-          </select>
+      <div class="form-row">
+        <label class="form-label" style="display:flex;align-items:center;gap:.5rem">Type
+          <span id="cfg-type-autodet" style="font-size:11px;color:var(--text-3);font-weight:400"></span>
+        </label>
+        <div style="display:grid;grid-template-columns:1fr 1fr;gap:.3rem .75rem;margin-top:.35rem">
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="text_models"> Text / LLM</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="vision_models"> Vision / VLM</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="image_models"> Image gen (T2I / I2I)</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" data-sub="t2v" value="video_models"> Video gen (T2V)</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" data-sub="i2v" value="video_models"> Image-to-Video (I2V)</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" data-sub="v2v" value="video_models"> Video-to-Video (V2V)</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="audio_models"> Audio transcription (STT)</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="tts_models"> Text-to-Speech (TTS)</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="audio_gen_models"> Audio generation</label>
+          <label style="display:flex;align-items:center;gap:.4rem;cursor:pointer;font-size:13px"><input type="checkbox" class="cfg-type-cb" value="embedding_models"> Embeddings</label>
        </div>
-        <div class="form-row" style="margin:0">
+      </div>
+      <div class="form-row">
        <label class="form-label">Alias <span class="muted">(optional)</span></label>
        <input type="text" id="cfg-alias" class="form-input" placeholder="Friendly name">
      </div>
-      </div>

      <!-- backend -->
      <div class="card-title" style="margin-top:1.25rem">Backend</div>
@@ -501,6 +517,33 @@ async function loadGlobalSettings(){
  }catch{}
 }

+async function loadWsStatus(){
+  try{
+    const s = await fetch('/admin/api/whisper-server/status').then(r=>r.json());
+    const card = document.getElementById('ws-card');
+    const badge = document.getElementById('ws-running-badge');
+    const modelEl = document.getElementById('ws-model-status');
+    const entries = Object.entries(s);
+    if(!entries.length){
+      card.style.display = 'none';
+      return;
+    }
+    card.style.display = '';
+    const running = entries.filter(([,v])=>v.running);
+    if(running.length){
+      badge.textContent = `● ${running.length}/${entries.length} running`;
+      badge.style.color = 'var(--green, #4ade80)';
+      card.style.borderColor = 'rgba(74,222,128,.3)';
+      modelEl.textContent = running.map(([id,v])=>`${id}: ${v.model||'?'} @ ${v.url}`).join(' | ');
+    } else {
+      badge.textContent = '○ stopped';
+      badge.style.color = 'var(--text-2)';
+      card.style.borderColor = '';
+      modelEl.textContent = entries.map(([id])=>id).join(', ') + ' — not started';
+    }
+  }catch{}
+}
+
 /* ── GGUF format toggle ──────────────────────────────── */
 let _ggufMode = 'gguf';
 document.querySelectorAll('.tog-btn').forEach(btn=>{
@@ -958,7 +1001,8 @@ async function loadCachedModels(){
      const rows = hf.map(m=>{
        const idx = _localModels.length;
        _localModels.push({label:m.id, path:m.id, cacheType:'hf', size_gb:m.size_gb||0,
-          defaultType:m.model_type||'text_models', settings:m.settings||{}, in_config:m.in_config});
+          defaultType:m.model_type||'text_models', settings:m.settings||{}, in_config:m.in_config,
+          capabilities:m.capabilities||[]});
        const loaded = _loadedKeys.has(m.id) || [..._loadedKeys].some(k=>k.endsWith(':'+m.id)||k===m.id);
        const capBadges = fmtCapabilities(m.capabilities||[]);
        return `<tr style="border-top:1px solid var(--border)">
@@ -996,7 +1040,8 @@ async function loadCachedModels(){
      const rows = gguf.map(f=>{
        const idx = _localModels.length;
        _localModels.push({label:f.filename, path:f.path, cacheType:'gguf', size_gb:f.size_gb||0,
-          defaultType:f.model_type||'text_models', settings:f.settings||{}, in_config:f.in_config});
+          defaultType:f.model_type||'text_models', settings:f.settings||{}, in_config:f.in_config,
+          capabilities:f.capabilities||[]});
        const loaded = _loadedKeys.has(f.path) || _loadedKeys.has(f.filename) || [..._loadedKeys].some(k=>k.endsWith(':'+f.path)||k.endsWith(':'+f.filename));
        const capBadges = fmtCapabilities(f.capabilities||[]);
        return `<tr style="border-top:1px solid var(--border)">
@@ -1044,6 +1089,8 @@ async function refreshLocal(){

 loadGlobalSettings();
 refreshLocal();
+loadWsStatus();
+setInterval(loadWsStatus, 5000);

 async function clearCacheConfirm(type){
  const labels = {hf:'HuggingFace', gguf:'GGUF', all:'ALL'};
@@ -1070,6 +1117,51 @@ async function deleteModelConfirm(idx){
  }catch(e){alert('Error: '+e.message)}
 }

+/* ── type checkbox helpers ─────────────────────────────── */
+function _capabilitiesToTypes(caps) {
+  const categories = new Set(), subs = new Set();
+  if (!caps || !caps.length) return {categories, subs};
+  if (caps.includes('image_to_video'))  { categories.add('video_models'); subs.add('i2v'); }
+  if (caps.includes('video_generation')){ categories.add('video_models'); subs.add('t2v'); }
+  if (caps.includes('video_to_video'))  { categories.add('video_models'); subs.add('v2v'); }
+  if (caps.includes('image_generation') || caps.includes('image_to_image') ||
+      caps.includes('inpainting')       || caps.includes('controlnet')) categories.add('image_models');
+  if (caps.includes('image_to_text') && caps.includes('text_generation')) {
+    categories.add('vision_models');
+  } else if (caps.includes('text_generation') &&
+             !categories.has('video_models') && !categories.has('image_models')) {
+    categories.add('text_models');
+  }
+  if (caps.includes('speech_to_text')) categories.add('audio_models');
+  if (caps.includes('text_to_speech')) categories.add('tts_models');
+  if (caps.includes('audio_generation')) categories.add('audio_gen_models');
+  if (caps.includes('embeddings')) categories.add('embedding_models');
+  return {categories, subs};
+}
+
+function _setTypeCheckboxes(categories, subs) {
+  document.querySelectorAll('.cfg-type-cb').forEach(cb => {
+    const sub = cb.dataset.sub;
+    if (!categories.has(cb.value)) { cb.checked = false; return; }
+    if (sub) {
+      // Sub-typed checkbox (T2V / I2V / V2V): check only if this sub is in subs
+      cb.checked = subs.has(sub);
+    } else if (cb.value === 'video_models') {
+      // Non-sub video checkbox: only relevant when subs is empty (legacy/no-sub)
+      cb.checked = subs.size === 0;
+    } else {
+      cb.checked = true;
+    }
+  });
+}
+
+function _getCheckedTypes() {
+  const checked = [...document.querySelectorAll('.cfg-type-cb:checked')];
+  const categories = [...new Set(checked.map(cb => cb.value))];
+  const subs = checked.filter(cb => cb.dataset.sub).map(cb => cb.dataset.sub);
+  return {primaryType: categories[0] || 'text_models', model_types: categories, video_subtypes: subs};
+}
+
 function openCfgModal(idx){
  const m = _localModels[idx];
  const s = m.settings || {};
@@ -1077,9 +1169,46 @@ function openCfgModal(idx){
  document.getElementById('cfg-id-label').textContent = m.label;
  document.getElementById('cfg-path').value = m.path;
  document.getElementById('cfg-orig-type').value = m.defaultType;
-  // Map legacy gguf_models to text_models
-  const rawType = s.model_type || m.defaultType;
-  document.getElementById('cfg-type').value = rawType === 'gguf_models' ? 'text_models' : rawType;
+
+  // Determine type checkboxes: saved config > auto-detect from capabilities > defaultType
+  const det = document.getElementById('cfg-type-autodet');
+  if (s.model_types && s.model_types.length) {
+    // Previously saved multi-type config
+    const savedSubs = new Set(s.video_subtypes || []);
+    _setTypeCheckboxes(new Set(s.model_types), savedSubs);
+    det.textContent = '';
+  } else if (s.model_type) {
+    // Single saved type
+    const normType = s.model_type === 'gguf_models' ? 'text_models' : s.model_type;
+    let savedSubs = new Set(s.video_subtypes || []);
+    // Legacy video_models with no sub-types: infer from capabilities or default T2V
+    if (normType === 'video_models' && savedSubs.size === 0) {
+      const caps = m.capabilities || [];
+      if (caps.includes('image_to_video')) savedSubs.add('i2v');
+      if (caps.includes('video_to_video')) savedSubs.add('v2v');
+      if (caps.includes('video_generation') || savedSubs.size === 0) savedSubs.add('t2v');
+    }
+    _setTypeCheckboxes(new Set([normType]), savedSubs);
+    det.textContent = '';
+  } else {
+    // Auto-detect from capabilities
+    const caps = m.capabilities || [];
+    const {categories, subs} = _capabilitiesToTypes(caps);
+    if (categories.size > 0) {
+      _setTypeCheckboxes(categories, subs);
+      det.textContent = '(auto-detected)';
+    } else {
+      const rawType = m.defaultType === 'gguf_models' ? 'text_models' : (m.defaultType || 'text_models');
+      let fallbackSubs = new Set();
+      if (rawType === 'video_models') {
+        // No capabilities matched; default to T2V for unclassified video models
+        fallbackSubs.add('t2v');
+      }
+      _setTypeCheckboxes(new Set([rawType]), fallbackSubs);
+      det.textContent = m.cacheType === 'gguf' ? '(auto-detected: GGUF text model)' : '';
+    }
+  }
+
  document.getElementById('cfg-alias').value = s.alias || '';
  document.getElementById('cfg-backend').value = s.backend || 'auto';
  document.getElementById('cfg-load-mode').value = s.load_mode || 'on-request';
@@ -1117,9 +1246,12 @@ async function saveModelConfig(){
  const maxGpu = parseFloat(document.getElementById('cfg-max-gpu').value);
  const ramGb  = parseFloat(document.getElementById('cfg-ram-gb').value);
  const usedVram = parseFloat(document.getElementById('cfg-used-vram').value);
+  const {primaryType, model_types, video_subtypes} = _getCheckedTypes();
  const data = {
    path,
-    model_type:        document.getElementById('cfg-type').value,
+    model_type:        primaryType,
+    model_types:       model_types,
+    video_subtypes:    video_subtypes.length ? video_subtypes : undefined,
    alias:             document.getElementById('cfg-alias').value.trim() || null,
    backend:           document.getElementById('cfg-backend').value,
    load_mode:         document.getElementById('cfg-load-mode').value,
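
Review note: the capability-to-checkbox mapping in `_capabilitiesToTypes` lives only in the template script. If the server ever needs to validate the `model_types` / `video_subtypes` fields it receives, the same rules could be mirrored in Python. The sketch below is a hypothetical port (the function name `capabilities_to_types` is not part of this commit) and follows the JavaScript rules as written in the hunk above.

```python
from typing import Iterable, Set, Tuple


def capabilities_to_types(caps: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (categories, video_subtypes) for a list of capability strings.

    Hypothetical Python mirror of the template's _capabilitiesToTypes helper.
    """
    caps = set(caps or ())
    categories: Set[str] = set()
    subs: Set[str] = set()

    # Each video capability maps to the single video category plus a sub-type.
    for cap, sub in (("video_generation", "t2v"),
                     ("image_to_video", "i2v"),
                     ("video_to_video", "v2v")):
        if cap in caps:
            categories.add("video_models")
            subs.add(sub)

    if caps & {"image_generation", "image_to_image", "inpainting", "controlnet"}:
        categories.add("image_models")

    # Vision needs both capabilities; plain text only applies when the model
    # was not already classified as an image or video model.
    if {"image_to_text", "text_generation"} <= caps:
        categories.add("vision_models")
    elif "text_generation" in caps and not categories & {"video_models", "image_models"}:
        categories.add("text_models")

    for cap, cat in (("speech_to_text", "audio_models"),
                     ("text_to_speech", "tts_models"),
                     ("audio_generation", "audio_gen_models"),
                     ("embeddings", "embedding_models")):
        if cap in caps:
            categories.add(cat)

    return categories, subs
```

An empty capability list yields two empty sets, which corresponds to the modal falling back to `defaultType`.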

--- a/codai/admin/templates/settings.html
+++ b/codai/admin/templates/settings.html
@@ -45,6 +45,11 @@
      <input type="text" id="s-cert" class="form-input" placeholder="/path/to/cert.pem">
    </div>
  </div>
+  <div class="form-row" style="margin-top:1rem;margin-bottom:0">
+    <label class="form-label">Request queue max size</label>
+    <input type="number" id="s-queue-max" class="form-input" placeholder="6" min="1" max="1000" style="max-width:160px">
+    <span class="form-hint">Maximum number of concurrent queued requests. Authenticated requests arriving when the queue is full receive a 429 response.</span>
+  </div>
 </div>

 <!-- Storage -->
@@ -64,6 +69,48 @@
    <span class="form-hint">Models will inherit this as default when configured</span>
  </div>
 </div>
+
+<!-- Whisper Server -->
+<div class="card mb-0" style="margin-top:1rem">
+  <div style="display:flex;align-items:center;justify-content:space-between;flex-wrap:wrap;gap:.5rem;margin-bottom:1rem">
+    <div class="card-title" style="margin:0">Whisper Server <span class="muted" style="font-size:11px;font-weight:400">(whisper.cpp native binary — recommended for AMD/Vulkan)</span></div>
+    <div style="display:flex;align-items:center;gap:.5rem">
+      <span id="ws-badge" class="muted small">—</span>
+      <button class="btn btn-sm btn-secondary" onclick="wsStart()">Start</button>
+      <button class="btn btn-sm btn-danger" onclick="wsStop()">Stop</button>
+    </div>
+  </div>
+  <div style="display:grid;grid-template-columns:1fr 160px;gap:1rem;align-items:start">
+    <div class="form-row" style="margin:0">
+      <label class="form-label">Model ID <span class="muted">(used in API calls, e.g. whisper-base)</span></label>
+      <input type="text" id="ws-id" class="form-input" placeholder="whisper-server">
+      <span class="form-hint">The name clients use in the <code>model</code> field of transcription requests</span>
+    </div>
+    <div class="form-row" style="margin:0">
+      <label class="form-label">Port</label>
+      <input type="number" id="ws-port" class="form-input" placeholder="8744" min="1024" max="65535">
+    </div>
+  </div>
+  <div style="display:grid;grid-template-columns:1fr 160px;gap:1rem;align-items:start;margin-top:1rem">
+    <div class="form-row" style="margin:0">
+      <label class="form-label">whisper-server binary path</label>
+      <input type="text" id="ws-path" class="form-input" placeholder="/usr/local/bin/whisper-server">
+    </div>
+    <div class="form-row" style="margin:0">
+      <label class="form-label">GPU device index</label>
+      <input type="number" id="ws-gpu" class="form-input" placeholder="0" min="0">
+    </div>
+  </div>
+  <div class="form-row" style="margin-top:1rem;margin-bottom:0">
+    <label class="form-label">Model path <span class="muted">(GGML whisper model, e.g. ggml-base.bin)</span></label>
+    <input type="text" id="ws-model" class="form-input" placeholder="/path/to/ggml-base.bin">
+    <span class="form-hint">Configure multiple instances by adding entries to <code>models.json</code> with <code>"backend": "whisper-server"</code></span>
+  </div>
+  <p class="form-hint" style="margin-top:.75rem;margin-bottom:0">
+    When configured, the transcription endpoint uses this subprocess instead of the Python faster-whisper module.
+    Settings are saved to <code>config.json</code> and take effect immediately (no restart needed).
+  </p>
+</div>
 {% endblock %}

 {% block scripts %}
@@ -89,13 +136,69 @@ async function loadSettings(){
    document.getElementById('s-https').checked = !!d.server?.https;
    document.getElementById('s-key').value   = d.server?.https_key_path ?? '';
    document.getElementById('s-cert').value  = d.server?.https_cert_path ?? '';
+    document.getElementById('s-queue-max').value = d.server?.queue_max_size ?? 6;
    document.getElementById('s-hf-cache').value   = d.models?.hf_cache_dir ?? '';
    document.getElementById('s-gguf-cache').value = d.models?.gguf_cache_dir ?? '';
    document.getElementById('s-offload-dir').value = d.offload?.directory ?? './offload';
+    document.getElementById('ws-path').value = d.whisper?.server_path ?? '';
+    document.getElementById('ws-port').value = d.whisper?.server_port ?? 8744;
    toggleHttps();
  }catch(e){ showAlert('error','Failed to load settings: '+e.message); }
 }

+async function loadWsStatus(){
+  try{
+    const s = await fetch('/admin/api/whisper-server/status').then(r=>r.json());
+    const badge = document.getElementById('ws-badge');
+    // s is now a dict of {model_id: {running, model, url}}
+    const entries = Object.entries(s);
+    if(!entries.length){
+      badge.textContent = '○ not configured';
+      badge.style.color = 'var(--text-2)';
+      return;
+    }
+    const running = entries.filter(([,v])=>v.running);
+    if(running.length){
+      badge.textContent = `● ${running.length} running`;
+      badge.style.color = 'var(--green, #4ade80)';
+    } else {
+      badge.textContent = '○ stopped';
+      badge.style.color = 'var(--text-2)';
+    }
+  }catch(e){}
+}
+
+async function wsStart(){
+  const path = document.getElementById('ws-path').value.trim();
+  if(!path){ showAlert('error','Binary path required'); return; }
+  try{
+    const r = await fetch('/admin/api/whisper-server/start',{
+      method:'POST', headers:{'Content-Type':'application/json'},
+      body: JSON.stringify({
+        model_id: document.getElementById('ws-id').value.trim() || 'whisper-server',
+        server_path: path,
+        model_path: document.getElementById('ws-model').value.trim() || null,
+        port: parseInt(document.getElementById('ws-port').value) || 8744,
+        gpu_device: parseInt(document.getElementById('ws-gpu').value) || 0,
+      })
+    });
+    const d = await r.json();
+    if(d.success) showAlert('info','whisper-server started');
+    else showAlert('error','Failed to start whisper-server');
+    loadWsStatus();
+  }catch(e){ showAlert('error','Error: '+e.message); }
+}
+
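+async function wsStop(){
+  const modelId = document.getElementById('ws-id').value.trim() || 'whisper-server';
+  try{
+    // Report success only when the server acknowledges the stop request.
+    const r = await fetch('/admin/api/whisper-server/stop',{
+      method:'POST', headers:{'Content-Type':'application/json'},
+      body: JSON.stringify({model_id: modelId})
+    });
+    if(r.ok) showAlert('info','whisper-server stopped');
+    else showAlert('error','Failed to stop whisper-server');
+    loadWsStatus();
+  }catch(e){ showAlert('error','Error: '+e.message); }
+}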
+
 async function saveSettings(){
  const strOrNull = id => document.getElementById(id).value.trim() || null;
  const data = {
@@ -105,6 +208,7 @@ async function saveSettings(){
      https: document.getElementById('s-https').checked,
      https_key_path:  strOrNull('s-key'),
      https_cert_path: strOrNull('s-cert'),
+      queue_max_size: parseInt(document.getElementById('s-queue-max').value) || 6,
    },
    models:{
      hf_cache_dir:   strOrNull('s-hf-cache'),
@@ -112,7 +216,11 @@ async function saveSettings(){
    },
    offload:{
      directory: document.getElementById('s-offload-dir').value.trim() || './offload',
-    }
+    },
+    whisper:{
+      server_path: document.getElementById('ws-path').value.trim() || null,
+      server_port: parseInt(document.getElementById('ws-port').value) || 8744,
+    },
  };
  try{
    const r = await fetch('/admin/api/settings',{
@@ -125,5 +233,7 @@ async function saveSettings(){
 }

 loadSettings();
+loadWsStatus();
+setInterval(loadWsStatus, 5000);
 </script>
 {% endblock %}
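
Review note: the settings hint above says requests arriving while the queue is full receive a 429. The middleware that enforces this (`codai/api/ratelimit.py`) is not part of this excerpt, so the following is only a minimal sketch of the idea under that assumption: a thread-safe counter capped at `queue_max_size` (default 6, as in the form). The class name `QueueGate` and its methods are hypothetical.

```python
import threading


class QueueGate:
    """Admit at most `max_size` requests at once.

    Hypothetical helper: a caller would answer HTTP 429 when try_enter()
    returns False, and call leave() once an admitted request finishes.
    """

    def __init__(self, max_size: int = 6):
        if max_size < 1:
            raise ValueError("max_size must be >= 1")
        self.max_size = max_size
        self._count = 0
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        # Check and increment under one lock so two callers cannot
        # both take the last free slot.
        with self._lock:
            if self._count >= self.max_size:
                return False
            self._count += 1
            return True

    def leave(self) -> None:
        with self._lock:
            if self._count > 0:
                self._count -= 1
```

Keeping the check and the increment inside one lock is the design point here; a separate "is it full?" test followed by an increment would let the queue overshoot under load.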
--- a/codai/api/app.py
+++ b/codai/api/app.py
@@ -19,12 +19,16 @@ FastAPI application module for codai API.
 Contains the FastAPI app initialization, lifespan, and core endpoints.
 """

+import logging
+import os
 from contextlib import asynccontextmanager
 from typing import List

 from fastapi import FastAPI, HTTPException, Request
 from fastapi.responses import FileResponse, JSONResponse

+logger = logging.getLogger(__name__)
+
 # Import from codai modules
 from codai.pydantic.textrequest import ModelList
 from codai.models.manager import model_manager, multi_model_manager
@@ -89,11 +93,19 @@ from codai.api.text import router as text_router
 from codai.api.video import router as video_router
 from codai.api.audio_gen import router as audio_gen_router
 from codai.api.embeddings import router as embeddings_router
+from codai.api.pipelines import router as pipelines_router
+from codai.api.custom_pipelines import router as custom_pipelines_router
+from codai.api.voice_clone import router as voice_clone_router
+from codai.api.voice_convert import router as voice_convert_router
+from codai.api.faceswap import router as faceswap_router
+from codai.api.characters import router as characters_router
 from codai.admin.routes import router as admin_router

 # Import and add middleware
 from codai.api.log import log_requests
+from codai.api.ratelimit import RateLimitMiddleware
 app.middleware("http")(log_requests)
+app.add_middleware(RateLimitMiddleware)

 # Mount static files for admin dashboard
 from fastapi.staticfiles import StaticFiles
@@ -110,6 +122,12 @@ app.include_router(text_router)
 app.include_router(video_router)
 app.include_router(audio_gen_router)
 app.include_router(embeddings_router)
+app.include_router(pipelines_router)
+app.include_router(custom_pipelines_router)
+app.include_router(voice_clone_router)
+app.include_router(voice_convert_router)
+app.include_router(faceswap_router)
+app.include_router(characters_router)
 app.include_router(admin_router)


@@ -133,11 +151,14 @@ async def list_models():
 @app.get("/v1/files/{filename}")
 async def get_file(filename: str):
    """Serve uploaded/generated files."""
-    print(f"DEBUG get_file: filename={filename}, global_file_path={global_file_path}")
-    if global_file_path:
-        import os
-        file_path = os.path.join(global_file_path, filename)
-        print(f"DEBUG get_file: full path={file_path}, exists={os.path.exists(file_path)}")
-        if os.path.exists(file_path):
-            return FileResponse(file_path)
+    if not global_file_path:
+        raise HTTPException(status_code=404, detail="File not found")
+    # Prevent path traversal: resolve to real paths and confirm the result
+    # stays inside the configured directory.
+    safe_base = os.path.realpath(global_file_path)
+    candidate = os.path.realpath(os.path.join(global_file_path, filename))
+    if not (candidate == safe_base or candidate.startswith(safe_base + os.sep)):
+        raise HTTPException(status_code=403, detail="Access denied")
+    if not os.path.isfile(candidate):
        raise HTTPException(status_code=404, detail="File not found")
+    return FileResponse(candidate)
\ No newline at end of file
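
Review note: the containment check added to `get_file` can be exercised in isolation. The sketch below restates the same `os.path.realpath` comparison as a standalone function so it can be unit-tested; the name `resolve_inside` is hypothetical and does not exist in the commit.

```python
import os
from typing import Optional


def resolve_inside(base_dir: str, filename: str) -> Optional[str]:
    """Return the resolved path if it stays inside base_dir, else None.

    Hypothetical helper mirroring the check in get_file: resolve symlinks
    and '..' segments first, then compare against the resolved base.
    """
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # Appending os.sep prevents '/data/files-evil' from passing as a
    # child of '/data/files'.
    if candidate == base or candidate.startswith(base + os.sep):
        return candidate
    return None
```

A caller would treat `None` as the 403 case and still check `os.path.isfile` on the returned path before serving it, as the endpoint does.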
--- a/codai/api/characters.py
+++ b/codai/api/characters.py
+# CoderAI - OpenAI-compatible API server
+# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Character profile endpoints.
+
+Saved character profiles are named collections of reference images used to
+maintain visual consistency of a character across multiple video generations.
+
+POST   /v1/characters              – save / update a character profile
+GET    /v1/characters              – list all saved profiles (no images)
+GET    /v1/characters/{name}       – get a profile including base64 images
+DELETE /v1/characters/{name}       – delete a profile
+"""
+
+import base64
+import json
+import os
+import time
+from typing import List, Optional
+
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel, ConfigDict
+
+router = APIRouter()
+
+_CHARS_DIR: Optional[str] = None
+
+
+def set_global_args(args):
+    global _CHARS_DIR
+    base = getattr(args, 'file_path', None) or os.path.expanduser('~/.coderai')
+    root = base if os.path.isdir(base) else (os.path.dirname(base) if base else os.path.expanduser('~/.coderai'))
+    _CHARS_DIR = os.path.join(root, 'characters')
+    os.makedirs(_CHARS_DIR, exist_ok=True)
+
+
+def set_global_file_path(path: str):
+    pass  # not needed for characters
+
+
+def _chars_dir() -> str:
+    if _CHARS_DIR:
+        return _CHARS_DIR
+    d = os.path.expanduser('~/.coderai/characters')
+    os.makedirs(d, exist_ok=True)
+    return d
+
+
+def _char_dir(name: str) -> str:
+    return os.path.join(_chars_dir(), name)
+
+
+# ── Pydantic models ───────────────────────────────────────────────────────────
+
+class CharacterImage(BaseModel):
+    label: Optional[str] = None      # e.g. "front", "side", "close-up"
+    data: str                         # base64 image (with or without data: prefix)
+    model_config = ConfigDict(extra="allow")
+
+
+class CharacterSaveRequest(BaseModel):
+    name: str
+    description: Optional[str] = ""
+    images: List[CharacterImage]      # one or more reference images
+    model_config = ConfigDict(extra="allow")
+
+
+class CharacterProfile(BaseModel):
+    name: str
+    description: Optional[str] = ""
+    image_count: int
+    created_at: int
+    images: Optional[List[CharacterImage]] = None  # only populated on GET /{name}
+    model_config = ConfigDict(extra="allow")
+
+
+# ── Helpers ───────────────────────────────────────────────────────────────────
+
+def _save_character(name: str, description: str, images: List[CharacterImage]) -> dict:
+    cdir = _char_dir(name)
+    os.makedirs(cdir, exist_ok=True)
+
+    img_files = []
+    for i, img in enumerate(images):
+        raw = img.data
+        if raw.startswith('data:'):
+            _, b64 = raw.split(',', 1)
+        else:
+            b64 = raw
+        img_bytes = base64.b64decode(b64)
+        # Detect PNG vs JPEG from magic bytes
+        ext = '.png' if img_bytes[:4] == b'\x89PNG' else '.jpg'
+        fname = f"ref{i:02d}{ext}"
+        fpath = os.path.join(cdir, fname)
+        with open(fpath, 'wb') as f:
+            f.write(img_bytes)
+        img_files.append({'file': fname, 'label': img.label or f'ref{i}'})
+
+    meta = {
+        'name': name,
+        'description': description,
+        'images': img_files,
+        'image_count': len(img_files),
+        'created_at': int(time.time()),
+    }
+    with open(os.path.join(cdir, 'meta.json'), 'w') as f:
+        json.dump(meta, f)
+    return meta
+
+
+def _load_character_meta(name: str) -> Optional[dict]:
+    meta_path = os.path.join(_char_dir(name), 'meta.json')
+    if not os.path.exists(meta_path):
+        return None
+    with open(meta_path) as f:
+        return json.load(f)
+
+
+def _load_character_images(name: str) -> List[CharacterImage]:
+    meta = _load_character_meta(name)
+    if not meta:
+        return []
+    cdir = _char_dir(name)
+    result = []
+    for img_info in meta.get('images', []):
+        fpath = os.path.join(cdir, img_info['file'])
+        if not os.path.exists(fpath):
+            continue
+        with open(fpath, 'rb') as f:
+            raw = f.read()
+        ext = img_info['file'].rsplit('.', 1)[-1]
+        mime = 'image/png' if ext == 'png' else 'image/jpeg'
+        b64 = base64.b64encode(raw).decode()
+        result.append(CharacterImage(
+            label=img_info.get('label'),
+            data=f"data:{mime};base64,{b64}",
+        ))
+    return result
+
+
+def _list_characters() -> list:
+    d = _chars_dir()
+    profiles = []
+    for entry in os.scandir(d):
+        if entry.is_dir():
+            meta = _load_character_meta(entry.name)
+            if meta:
+                profiles.append({k: v for k, v in meta.items() if k != 'images'})
+    return sorted(profiles, key=lambda p: p.get('created_at', 0))
+
+
+def resolve_character_profiles(profile_names: List[str]) -> List[str]:
+    """Resolve saved profile names → flat list of base64 image strings."""
+    out = []
+    for name in profile_names:
+        for img in _load_character_images(name):
+            out.append(img.data)
+    return out
+
+
+# ── Endpoints ─────────────────────────────────────────────────────────────────
+
+@router.post("/v1/characters")
+async def save_character(req: CharacterSaveRequest):
+    """Save or update a named character profile."""
+    if not req.name or '/' in req.name or '..' in req.name:
+        raise HTTPException(status_code=400, detail="Invalid character name")
+    if not req.images:
+        raise HTTPException(status_code=400, detail="At least one reference image required")
+    meta = _save_character(req.name, req.description or '', req.images)
+    return {"ok": True, "name": meta['name'], "image_count": meta['image_count']}
+
+
+@router.get("/v1/characters")
+async def list_characters():
+    """List all saved character profiles (metadata only, no images)."""
+    return {"characters": _list_characters()}
+
+
+@router.get("/v1/characters/{name}")
+async def get_character(name: str):
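+    """Get a character profile including its reference images as base64."""
+    # Mirror the name check in save_character (plus '.') so the lookup
+    # cannot leave the characters directory.
+    if not name or name == '.' or '/' in name or '..' in name:
+        raise HTTPException(status_code=400, detail="Invalid character name")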
+    meta = _load_character_meta(name)
+    if not meta:
+        raise HTTPException(status_code=404, detail=f"Character '{name}' not found")
+    images = _load_character_images(name)
+    return {
+        "name": meta['name'],
+        "description": meta.get('description', ''),
+        "image_count": meta['image_count'],
+        "created_at": meta['created_at'],
+        "images": [img.model_dump() for img in images],
+    }
+
+
+@router.delete("/v1/characters/{name}")
+async def delete_character(name: str):
+    """Delete a character profile."""
+    cdir = _char_dir(name)
+    if not os.path.isdir(cdir):
+        raise HTTPException(status_code=404, detail=f"Character '{name}' not found")
+    import shutil
+    shutil.rmtree(cdir)
+    return {"ok": True, "name": name}
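Review note on the character endpoints above: `save_character` rejects names containing `/` or `..`, but `get_character` and `delete_character` pass `name` straight to the loaders and to `_char_dir`. Unless `_char_dir` sanitizes internally (not visible in this section), a shared validator would keep the three handlers consistent. A minimal sketch, assuming an allowlist of letters, digits, space, dot, dash and underscore is acceptable for profile names; `is_safe_character_name` and the 64-character limit are hypothetical and not part of the commit:

```python
import re

# Hypothetical allowlist: letters, digits, space, dot, dash, underscore; 1-64 chars.
_NAME_RE = re.compile(r"[A-Za-z0-9 ._-]{1,64}")


def is_safe_character_name(name: str) -> bool:
    """Return True if name is safe to use as a single directory component."""
    if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
        return False
    # Reject dot-only names ('.', '..'), blank names, and padded names.
    if name.strip(". ") == "" or name != name.strip():
        return False
    return True


if __name__ == "__main__":
    for candidate in ["alice_01", "..", "a/b", "", "hero v2"]:
        print(repr(candidate), is_safe_character_name(candidate))
```

Each handler could call such a check first and return HTTP 400 on failure, which is the status `save_character` already uses for invalid names.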
--- a/codai/api/custom_pipelines.py
+++ b/codai/api/custom_pipelines.py
+"""
+Custom pipeline executor.
+
+GET  /v1/pipelines/custom          — list saved custom pipelines
+POST /v1/pipelines/custom          — save a new custom pipeline definition
+PUT  /v1/pipelines/custom/{id}     — update a pipeline
+DELETE /v1/pipelines/custom/{id}   — delete a pipeline
+POST /v1/pipelines/custom/{id}/run — execute a saved pipeline
+POST /v1/pipelines/run             — execute an inline pipeline definition (no save)
+
+Pipeline definition schema:
+{
+  "id": "my-pipeline",          # auto-generated if absent
+  "name": "My Pipeline",
+  "steps": [
+    {
+      "type": "text_gen",        # step type (see STEP_TYPES)
+      "label": "Write script",   # optional display label
+      "params": {                # static params merged with runtime context
+        "model": "Qwen/Qwen3.5-9B",
+        "prompt": "{{input}}",   # {{input}} = pipeline input text
+                                 # {{stepN.output}} = output of step N
+                                 # {{stepN.url}} = URL output of step N
+      }
+    },
+    {
+      "type": "image_gen",
+      "params": {
+        "model": "sd-model",
+        "prompt": "{{step0.output}}"
+      }
+    }
+  ]
+}
+
+Step types and their endpoint mapping:
+  text_gen      → POST /v1/chat/completions
+  image_gen     → POST /v1/images/generations
+  image_edit    → POST /v1/images/edits
+  image_inpaint → POST /v1/images/inpaint
+  image_upscale → POST /v1/images/upscale
+  image_deblur  → POST /v1/images/deblur
+  image_unpix   → POST /v1/images/unpixelate
+  image_outfit  → POST /v1/images/outfit
+  image_faceswap→ POST /v1/images/faceswap
+  video_gen     → POST /v1/video/generations
+  video_upscale → POST /v1/video/upscale
+  video_sub     → POST /v1/video/subtitle
+  video_interp  → POST /v1/video/interpolate
+  video_dub     → POST /v1/video/dub
+  tts           → POST /v1/audio/speech
+  stt           → POST /v1/audio/transcriptions (multipart; not registered in STEP_TYPES below)
+  audio_gen     → POST /v1/audio/generate
+  voice_clone   → POST /v1/audio/clone
+  voice_convert → POST /v1/audio/convert
+"""
+
+import asyncio
+import time
+import uuid
+from typing import Any, Dict, List, Optional
+
+from fastapi import APIRouter, HTTPException, Request
+from pydantic import BaseModel, ConfigDict
+
+router = APIRouter()
+
+# ---------------------------------------------------------------------------
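+# Step type → (handler_module, handler_fn, request_class_path or None)
+# When the path is None, _run_step infers the request class from the
+# handler's first parameter annotation.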
+# ---------------------------------------------------------------------------
+
+STEP_TYPES = {
+    "text_gen":       ("codai.api.text",         "chat_completions",       "codai.pydantic.textrequest.ChatCompletionRequest"),
+    "image_gen":      ("codai.api.images",        "create_image_generation","codai.pydantic.imagerequest.ImageGenerationRequest"),
+    "image_edit":     ("codai.api.images",        "create_image_edit",      None),
+    "image_inpaint":  ("codai.api.images",        "create_image_inpaint",   None),
+    "image_upscale":  ("codai.api.images",        "create_image_upscale",   None),
+    "image_deblur":   ("codai.api.images",        "create_image_deblur",    None),
+    "image_unpix":    ("codai.api.images",        "create_image_unpixelate",None),
+    "image_outfit":   ("codai.api.images",        "create_image_outfit",    None),
+    "image_faceswap": ("codai.api.faceswap",      "faceswap",               None),
+    "video_gen":      ("codai.api.video",         "video_generations",       "codai.pydantic.videorequest.VideoGenerationRequest"),
+    "video_upscale":  ("codai.api.video",         "video_upscale",           None),
+    "video_sub":      ("codai.api.video",         "video_subtitle",          None),
+    "video_interp":   ("codai.api.video",         "video_interpolate",       None),
+    "video_dub":      ("codai.api.video",         "video_dub",               None),
+    "tts":            ("codai.api.tts",           "create_speech",          None),
+    "audio_gen":      ("codai.api.audio_gen",     "audio_generate",         None),
+    "voice_clone":    ("codai.api.voice_clone",   "clone_voice",            None),
+    "voice_convert":  ("codai.api.voice_convert", "convert_voice",          None),
+}
+
+# Human-readable labels for the UI
+STEP_TYPE_LABELS = {
+    "text_gen":       "Text Generation (LLM)",
+    "image_gen":      "Image Generation",
+    "image_edit":     "Image Edit (i2i)",
+    "image_inpaint":  "Image Inpaint",
+    "image_upscale":  "Image Upscale",
+    "image_deblur":   "Image Deblur",
+    "image_unpix":    "Image Unpixelate",
+    "image_outfit":   "Outfit Change",
+    "image_faceswap": "Face Swap",
+    "video_gen":      "Video Generation",
+    "video_upscale":  "Video Upscale",
+    "video_sub":      "Video Subtitles",
+    "video_interp":   "Video Interpolate",
+    "video_dub":      "Video Dub",
+    "tts":            "Text-to-Speech",
+    "audio_gen":      "Audio/Music Generation",
+    "voice_clone":    "Voice Clone (TTS)",
+    "voice_convert":  "Voice Convert (SVC)",
+}
+
+# Which params each step type accepts (for the UI form builder)
+STEP_PARAMS = {
+    "text_gen":       [("model","text","Model ID"),("prompt","textarea","Prompt"),("system","textarea","System prompt (opt)")],
+    "image_gen":      [("model","text","Model ID"),("prompt","textarea","Prompt"),("negative_prompt","text","Negative prompt"),("size","text","Size","1024x1024"),("steps","number","Steps"),("guidance_scale","number","CFG","7.5"),("seed","number","Seed")],
+    "image_edit":     [("model","text","Model ID"),("prompt","textarea","Prompt"),("image","ref","Source image ({{stepN.url}})"),("strength","number","Strength","0.75"),("steps","number","Steps"),("seed","number","Seed")],
+    "image_inpaint":  [("model","text","Model ID"),("prompt","textarea","Prompt"),("image","ref","Source image"),("mask","ref","Mask image"),("strength","number","Strength","0.99"),("steps","number","Steps"),("seed","number","Seed")],
+    "image_upscale":  [("model","text","Model ID (opt)"),("image","ref","Source image"),("scale","number","Scale","4")],
+    "image_deblur":   [("image","ref","Source image"),("strength","number","Strength","0.5")],
+    "image_unpix":    [("image","ref","Source image"),("scale","number","Scale","4")],
+    "image_outfit":   [("model","text","Inpaint model ID"),("image","ref","Source image"),("prompt","textarea","Outfit prompt"),("negative_prompt","text","Negative prompt"),("steps","number","Steps"),("seed","number","Seed")],
+    "image_faceswap": [("source_face","ref","Source face image"),("target","ref","Target image/video"),("target_type","select:image|video","Target type","image")],
+    "video_gen":      [("model","text","Model ID"),("prompt","textarea","Prompt"),("mode","select:t2v|i2v|v2v|ti2v","Mode","t2v"),("init_image","ref","Init image (i2v)"),("num_frames","number","Frames","16"),("fps","number","FPS","8"),("num_inference_steps","number","Steps","25"),("guidance_scale","number","CFG","7.5"),("seed","number","Seed")],
+    "video_upscale":  [("model","text","Model ID"),("video","ref","Source video"),("upscale_factor","number","Scale","2")],
+    "video_sub":      [("model","text","Model ID"),("video","ref","Source video"),("language","text","Language"),("burn","checkbox","Burn into video")],
+    "video_interp":   [("model","text","Model ID"),("video","ref","Source video"),("fps_multiplier","number","FPS multiplier","2")],
+    "video_dub":      [("model","text","Model ID"),("video","ref","Source video"),("target_lang","text","Target language"),("source_lang","text","Source language"),("burn_subtitles","checkbox","Burn subtitles")],
+    "tts":            [("model","text","Model ID"),("input","textarea","Text ({{stepN.output}})"),("voice","text","Voice","af_sarah"),("speed","number","Speed","1.0")],
+    "audio_gen":      [("model","text","Model ID"),("prompt","textarea","Prompt"),("duration","number","Duration (s)","10"),("temperature","number","Temperature","1.0")],
+    "voice_clone":    [("text","textarea","Text to synthesize"),("voice_name","text","Voice profile name"),("ref_text","text","Reference transcript"),("speed","number","Speed","1.0")],
+    "voice_convert":  [("source_audio","ref","Source audio"),("voice_name","text","Voice profile name"),("f0_condition","checkbox","Singing mode"),("pitch_shift","number","Pitch shift","0"),("diffusion_steps","number","Steps","10")],
+}
+
+
+def _resolve_template(value: Any, context: Dict) -> Any:
+    """Replace {{input}}, {{stepN.output}}, {{stepN.url}} etc. in string values."""
+    if not isinstance(value, str):
+        return value
+    import re
+    def _replace(m):
+        key = m.group(1).strip()
+        # {{input}} → pipeline input
+        if key == 'input':
+            return str(context.get('input', ''))
+        # {{stepN.field}}
+        match = re.match(r'step(\d+)\.(\w+)', key)
+        if match:
+            n, field = int(match.group(1)), match.group(2)
+            step_result = context.get(f'step{n}', {})
+            return str(step_result.get(field, ''))
+        return m.group(0)
+    return re.sub(r'\{\{([^}]+)\}\}', _replace, value)
+
+
+def _resolve_params(params: Dict, context: Dict) -> Dict:
+    return {k: _resolve_template(v, context) for k, v in params.items()}
+
+
+def _extract_output(step_type: str, result: Any) -> Dict:
+    """Extract useful fields from a step result for use in subsequent steps."""
+    if result is None:
+        return {}
+    r = result if isinstance(result, dict) else (result.__dict__ if hasattr(result, '__dict__') else {})
+    out = {}
+    # text_gen
+    if 'choices' in r:
+        out['output'] = r['choices'][0].get('message', {}).get('content', '') if r['choices'] else ''
+    # image/video/audio with data array
+    if 'data' in r and r['data']:
+        item = r['data'][0]
+        if isinstance(item, dict):
+            out['url'] = item.get('url', '')
+            for k, v in item.items():
+                out[k] = v
+    # tts audio field
+    if 'audio' in r:
+        out['audio'] = r['audio']
+        out['output'] = r['audio']
+    return out
+
+
+async def _run_step(step: Dict, context: Dict, http_request) -> Dict:
+    """Execute a single pipeline step and return its output context."""
+    step_type = step['type']
+    if step_type not in STEP_TYPES:
+        raise ValueError(f"Unknown step type: {step_type}")
+
+    mod_name, fn_name, req_class_path = STEP_TYPES[step_type]
+    params = _resolve_params(step.get('params', {}), context)
+
+    # Import handler
+    import importlib
+    mod = importlib.import_module(mod_name)
+    handler = getattr(mod, fn_name)
+
+    # Build request object
+    if req_class_path:
+        req_mod, req_cls = req_class_path.rsplit('.', 1)
+        req_class = getattr(importlib.import_module(req_mod), req_cls)
+        # text_gen needs messages format
+        if step_type == 'text_gen':
+            messages = [{"role": "user", "content": params.pop('prompt', '')}]
+            if 'system' in params and params['system']:
+                messages.insert(0, {"role": "system", "content": params.pop('system')})
+            else:
+                params.pop('system', None)
+            params['messages'] = messages
+            params.setdefault('stream', False)
+        req = req_class(**{k: v for k, v in params.items() if v != ''})
+    else:
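+        # Infer the request class from the handler's first parameter annotation
+        import inspect
+        sig = inspect.signature(handler)
+        first_param = next(iter(sig.parameters.values()), None)
+        ann = first_param.annotation if first_param is not None else inspect.Parameter.empty
+        if ann is not inspect.Parameter.empty and callable(ann):
+            req = ann(**{k: v for k, v in params.items() if v != ''})
+        else:
+            # No usable annotation: fall back to a plain attribute bag
+            req = type('Req', (), params)()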
+
+    result = await handler(req, http_request)
+    return _extract_output(step_type, result)
+
+
+async def _execute_pipeline(pipeline_def: Dict, pipeline_input: str, http_request) -> Dict:
+    """Execute all steps of a pipeline definition."""
+    context = {'input': pipeline_input}
+    steps_output = []
+
+    for i, step in enumerate(pipeline_def.get('steps', [])):
+        try:
+            out = await _run_step(step, context, http_request)
+            context[f'step{i}'] = out
+            steps_output.append({'step': i, 'type': step['type'],
+                                  'label': step.get('label', step['type']), **out})
+        except Exception as e:
+            steps_output.append({'step': i, 'type': step['type'],
+                                  'label': step.get('label', step['type']),
+                                  'error': str(e)})
+            if not step.get('continue_on_error', False):
+                break
+
+    return {
+        'created': int(time.time()),
+        'pipeline': pipeline_def.get('name', pipeline_def.get('id', 'custom')),
+        'steps': steps_output,
+        'data': [context.get(f'step{len(steps_output)-1}', {})] if steps_output else [],
+    }
+
+
+# ---------------------------------------------------------------------------
+# CRUD endpoints
+# ---------------------------------------------------------------------------
+
+class PipelineStep(BaseModel):
+    type: str
+    label: Optional[str] = None
+    params: Dict[str, Any] = {}
+    continue_on_error: Optional[bool] = False
+    model_config = ConfigDict(extra='allow')
+
+
+class PipelineDefinition(BaseModel):
+    id: Optional[str] = None
+    name: str
+    description: Optional[str] = ''
+    steps: List[PipelineStep]
+    model_config = ConfigDict(extra='allow')
+
+
+class PipelineRunRequest(BaseModel):
+    input: Optional[str] = ''
+    model_config = ConfigDict(extra='allow')
+
+
+@router.get('/v1/pipelines/custom')
+async def list_custom_pipelines():
+    """List all saved custom pipeline definitions."""
+    from codai.admin.routes import config_manager
+    if config_manager is None:
+        return {'pipelines': []}
+    return {'pipelines': config_manager.pipelines_data}
+
+
+@router.get('/v1/pipelines/step-types')
+async def list_step_types():
+    """List available step types with their parameter schemas."""
+    return {
+        'step_types': [
+            {'type': t, 'label': STEP_TYPE_LABELS[t], 'params': STEP_PARAMS.get(t, [])}
+            for t in STEP_TYPES
+        ]
+    }
+
+
+@router.post('/v1/pipelines/custom')
+async def create_custom_pipeline(pipeline: PipelineDefinition):
+    """Save a new custom pipeline definition."""
+    from codai.admin.routes import config_manager
+    if config_manager is None:
+        raise HTTPException(status_code=503, detail='Config manager not available')
+    data = pipeline.model_dump()
+    if not data.get('id'):
+        data['id'] = uuid.uuid4().hex[:8]
+    # Ensure no duplicate id
+    config_manager.pipelines_data = [p for p in config_manager.pipelines_data if p.get('id') != data['id']]
+    config_manager.pipelines_data.append(data)
+    config_manager.save_pipelines()
+    return {'created': True, 'pipeline': data}
+
+
+@router.put('/v1/pipelines/custom/{pipeline_id}')
+async def update_custom_pipeline(pipeline_id: str, pipeline: PipelineDefinition):
+    """Update an existing custom pipeline."""
+    from codai.admin.routes import config_manager
+    if config_manager is None:
+        raise HTTPException(status_code=503, detail='Config manager not available')
+    data = pipeline.model_dump()
+    data['id'] = pipeline_id
+    existing = [p for p in config_manager.pipelines_data if p.get('id') != pipeline_id]
+    if len(existing) == len(config_manager.pipelines_data):
+        raise HTTPException(status_code=404, detail=f"Pipeline '{pipeline_id}' not found")
+    existing.append(data)
+    config_manager.pipelines_data = existing
+    config_manager.save_pipelines()
+    return {'updated': True, 'pipeline': data}
+
+
+@router.delete('/v1/pipelines/custom/{pipeline_id}')
+async def delete_custom_pipeline(pipeline_id: str):
+    """Delete a custom pipeline."""
+    from codai.admin.routes import config_manager
+    if config_manager is None:
+        raise HTTPException(status_code=503, detail='Config manager not available')
+    before = len(config_manager.pipelines_data)
+    config_manager.pipelines_data = [p for p in config_manager.pipelines_data if p.get('id') != pipeline_id]
+    if len(config_manager.pipelines_data) == before:
+        raise HTTPException(status_code=404, detail=f"Pipeline '{pipeline_id}' not found")
+    config_manager.save_pipelines()
+    return {'deleted': True, 'id': pipeline_id}
+
+
+@router.post('/v1/pipelines/custom/{pipeline_id}/run')
+async def run_custom_pipeline(pipeline_id: str, body: PipelineRunRequest, http_request: Request = None):
+    """Execute a saved custom pipeline."""
+    from codai.admin.routes import config_manager
+    if config_manager is None:
+        raise HTTPException(status_code=503, detail='Config manager not available')
+    pipeline_def = next((p for p in config_manager.pipelines_data if p.get('id') == pipeline_id), None)
+    if not pipeline_def:
+        raise HTTPException(status_code=404, detail=f"Pipeline '{pipeline_id}' not found")
+    return await _execute_pipeline(pipeline_def, body.input or '', http_request)
+
+
+@router.post('/v1/pipelines/run')
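+async def run_inline_pipeline(pipeline: PipelineDefinition, http_request: Request = None):
+    """Execute an inline pipeline definition without saving it.
+
+    An optional top-level "input" field (accepted via extra='allow') feeds {{input}}.
+    """
+    pipeline_def = pipeline.model_dump()
+    return await _execute_pipeline(pipeline_def, str(pipeline_def.get('input') or ''), http_request)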
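Review note on the placeholder syntax used by `_resolve_template` in the file above. The following standalone sketch mirrors that substitution logic so it can be run without the server. `resolve_template` is an illustrative copy, not the module's function, and it uses `fullmatch` where the commit uses `re.match`, so trailing junk after the field name is left unresolved here:

```python
import re
from typing import Any, Dict

_PLACEHOLDER = re.compile(r"\{\{([^}]+)\}\}")
_STEP_REF = re.compile(r"step(\d+)\.(\w+)")


def resolve_template(value: Any, context: Dict[str, Any]) -> Any:
    """Substitute {{input}} and {{stepN.field}} in string values only."""
    if not isinstance(value, str):
        return value

    def _replace(match: "re.Match[str]") -> str:
        key = match.group(1).strip()
        if key == "input":
            return str(context.get("input", ""))
        ref = _STEP_REF.fullmatch(key)
        if ref:
            step = context.get(f"step{int(ref.group(1))}", {})
            # A missing step or field resolves to an empty string.
            return str(step.get(ref.group(2), ""))
        # Unknown placeholders are left untouched.
        return match.group(0)

    return _PLACEHOLDER.sub(_replace, value)


if __name__ == "__main__":
    ctx = {
        "input": "a red fox",
        "step0": {"output": "A fox at dawn", "url": "http://host/x.png"},
    }
    print(resolve_template("Describe {{input}}", ctx))
    print(resolve_template("{{step0.output}} | {{step0.url}}", ctx))
    print(repr(resolve_template("{{step3.url}}", ctx)))
    print(resolve_template("{{unknown}}", ctx))
```

Two behaviours are worth keeping in mind when writing pipeline definitions: a reference to a step that has not run yields an empty string rather than an error, and only top-level string params are resolved, so placeholders nested inside lists or dicts pass through unchanged.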
--- a/codai/api/faceswap.py
+++ b/codai/api/faceswap.py
+"""
+Face swap endpoint.
+
+POST /v1/images/faceswap  — swap face in image or video frames
+"""
+
+import asyncio
+import base64
+import io
+import os
+import subprocess
+import tempfile
+import time
+from typing import Optional
+
+import cv2
+import numpy as np
+from fastapi import APIRouter, HTTPException, Request
+from PIL import Image
+from pydantic import BaseModel, ConfigDict
+
+from codai.api.images import save_image_response
+
+router = APIRouter()
+
+global_args = None
+global_file_path = None
+
+_INSWAPPER_MODEL_PATH = os.path.expanduser('~/.insightface/models/inswapper_128.onnx')
+_INSWAPPER_HF_REPO = 'deepinsight/inswapper'
+_INSWAPPER_HF_FILE = 'inswapper_128.onnx'
+
+_face_app = None      # FaceAnalysis singleton
+_swapper = None       # INSwapper singleton
+
+
+def set_global_args(args):
+    global global_args
+    global_args = args
+
+
+def set_global_file_path(path):
+    global global_file_path
+    global_file_path = path
+
+
+def _ensure_model():
+    """Download inswapper_128.onnx if not present."""
+    if os.path.exists(_INSWAPPER_MODEL_PATH):
+        return
+    os.makedirs(os.path.dirname(_INSWAPPER_MODEL_PATH), exist_ok=True)
+    print('Downloading inswapper_128.onnx from HuggingFace…')
+    try:
+        from huggingface_hub import hf_hub_download
+        path = hf_hub_download(
+            repo_id=_INSWAPPER_HF_REPO,
+            filename=_INSWAPPER_HF_FILE,
+            local_dir=os.path.dirname(_INSWAPPER_MODEL_PATH),
+        )
+        if path != _INSWAPPER_MODEL_PATH:
+            import shutil
+            shutil.move(path, _INSWAPPER_MODEL_PATH)
+    except Exception as e:
+        # Chain the original exception so the root cause stays in the traceback
+        raise RuntimeError(f'Failed to download inswapper model: {e}') from e
+
+
+def _get_face_app():
+    global _face_app
+    if _face_app is None:
+        from insightface.app import FaceAnalysis
+        _face_app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
+        _face_app.prepare(ctx_id=0, det_size=(640, 640))
+    return _face_app
+
+
+def _get_swapper():
+    global _swapper
+    if _swapper is None:
+        _ensure_model()
+        from insightface.model_zoo import get_model
+        _swapper = get_model(_INSWAPPER_MODEL_PATH, download=False)
+        _swapper.prepare(ctx_id=0)
+    return _swapper
+
+
+def _decode_image(data: str) -> np.ndarray:
+    """Decode base64 or data-URI image to BGR numpy array."""
+    if data.startswith('data:'):
+        _, b64 = data.split(',', 1)
+        data = b64
+    raw = base64.b64decode(data)
+    arr = np.frombuffer(raw, np.uint8)
+    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
+    if img is None:
+        raise ValueError('Could not decode image')
+    return img
+
+
+def _swap_faces(source_img: np.ndarray, target_img: np.ndarray) -> np.ndarray:
+    """Swap all faces in target_img with the face from source_img."""
+    app = _get_face_app()
+    swapper = _get_swapper()
+
+    src_faces = app.get(source_img)
+    if not src_faces:
+        raise ValueError('No face detected in source image')
+    src_face = src_faces[0]
+
+    tgt_faces = app.get(target_img)
+    if not tgt_faces:
+        return target_img  # no face to swap in target, return as-is
+
+    result = target_img.copy()
+    for tgt_face in tgt_faces:
+        result = swapper.get(result, tgt_face, src_face, paste_back=True)
+    return result
+
+
+def _decode_b64_or_url(data: str) -> bytes:
+    if data.startswith('data:'):
+        _, b64 = data.split(',', 1)
+        return base64.b64decode(b64)
+    if data.startswith('http'):
+        import urllib.request
+        with urllib.request.urlopen(data, timeout=30) as r:
+            return r.read()
+    return base64.b64decode(data)
+
+
+# ---------------------------------------------------------------------------
+# Request model
+# ---------------------------------------------------------------------------
+
+class FaceSwapRequest(BaseModel):
+    source_face: str            # base64/data-URI image containing the source face
+    target: str                 # base64/data-URI image OR video to swap into
+    target_type: Optional[str] = 'image'   # 'image' or 'video'
+    response_format: Optional[str] = 'url'
+    model_config = ConfigDict(extra='allow')
+
+
+# ---------------------------------------------------------------------------
+# Endpoint
+# ---------------------------------------------------------------------------
+
+@router.post('/v1/images/faceswap')
+async def faceswap(request: FaceSwapRequest, http_request: Request = None):
+    """
+    Swap the face from source_face into every face found in target.
+    target_type: 'image' (default) or 'video'.
+    """
+    try:
+        _ensure_model()
+    except RuntimeError as e:
+        raise HTTPException(status_code=503, detail=str(e))
+
+    try:
+        src_img = _decode_image(request.source_face)
+    except Exception as e:
+        raise HTTPException(status_code=400, detail=f'Invalid source_face: {e}')
+
+    if request.target_type == 'video':
+        return await _faceswap_video(src_img, request, http_request)
+    else:
+        return await _faceswap_image(src_img, request, http_request)
+
+
+async def _faceswap_image(src_img, request, http_request):
+    try:
+        tgt_img = _decode_image(request.target)
+    except Exception as e:
+        raise HTTPException(status_code=400, detail=f'Invalid target: {e}')
+
+    try:
+        result = await asyncio.get_running_loop().run_in_executor(
+            None, _swap_faces, src_img, tgt_img)
+    except ValueError as e:
+        raise HTTPException(status_code=422, detail=str(e))
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f'Face swap failed: {e}')
+
+    pil_img = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
+    img_data = save_image_response(pil_img, request.response_format, http_request)
+    return {'created': int(time.time()), 'data': [img_data]}
+
+
+async def _faceswap_video(src_img, request, http_request):
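+    try:
+        raw = _decode_b64_or_url(request.target)
+    except Exception as e:
+        raise HTTPException(status_code=400, detail=f'Invalid target: {e}')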
+    temps = []
+    try:
+        # Write input video
+        in_tmp = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
+        in_tmp.write(raw)
+        in_tmp.close()
+        in_path = in_tmp.name
+        temps.append(in_path)
+
+        # Extract frames
+        frames_dir = tempfile.mkdtemp()
+        temps.append(frames_dir)
+        subprocess.run(
+            ['ffmpeg', '-y', '-i', in_path, f'{frames_dir}/%08d.png'],
+            capture_output=True, check=True)
+
+        # Get FPS for reassembly
+        probe = subprocess.run(
+            ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
+             '-show_entries', 'stream=r_frame_rate', '-of', 'default=nw=1:nk=1', in_path],
+            capture_output=True, text=True)
+        fps_str = probe.stdout.strip() or '25/1'
+        num, den = fps_str.split('/')
+        fps = float(num) / float(den)
+
+        # Swap faces in each frame
+        frame_files = sorted(os.listdir(frames_dir))
+
+        def _process_frames():
+            app = _get_face_app()
+            swapper = _get_swapper()
+            src_faces = app.get(src_img)
+            if not src_faces:
+                raise ValueError('No face detected in source image')
+            src_face = src_faces[0]
+            for fname in frame_files:
+                fpath = os.path.join(frames_dir, fname)
+                frame = cv2.imread(fpath)
+                if frame is None:
+                    continue
+                tgt_faces = app.get(frame)
+                for tgt_face in tgt_faces:
+                    frame = swapper.get(frame, tgt_face, src_face, paste_back=True)
+                cv2.imwrite(fpath, frame)
+
+        await asyncio.get_running_loop().run_in_executor(None, _process_frames)
+
+        # Reassemble video (copy original audio)
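+        # mkstemp creates the file atomically; tempfile.mktemp is deprecated and race-prone
+        out_fd, out_path = tempfile.mkstemp(suffix='_swapped.mp4')
+        os.close(out_fd)  # ffmpeg reopens the path itself; -y lets it overwrite the empty file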
+        temps.append(out_path)
+        subprocess.run(
+            ['ffmpeg', '-y', '-framerate', str(fps), '-i', f'{frames_dir}/%08d.png',
+             '-i', in_path, '-map', '0:v', '-map', '1:a?',
+             '-c:v', 'libx264', '-c:a', 'copy', '-shortest', out_path],
+            capture_output=True, check=True)
+
+        with open(out_path, 'rb') as f:
+            out_bytes = f.read()
+
+        if global_file_path:
+            import uuid
+            fname = f'{uuid.uuid4().hex}_swapped.mp4'
+            fpath = os.path.join(global_file_path, fname)
+            os.makedirs(global_file_path, exist_ok=True)
+            with open(fpath, 'wb') as f:
+                f.write(out_bytes)
+            host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
+            if ':' in host:
+                parts = host.split(':')
+                if len(parts) == 2 and parts[1].isdigit():
+                    host = parts[0]
+            proto = 'https' if getattr(global_args, 'https', False) else 'http'
+            port = getattr(global_args, 'port', 8000) if global_args else 8000
+            data = [{'url': f'{proto}://{host}:{port}/v1/files/{fname}'}]
+        else:
+            data = [{'b64_mp4': base64.b64encode(out_bytes).decode()}]
+
+        return {'created': int(time.time()), 'data': data}
+
+    except subprocess.CalledProcessError as e:
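+        # ffmpeg prints its banner first, so the tail of stderr carries the actual error
+        raise HTTPException(status_code=500, detail=f'ffmpeg error: {(e.stderr or b"").decode(errors="replace")[-200:]}')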
+    except ValueError as e:
+        raise HTTPException(status_code=422, detail=str(e))
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f'Video face swap failed: {e}')
+    finally:
+        import shutil
+        for t in temps:
+            try:
+                if os.path.isdir(t):
+                    shutil.rmtree(t)
+                else:
+                    os.unlink(t)
+            except Exception:
+                pass
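Review note on the FPS handling in `_faceswap_video` above: `ffprobe` reports `r_frame_rate` as a rational such as `30000/1001`, and the commit splits on `/` and divides. That raises on a bare integer (no slash) or a zero denominator. A minimal sketch of a more tolerant parser using the standard-library `fractions.Fraction`; `parse_frame_rate` and its 25 fps fallback are hypothetical names and defaults chosen to match the `'25/1'` fallback in the diff:

```python
from fractions import Fraction


def parse_frame_rate(text: str, default: float = 25.0) -> float:
    """Parse an ffprobe rate such as '30000/1001' or '25' into frames per second."""
    stripped = text.strip()
    # ffprobe may print more than one line; only the first is used here.
    first = stripped.splitlines()[0].strip() if stripped else ""
    try:
        rate = Fraction(first)
    except (ValueError, ZeroDivisionError):
        # Empty output, 'N/A', or a '0/0' rate: fall back to the default.
        return default
    return float(rate) if rate > 0 else default


if __name__ == "__main__":
    for sample in ["25/1", "30000/1001", "24", "0/0", ""]:
        print(repr(sample), parse_frame_rate(sample))
```

With a helper like this, an unparseable rate degrades to the default frame rate instead of surfacing as a 422 or 500 from the generic exception handlers.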
--- a/codai/api/images.py
+++ b/codai/api/images.py
@@ -21,14 +21,17 @@ Image generation endpoints for the codai API.
 import asyncio
 import base64
 import io
+import logging
 import os
 import time
 import uuid
 from typing import Optional

 from fastapi import APIRouter, HTTPException, Request
+
+_log = logging.getLogger(__name__)
 from PIL import Image
-from pydantic import BaseModel
+from pydantic import BaseModel, ConfigDict

 # Import from codai modules
 from codai.models.manager import multi_model_manager
@@ -78,14 +81,12 @@ def get_cfg_scale():
                        for heap in mem:
                            if heap.get('flags', []).get('deviceLocal', False):
                                vram_mb = heap.get('size', 0) / (1024 * 1024)
-                                print(f"DEBUG: Detected VRAM: {vram_mb:.0f} MB")
+                                _log.debug("Detected VRAM: %.0f MB", vram_mb)
                                if vram_mb < 16000:  # Less than 16GB
-                                    print(f"DEBUG: VRAM < 16GB, using cfg_scale=1.0 for better performance")
                                    return 1.0
                                break
            except Exception as e:
-                print(f"DEBUG: Could not detect VRAM: {e}")
-                # Default to 1.0 for Vulkan if detection fails
+                _log.debug("Could not detect VRAM: %s", e)
                return 1.0
    
    return cfg_scale
@@ -117,7 +118,6 @@ def save_image_response(img, request_format="base64", http_request=None):
        # Add URL to response
        # Determine base URL based on --url argument
        url_setting = getattr(global_args, 'url', 'auto') if global_args else 'auto'
-        print(f"DEBUG: global_args={global_args}, url_setting={url_setting}")
        if url_setting == 'auto':
            # Use server host from request headers (what client used to connect)
            if http_request:
@@ -146,7 +146,6 @@ def save_image_response(img, request_format="base64", http_request=None):
                protocol = "https" if use_https else "http"
                port = getattr(global_args, 'port', 8000)
                base_url = f"{protocol}://{client_host}:{port}"
-                print(f"DEBUG: client_host={client_host}, port={port}, base_url={base_url}")
            else:
                base_url = "http://127.0.0.1:8000"
        else:
@@ -460,13 +459,9 @@ def _generate_with_diffusers(pipeline, request, global_args, http_request=None):
            raise Exception(f"Could not extract images from diffusers result: {img_err}")
    
    for img in result_images:
-        # Debug: print image type and value range
-        print(f"DEBUG: Image type: {type(img)}")
        if isinstance(img, np.ndarray):
-            print(f"DEBUG: Image shape: {img.shape}, dtype: {img.dtype}, min: {img.min()}, max: {img.max()}")
            img = np.nan_to_num(img, nan=0.0, posinf=1.0, neginf=0.0)
            img = np.clip(img, 0.0, 1.0)
-            print(f"DEBUG: After NaN handling - min: {img.min()}, max: {img.max()}")
        
        img_data = save_image_response(img, request.response_format, http_request)
        images.append(img_data)
@@ -532,16 +527,27 @@ def _load_sdcpp_model(model_path: str, global_args, model_config: dict = None):
    Returns the loaded StableDiffusion model or None.
    """
    from stable_diffusion_cpp import StableDiffusion
+    import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp
+    import ctypes

    # Check for --no-ram mode
    no_ram = getattr(global_args, 'no_ram', False) if global_args else False

    print(f"Loading sd.cpp model from: {model_path}")

+    # Intercept sd.cpp log to detect partial-init failures (e.g. unknown SD version)
+    log_lines = []
+    @sd_cpp.sd_log_callback
+    def _log_cb(level, text, data):
+        if text:
+            line = text.decode('utf-8', errors='replace').rstrip()
+            log_lines.append(line)
+    sd_cpp.sd_set_log_callback(_log_cb, None)
+
    # Build sd.cpp constructor args from config
    kwargs = {
        'model_path': model_path,
-        'offload_params_to_cpu': False,  # Use GPU by default
+        'offload_params_to_cpu': False,
        'keep_clip_on_cpu': False,
        'keep_control_net_on_cpu': False,
        'keep_vae_on_cpu': False,
@@ -575,6 +581,21 @@ def _load_sdcpp_model(model_path: str, global_args, model_config: dict = None):
            sd_model = StableDiffusion(**kwargs)
        else:
            raise
+    finally:
+        # Restore default log callback
+        sd_cpp.sd_set_log_callback(None, None)
+
+    # Check if sd.cpp failed to identify the model architecture.
+    # In this case new_sd_ctx returns a non-null but broken context that
+    # will segfault on generate_image — reject it early.
+    failed_version = any('get sd version from file failed' in line for line in log_lines)
+    if failed_version:
+        raise ValueError(
+            f"sd.cpp could not identify the model architecture in '{model_path}'. "
+            "This model may require a newer version of stable-diffusion-cpp-python, "
+            "or it may not be a supported Stable Diffusion GGUF format."
+        )
+
    return sd_model
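Review note: the hunk above collects native log output while the model is constructed, then rejects the model if a known failure message was logged. A minimal, library-free sketch of that pattern follows. `load_checked` and the two `fake_loader_*` functions are hypothetical stand-ins for illustration only; they are not part of stable-diffusion-cpp-python.

```python
from typing import Callable, List, Optional

# Message the hunk above scans for in the captured log.
FAILURE_MARKER = "get sd version from file failed"

LogCallback = Callable[[str], None]


def load_checked(loader: Callable[[Optional[LogCallback]], object]) -> object:
    """Run `loader` while collecting its log lines; reject a silently broken result.

    `loader` is a hypothetical callable that accepts a log callback and returns
    a model handle. It stands in for the native constructor.
    """
    log_lines: List[str] = []

    def on_log(text: str) -> None:
        if text:
            log_lines.append(text.rstrip())

    model = loader(on_log)

    # A non-null handle is not proof of success, so the captured log is checked too.
    if any(FAILURE_MARKER in line for line in log_lines):
        raise ValueError("loader reported an unidentified model architecture")
    return model


def fake_loader_ok(log: Optional[LogCallback]) -> object:
    if log:
        log("loading tensors\n")
    return {"handle": 1}


def fake_loader_broken(log: Optional[LogCallback]) -> object:
    if log:
        log("get sd version from file failed\n")
    return {"handle": 2}  # non-null but unusable, like the case described above
```

Raising before the handle is returned keeps a broken context from ever reaching the generation code.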


@@ -1278,3 +1299,344 @@ async def create_image_segment(request: ImageSegmentRequest, http_request: Reque
        raise HTTPException(status_code=500, detail=f"Segmentation failed: {e}")
    result = save_image_response(seg_img, request.response_format, http_request)
    return {"created": int(time.time()), "data": [result]}
+
+
+# =============================================================================
+# Deblur Endpoint  (POST /v1/images/deblur)
+# =============================================================================
+
+class ImageDeblurRequest(BaseModel):
+    image: str                              # base64 input image
+    strength: Optional[float] = 0.5        # 0–1, deblur aggressiveness
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
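+def _run_deblur(image_bytes: bytes, strength: float) -> "PILImage.Image":
+    """Approximate deblur: per-channel Wiener filtering followed by unsharp masking.
+
+    Note: scipy.signal.wiener is an adaptive noise-reduction filter; no blur
+    kernel is estimated, so this is not a true deconvolution.
+    """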
+    import numpy as np
+    import cv2
+    from scipy.signal import wiener
+    from PIL import Image as PILImage
+
+    img = PILImage.open(io.BytesIO(image_bytes)).convert("RGB")
+    arr = np.array(img, dtype=np.float32) / 255.0
+
+    # Wiener filter per channel
+    noise_power = max(0.001, (1.0 - strength) * 0.05)
+    deblurred = np.stack([
+        wiener(arr[:, :, c], mysize=5, noise=noise_power)
+        for c in range(3)
+    ], axis=2)
+    deblurred = np.clip(deblurred, 0.0, 1.0)
+
+    # Unsharp mask pass for edge recovery
+    blur_sigma = max(0.5, (1.0 - strength) * 2.0)
+    blurred = cv2.GaussianBlur(deblurred, (0, 0), blur_sigma)
+    sharpened = cv2.addWeighted(deblurred, 1.0 + strength, blurred, -strength, 0)
+    sharpened = np.clip(sharpened, 0.0, 1.0)
+
+    return PILImage.fromarray((sharpened * 255).astype(np.uint8))
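Review note: `strength` controls three quantities in `_run_deblur`: the noise power passed to the Wiener filter, the Gaussian sigma of the unsharp mask, and the two blend weights. The sketch below reproduces only that arithmetic so the mapping is easy to inspect. `deblur_parameters` is a hypothetical helper that does not exist in the commit, and it performs no image processing.

```python
def deblur_parameters(strength: float) -> dict:
    """Mirror the parameter arithmetic of _run_deblur for a given strength in [0, 1]."""
    # Higher strength means less assumed noise, with a floor at 0.001.
    noise_power = max(0.001, (1.0 - strength) * 0.05)
    # Higher strength means a tighter blur for the unsharp mask, with a floor at 0.5.
    blur_sigma = max(0.5, (1.0 - strength) * 2.0)
    return {
        "noise_power": noise_power,
        "blur_sigma": blur_sigma,
        # sharpened = weight_original * image + weight_blurred * blurred
        "weight_original": 1.0 + strength,
        "weight_blurred": -strength,
    }
```

The two weights always sum to 1, so flat regions keep their brightness and only edges are amplified.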
+
+
+@router.post("/v1/images/deblur")
+async def create_image_deblur(request: ImageDeblurRequest, http_request: Request = None):
+    """Reduce blur in an image using Wiener filtering and unsharp masking."""
+    raw = base64.b64decode(request.image.split(',', 1)[-1] if ',' in request.image else request.image)
+    try:
+        result_img = await asyncio.get_event_loop().run_in_executor(
+            None, _run_deblur, raw, 0.5 if request.strength is None else request.strength)
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Deblur failed: {e}")
+    result = save_image_response(result_img, request.response_format, http_request)
+    return {"created": int(time.time()), "data": [result]}
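Review note: the endpoints in this hunk strip an optional data-URI prefix inline before base64-decoding the upload. A small helper could centralize that step and turn malformed input into a clear error. This is a sketch only; `decode_image_payload` is a hypothetical name, and the whitespace stripping and strict validation are assumptions that go beyond what the commit does.

```python
import base64
import binascii


def decode_image_payload(payload: str) -> bytes:
    """Decode a base64 string, with or without a 'data:...;base64,' prefix."""
    # With a data-URI prefix, the payload is everything after the first comma.
    # Without a comma, split() returns the whole string unchanged.
    encoded = payload.split(",", 1)[-1]
    # Assumption: tolerate line breaks that some clients insert into long payloads.
    encoded = "".join(encoded.split())
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 payload: {exc}") from exc
```

A handler could catch the `ValueError` and answer with a 400 instead of letting a decoding failure surface as an unhandled server error.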
+
+
+# =============================================================================
+# Unpixelate Endpoint  (POST /v1/images/unpixelate)
+# Uses Real-ESRGAN super-resolution to restore detail while upscaling.
+# =============================================================================
+
+class ImageUnpixelateRequest(BaseModel):
+    image: str
+    scale: Optional[int] = 4               # 2, 4, or 8
+    model: Optional[str] = None            # optional custom Real-ESRGAN model path
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
+def _run_unpixelate(image_bytes: bytes, scale: int, model_path: Optional[str]) -> "PILImage.Image":
+    import numpy as np
+    from basicsr.archs.rrdbnet_arch import RRDBNet
+    from realesrgan import RealESRGANer
+    import torch
+    from PIL import Image as PILImage
+
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+    if model_path and os.path.exists(model_path):
+        mp = model_path
+    else:
+        # Download RealESRGAN_x4plus on demand
+        mp = os.path.expanduser('~/.cache/realesrgan/RealESRGAN_x4plus.pth')
+        if not os.path.exists(mp):
+            os.makedirs(os.path.dirname(mp), exist_ok=True)
+            import urllib.request
+            url = 'https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth'
+            print('Downloading RealESRGAN_x4plus.pth…')
+            urllib.request.urlretrieve(url, mp)
+
+    model_obj = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
+                        num_block=23, num_grow_ch=32, scale=4)
+    upsampler = RealESRGANer(scale=4, model_path=mp, model=model_obj,
+                              half=device.type == 'cuda', device=device)
+
+    img = PILImage.open(io.BytesIO(image_bytes)).convert("RGB")
+    out_arr, _ = upsampler.enhance(np.array(img), outscale=scale)
+    return PILImage.fromarray(out_arr)
+
+
+@router.post("/v1/images/unpixelate")
+async def create_image_unpixelate(request: ImageUnpixelateRequest, http_request: Request = None):
+    """Remove pixelation / upscale with detail recovery using Real-ESRGAN."""
+    raw = base64.b64decode(request.image.split(',', 1)[-1] if ',' in request.image else request.image)
+    try:
+        result_img = await asyncio.get_event_loop().run_in_executor(
+            None, _run_unpixelate, raw, request.scale or 4, request.model)
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Unpixelate failed: {e}")
+    result = save_image_response(result_img, request.response_format, http_request)
+    return {"created": int(time.time()), "data": [result]}
+
+
+# =============================================================================
+# Outfit Change Endpoint  (POST /v1/images/outfit)
+# Auto-generates a clothing mask via person segmentation, then inpaints.
+# =============================================================================
+
+class ImageOutfitRequest(BaseModel):
+    model: str                              # inpaint model id
+    image: Optional[str] = None            # base64 source image (image mode)
+    video: Optional[str] = None            # base64 source video (video mode)
+    prompt: str                             # description of the new outfit
+    negative_prompt: Optional[str] = None
+    mask: Optional[str] = None             # optional manual mask (base64); auto-generated if absent
+    steps: Optional[int] = 30
+    guidance_scale: Optional[float] = 7.5
+    strength: Optional[float] = 0.99
+    seed: Optional[int] = None
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
+def _generate_clothing_mask(img_arr) -> "np.ndarray":
+    """
+    Generate a rough clothing mask using GrabCut person segmentation.
+    Returns a binary mask (255 = clothing area to replace).
+    """
+    import numpy as np
+    import cv2
+    h, w = img_arr.shape[:2]
+    bgr = cv2.cvtColor(img_arr, cv2.COLOR_RGB2BGR)
+
+    # GrabCut with a central rect (assumes person is roughly centered)
+    mask_gc = np.zeros((h, w), np.uint8)
+    bgd = np.zeros((1, 65), np.float64)
+    fgd = np.zeros((1, 65), np.float64)
+    margin_x, margin_y = w // 8, h // 8
+    rect = (margin_x, margin_y, w - 2 * margin_x, h - 2 * margin_y)
+    cv2.grabCut(bgr, mask_gc, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
+    fg_mask = np.where((mask_gc == cv2.GC_FGD) | (mask_gc == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
+
+    # Exclude top 25% (head/hair) and bottom 10% (feet)
+    fg_mask[:h // 4, :] = 0
+    fg_mask[int(h * 0.9):, :] = 0
+
+    # Dilate slightly so inpaint covers clothing edges
+    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
+    fg_mask = cv2.dilate(fg_mask, kernel, iterations=2)
+    return fg_mask
+
+
+@router.post("/v1/images/outfit")
+async def create_image_outfit(request: ImageOutfitRequest, http_request: Request = None):
+    """Change the outfit/clothing in an image or video using inpainting."""
+    global global_args
+
+    if request.video:
+        return await _outfit_video(request, http_request)
+
+    if not request.image:
+        raise HTTPException(status_code=400, detail="Provide image or video")
+    raw = base64.b64decode(request.image.split(',', 1)[-1] if ',' in request.image else request.image)
+    from PIL import Image as PILImage
+    import numpy as np
+    img = PILImage.open(io.BytesIO(raw)).convert("RGB")
+    img_arr = np.array(img)
+
+    # Generate or decode mask
+    if request.mask:
+        mask_raw = base64.b64decode(request.mask.split(',', 1)[-1] if ',' in request.mask else request.mask)
+        mask_img = PILImage.open(io.BytesIO(mask_raw)).convert("L")
+    else:
+        try:
+            mask_arr = await asyncio.get_event_loop().run_in_executor(
+                None, _generate_clothing_mask, img_arr)
+            mask_img = PILImage.fromarray(mask_arr)
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Mask generation failed: {e}")
+
+    # Load inpaint pipeline
+    model_key = f"inpaint:{request.model}"
+    pipeline = multi_model_manager.models.get(model_key)
+    if pipeline is None:
+        try:
+            pipeline = await asyncio.get_event_loop().run_in_executor(
+                None, _load_inpaint_pipeline, request.model, global_args)
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Failed to load inpaint model: {e}")
+        multi_model_manager.models[model_key] = pipeline
+
+    # Run inpaint
+    import torch
+    generator = torch.Generator().manual_seed(request.seed) if request.seed is not None else None
+
+    def _run():
+        kwargs = dict(
+            prompt=request.prompt,
+            image=img,
+            mask_image=mask_img,
+            num_inference_steps=request.steps or 30,
+            guidance_scale=request.guidance_scale or 7.5,
+            strength=request.strength or 0.99,
+        )
+        if request.negative_prompt:
+            kwargs['negative_prompt'] = request.negative_prompt
+        if generator:
+            kwargs['generator'] = generator
+        if hasattr(pipeline, 'safety_checker'):
+            pipeline.safety_checker = None
+        return pipeline(**kwargs).images[0]
+
+    try:
+        result_img = await asyncio.get_event_loop().run_in_executor(None, _run)
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Outfit change failed: {e}")
+
+    result = save_image_response(result_img, request.response_format, http_request)
+    return {"created": int(time.time()), "data": [result]}
+
+
+async def _outfit_video(request: ImageOutfitRequest, http_request):
+    """Process outfit change frame-by-frame on a video."""
+    import subprocess
+    import tempfile
+    import shutil
+
+    raw = base64.b64decode(request.video.split(',', 1)[-1] if ',' in request.video else request.video)
+    temps = []
+    try:
+        # mkstemp creates the file atomically, avoiding the race in the deprecated mktemp
+        fd, in_path = tempfile.mkstemp(suffix='.mp4')
+        temps.append(in_path)
+        with os.fdopen(fd, 'wb') as f:
+            f.write(raw)
+
+        frames_dir = tempfile.mkdtemp()
+        temps.append(frames_dir)
+        subprocess.run(['ffmpeg', '-y', '-i', in_path, f'{frames_dir}/%08d.png'],
+                       capture_output=True, check=True)
+
+        probe = subprocess.run(
+            ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
+             '-show_entries', 'stream=r_frame_rate', '-of', 'default=nw=1:nk=1', in_path],
+            capture_output=True, text=True)
+        fps_str = probe.stdout.strip() or '25/1'
+        num, den = fps_str.split('/')
+        fps = float(num) / float(den)
+
+        # Load pipeline once
+        model_key = f"inpaint:{request.model}"
+        pipeline = multi_model_manager.models.get(model_key)
+        if pipeline is None:
+            pipeline = await asyncio.get_event_loop().run_in_executor(
+                None, _load_inpaint_pipeline, request.model, global_args)
+            multi_model_manager.models[model_key] = pipeline
+
+        import torch
+        from PIL import Image as PILImage
+        import numpy as np
+        import cv2
+
+        generator = torch.Generator().manual_seed(request.seed) if request.seed is not None else None
+
+        def _process_frames():
+            for fname in sorted(os.listdir(frames_dir)):
+                fpath = os.path.join(frames_dir, fname)
+                img = PILImage.open(fpath).convert("RGB")
+                img_arr = np.array(img)
+                if request.mask:
+                    mask_raw = base64.b64decode(request.mask.split(',', 1)[-1] if ',' in request.mask else request.mask)
+                    mask_img = PILImage.open(io.BytesIO(mask_raw)).convert("L")
+                else:
+                    mask_arr = _generate_clothing_mask(img_arr)
+                    mask_img = PILImage.fromarray(mask_arr)
+                kwargs = dict(
+                    prompt=request.prompt,
+                    image=img,
+                    mask_image=mask_img,
+                    num_inference_steps=request.steps or 30,
+                    guidance_scale=request.guidance_scale or 7.5,
+                    strength=request.strength or 0.99,
+                )
+                if request.negative_prompt:
+                    kwargs['negative_prompt'] = request.negative_prompt
+                if generator:
+                    kwargs['generator'] = generator
+                if hasattr(pipeline, 'safety_checker'):
+                    pipeline.safety_checker = None
+                result = pipeline(**kwargs).images[0]
+                result.save(fpath)
+
+        await asyncio.get_event_loop().run_in_executor(None, _process_frames)
+
+        fd, out_path = tempfile.mkstemp(suffix='_outfit.mp4')
+        os.close(fd)  # ffmpeg reopens the path itself; -y lets it overwrite the empty file
+        temps.append(out_path)
+        subprocess.run(
+            ['ffmpeg', '-y', '-framerate', str(fps), '-i', f'{frames_dir}/%08d.png',
+             '-i', in_path, '-map', '0:v', '-map', '1:a?',
+             '-c:v', 'libx264', '-c:a', 'copy', '-shortest', out_path],
+            capture_output=True, check=True)
+
+        with open(out_path, 'rb') as f:
+            out_bytes = f.read()
+
+        if global_file_path:
+            fname = f'{uuid.uuid4().hex}_outfit.mp4'
+            fpath_out = os.path.join(global_file_path, fname)
+            os.makedirs(global_file_path, exist_ok=True)
+            with open(fpath_out, 'wb') as f:
+                f.write(out_bytes)
+            host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
+            if ':' in host:
+                parts = host.split(':')
+                if len(parts) == 2 and parts[1].isdigit():
+                    host = parts[0]
+            proto = 'https' if getattr(global_args, 'https', False) else 'http'
+            port = getattr(global_args, 'port', 8000) if global_args else 8000
+            data = [{'url': f'{proto}://{host}:{port}/v1/files/{fname}'}]
+        else:
+            data = [{'b64_mp4': base64.b64encode(out_bytes).decode()}]
+
+        return {'created': int(time.time()), 'data': data}
+
+    except subprocess.CalledProcessError as e:
+        raise HTTPException(status_code=500, detail=f'ffmpeg error: {e.stderr.decode()[:200]}')
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f'Video outfit change failed: {e}')
+    finally:
+        for t in temps:
+            try:
+                if os.path.isdir(t):
+                    shutil.rmtree(t)
+                else:
+                    os.unlink(t)
+            except Exception:
+                pass
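Review note: `_outfit_video` reads the frame rate from ffprobe as a rational string such as `25/1` and splits it by hand. A more defensive parse is sketched below using only the standard library. `parse_frame_rate` is a hypothetical helper, and the fallback for a zero denominator assumes that some streams may report a placeholder such as `0/0`.

```python
from fractions import Fraction


def parse_frame_rate(text: str, default: float = 25.0) -> float:
    """Convert an ffprobe rational such as '30000/1001' into frames per second.

    Falls back to `default` for empty output, malformed values, and rates
    that are zero or have a zero denominator.
    """
    lines = text.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    try:
        rate = Fraction(first_line)  # accepts '25', '25/1', '30000/1001'
    except (ValueError, ZeroDivisionError):
        return default
    return float(rate) if rate > 0 else default
```

Only the first output line is used, so extra lines in the probe output do not break the parse.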
--- a/codai/api/pipelines.py
+++ b/codai/api/pipelines.py
+"""
+Server-side pipeline endpoints — multi-step generation chains.
+
+POST /v1/pipelines/image-to-video   — generate image then animate it
+POST /v1/pipelines/story            — LLM script → images → video → TTS narration
+POST /v1/pipelines/video-dub        — transcribe → translate → TTS dub → burn subtitles
+POST /v1/pipelines/audio-dub        — transcribe audio/video → translate → clone voice → replace audio
+"""
+
+import asyncio
+import time
+from typing import List, Optional
+
+from fastapi import APIRouter, HTTPException, Request
+from pydantic import BaseModel, ConfigDict
+
+router = APIRouter()
+
+# ---------------------------------------------------------------------------
+# Helpers — thin wrappers that call the existing endpoint logic directly
+# ---------------------------------------------------------------------------
+
+async def _post_json(path: str, body: dict, http_request: Request):
+    """Call an internal endpoint by importing its handler directly."""
+    # Handlers are called in-process rather than over HTTP.
+    # They are imported lazily inside each branch to avoid circular imports.
+    if path.startswith('/v1/images/generations'):
+        from codai.api.images import create_image_generation
+        from codai.pydantic.imagerequest import ImageGenerationRequest
+        req = ImageGenerationRequest(**body)
+        return await create_image_generation(req, http_request)
+
+    if path.startswith('/v1/video/generations'):
+        from codai.api.video import create_video_generation
+        from codai.pydantic.videorequest import VideoGenerationRequest
+        req = VideoGenerationRequest(**body)
+        return await create_video_generation(req, http_request)
+
+    if path.startswith('/v1/video/dub'):
+        from codai.api.video import create_video_dub
+        from codai.pydantic.videorequest import VideoDubRequest
+        req = VideoDubRequest(**body)
+        return await create_video_dub(req, http_request)
+
+    if path.startswith('/v1/audio/speech'):
+        from codai.api.tts import create_speech, TTSRequest
+        req = TTSRequest(**body)
+        return await create_speech(req)
+
+    if path.startswith('/v1/chat/completions'):
+        from codai.api.text import chat_completions
+        from codai.pydantic.textrequest import ChatCompletionRequest
+        req = ChatCompletionRequest(**body)
+        return await chat_completions(req, http_request)
+
+    raise ValueError(f"Unknown internal path: {path}")
+
+
+def _img_url(result) -> Optional[str]:
+    """Extract URL from an image generation result dict."""
+    data = result.get('data', [{}])
+    item = data[0] if data else {}
+    return item.get('url') or ('data:image/png;base64,' + item['b64_json'] if item.get('b64_json') else None)
+
+
+def _vid_url(result) -> Optional[str]:
+    data = result.get('data', [{}])
+    item = data[0] if data else {}
+    return item.get('url') or ('data:video/mp4;base64,' + item['b64_mp4'] if item.get('b64_mp4') else None)
+
+
+def _aud_url(result) -> Optional[str]:
+    if isinstance(result, dict):
+        if result.get('audio'):
+            return 'data:audio/mp3;base64,' + result['audio']
+        data = result.get('data', [{}])
+        item = data[0] if data else {}
+        if item.get('url'):
+            return item['url']
+        for k, v in item.items():
+            if k.startswith('b64_'):
+                return f'data:audio/{k[4:]};base64,{v}'
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Pipeline 1: Image → Video
+# ---------------------------------------------------------------------------
+
+class ImageToVideoPipelineRequest(BaseModel):
+    prompt: str
+    image_model: str
+    video_model: str
+    # image params
+    image_size: Optional[str] = "1024x1024"
+    image_steps: Optional[int] = None
+    image_cfg: Optional[float] = None
+    image_seed: Optional[int] = None
+    negative_prompt: Optional[str] = None
+    # video params
+    num_frames: Optional[int] = 16
+    fps: Optional[int] = 8
+    num_inference_steps: Optional[int] = 25
+    guidance_scale: Optional[float] = 7.5
+    video_seed: Optional[int] = None
+    camera_motion: Optional[str] = None
+    # audio
+    add_audio: Optional[bool] = False
+    audio_type: Optional[str] = None
+    audio_prompt: Optional[str] = None
+    # post
+    upscale_output: Optional[bool] = False
+    upscale_factor: Optional[int] = 2
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
+@router.post("/v1/pipelines/image-to-video")
+async def pipeline_image_to_video(request: ImageToVideoPipelineRequest, http_request: Request = None):
+    """Generate an image then animate it into a video."""
+    steps = []
+
+    # Step 1: generate image
+    img_body = {
+        "model": request.image_model,
+        "prompt": request.prompt,
+        "size": request.image_size,
+        "response_format": "url",
+    }
+    if request.image_steps:   img_body["steps"] = request.image_steps
+    if request.image_cfg is not None:  img_body["guidance_scale"] = request.image_cfg
+    if request.image_seed is not None: img_body["seed"] = request.image_seed
+    if request.negative_prompt: img_body["negative_prompt"] = request.negative_prompt
+
+    try:
+        img_result = await _post_json('/v1/images/generations', img_body, http_request)
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Image generation failed: {e}")
+
+    img_url = _img_url(img_result if isinstance(img_result, dict) else img_result.__dict__)
+    if not img_url:
+        raise HTTPException(status_code=500, detail="Image generation returned no image")
+    steps.append({"step": "image", "url": img_url})
+
+    # Step 2: animate image → video
+    vid_body = {
+        "model": request.video_model,
+        "mode": "i2v",
+        "prompt": request.prompt,
+        "init_image": img_url,
+        "num_frames": request.num_frames,
+        "fps": request.fps,
+        "num_inference_steps": request.num_inference_steps,
+        "guidance_scale": request.guidance_scale,
+        "response_format": "url",
+    }
+    if request.video_seed is not None: vid_body["seed"] = request.video_seed
+    if request.camera_motion: vid_body["camera_motion"] = request.camera_motion
+    if request.add_audio:
+        vid_body["add_audio"] = True
+        vid_body["audio_type"] = request.audio_type
+        vid_body["audio_prompt"] = request.audio_prompt
+    if request.upscale_output:
+        vid_body["upscale_output"] = True
+        vid_body["upscale_factor"] = request.upscale_factor
+
+    try:
+        vid_result = await _post_json('/v1/video/generations', vid_body, http_request)
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Video generation failed: {e}")
+
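+    vid_url = _vid_url(vid_result if isinstance(vid_result, dict) else vid_result.__dict__)
+    if not vid_url:
+        raise HTTPException(status_code=500, detail="Video generation returned no video")
+    steps.append({"step": "video", "url": vid_url})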
+
+    return {
+        "created": int(time.time()),
+        "pipeline": "image-to-video",
+        "steps": steps,
+        "data": [{"url": vid_url, "image_url": img_url}],
+    }
+
+
+# ---------------------------------------------------------------------------
+# Pipeline 2: Video Dub
+# ---------------------------------------------------------------------------
+
+class VideoDubPipelineRequest(BaseModel):
+    model: str
+    video: str                          # base64 or URL
+    target_lang: str
+    source_lang: Optional[str] = None
+    voice_clone: Optional[bool] = False
+    burn_subtitles: Optional[bool] = True
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
+@router.post("/v1/pipelines/video-dub")
+async def pipeline_video_dub(request: VideoDubPipelineRequest, http_request: Request = None):
+    """Transcribe → translate → TTS dub → burn subtitles."""
+    body = {
+        "model": request.model,
+        "video": request.video,
+        "target_lang": request.target_lang,
+        "source_lang": request.source_lang,
+        "voice_clone": request.voice_clone,
+        "burn_subtitles": request.burn_subtitles,
+        "response_format": request.response_format,
+    }
+    try:
+        result = await _post_json('/v1/video/dub', body, http_request)
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Video dub failed: {e}")
+
+    vid_url = _vid_url(result if isinstance(result, dict) else result.__dict__)
+    return {
+        "created": int(time.time()),
+        "pipeline": "video-dub",
+        "data": [{"url": vid_url}],
+    }
+
+
+# ---------------------------------------------------------------------------
+# Pipeline 3: Full Story (LLM → images → video → TTS narration)
+# ---------------------------------------------------------------------------
+
+class StoryPipelineRequest(BaseModel):
+    story: str
+    text_model: str
+    image_model: str
+    video_model: str
+    tts_model: Optional[str] = None
+    tts_voice: Optional[str] = "af_sarah"
+    num_scenes: Optional[int] = 3
+    num_frames: Optional[int] = 16
+    fps: Optional[int] = 8
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
+@router.post("/v1/pipelines/story")
+async def pipeline_story(request: StoryPipelineRequest, http_request: Request = None):
+    """LLM generates script → image per scene → animate first scene → optional TTS narration."""
+    n = min(request.num_scenes or 3, 6)
+
+    # Step 1: LLM script
+    try:
+        script_result = await _post_json('/v1/chat/completions', {
+            "model": request.text_model,
+            "messages": [{"role": "user", "content":
+                f"Write a {n}-scene visual script for this story. "
+                f"For each scene write exactly: SCENE X: [brief visual description, one sentence]. "
+                f"Story: {request.story}"}],
+            "stream": False,
+        }, http_request)
+        if hasattr(script_result, 'body'):
+            import json
+            script_result = json.loads(script_result.body)
+        script_text = script_result['choices'][0]['message']['content']
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Script generation failed: {e}")
+
+    import re
+    scenes = re.findall(r'SCENE \d+:\s*(.+)', script_text) or [request.story]
+    scenes = scenes[:n]
+
+    steps = [{"step": "script", "text": script_text, "scenes": scenes}]
+
+    # Step 2: image per scene (parallel)
+    async def _gen_image(desc):
+        try:
+            r = await _post_json('/v1/images/generations', {
+                "model": request.image_model,
+                "prompt": desc,
+                "response_format": "url",
+            }, http_request)
+            return _img_url(r if isinstance(r, dict) else r.__dict__)
+        except Exception:
+            return None
+
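+    img_results = await asyncio.gather(*[_gen_image(s) for s in scenes])
+    # Keep each scene paired with its image so a failed generation cannot shift the mapping
+    scene_images = [(s, u) for s, u in zip(scenes, img_results) if u]
+    img_urls = [u for _, u in scene_images]
+    steps.append({"step": "images", "urls": img_urls})
+
+    if not img_urls:
+        raise HTTPException(status_code=500, detail="All image generations failed")
+
+    # Step 3: animate the first scene that produced an image
+    first_scene, first_image = scene_images[0]
+    try:
+        vid_result = await _post_json('/v1/video/generations', {
+            "model": request.video_model,
+            "mode": "i2v",
+            "prompt": first_scene,
+            "init_image": first_image,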
+            "num_frames": request.num_frames,
+            "fps": request.fps,
+            "response_format": "url",
+        }, http_request)
+        vid_url = _vid_url(vid_result if isinstance(vid_result, dict) else vid_result.__dict__)
+    except Exception as e:
+        vid_url = None
+        steps.append({"step": "video", "error": str(e)})
+    else:
+        steps.append({"step": "video", "url": vid_url})
+
+    # Step 4: TTS narration (optional)
+    aud_url = None
+    if request.tts_model:
+        narration = " ".join(scenes)
+        try:
+            aud_result = await _post_json('/v1/audio/speech', {
+                "model": request.tts_model,
+                "input": narration,
+                "voice": request.tts_voice or "af_sarah",
+                "response_format": "mp3",
+            }, http_request)
+            aud_url = _aud_url(aud_result if isinstance(aud_result, dict) else aud_result.__dict__)
+        except Exception as e:
+            steps.append({"step": "tts", "error": str(e)})
+        else:
+            steps.append({"step": "tts", "url": aud_url})
+
+    return {
+        "created": int(time.time()),
+        "pipeline": "story",
+        "steps": steps,
+        "data": [{
+            "video_url": vid_url,
+            "image_urls": img_urls,
+            "audio_url": aud_url,
+        }],
+    }
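Review note: the story pipeline depends on the LLM following the `SCENE X: ...` format and falls back to the raw story when nothing matches. The parsing step can be isolated as in the sketch below. `extract_scenes` is a hypothetical helper; the regular expression is the one used in the handler, while the `strip()` call is an added assumption.

```python
import re
from typing import List

# Same pattern as the handler: one scene description per matching line.
SCENE_RE = re.compile(r"SCENE \d+:\s*(.+)")


def extract_scenes(script_text: str, fallback: str, limit: int) -> List[str]:
    """Return at most `limit` scene descriptions, or [fallback] if none match."""
    scenes = [s.strip() for s in SCENE_RE.findall(script_text)] or [fallback]
    return scenes[:limit]
```

Because `.` does not match newlines, each match ends at the end of its line, so a multi-line description would be truncated to its first line.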
+
+
+# ---------------------------------------------------------------------------
+# Pipeline 4: Audio Dub (transcribe → translate → clone voice → replace audio)
+# ---------------------------------------------------------------------------
+
+class AudioDubPipelineRequest(BaseModel):
+    """
+    Dub an audio or video file using a cloned voice.
+
+    Steps:
+    1. Transcribe source audio/video with Whisper
+    2. Optionally translate the transcript
+    3. Synthesize dubbed audio with F5-TTS voice cloning
+    4. If input is video: replace the audio track (ffmpeg)
+       If input is audio: return the dubbed audio directly
+    """
+    # Input — provide one of:
+    video: Optional[str] = None         # base64/URL video
+    audio: Optional[str] = None         # base64/URL audio-only file
+
+    # Voice cloning — provide one of:
+    voice_name: Optional[str] = None    # saved voice profile name
+    ref_audio: Optional[str] = None     # base64 reference audio
+    ref_text: Optional[str] = None      # transcript of ref_audio
+
+    # Transcription
+    source_lang: Optional[str] = None   # source language hint (auto-detect if None)
+    whisper_model: Optional[str] = None # whisper model size (base, small, medium, large)
+
+    # Translation
+    target_lang: Optional[str] = None   # translate to this language before dubbing
+                                        # if None, dub in original language
+
+    # TTS
+    speed: Optional[float] = 1.0
+    seed: Optional[int] = None
+
+    # Video output options
+    burn_subtitles: Optional[bool] = False
+
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
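Review note: between the transcription and synthesis steps listed in the docstring above, the SRT transcript has to be reduced to plain text. One way to do that is sketched below. `srt_to_plain_text` is a hypothetical helper, not a function from the commit, and it assumes standard SRT cues made of an index line, a timestamp line, and one or more text lines.

```python
def srt_to_plain_text(srt_content: str) -> str:
    """Join subtitle text lines, skipping cue numbers, timestamps, and blanks."""
    kept = []
    for raw_line in srt_content.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank separator between cues
        if line.isdigit():
            continue  # cue index such as "1"
        if "-->" in line:
            continue  # timestamp line such as "00:00:01,000 --> 00:00:02,500"
        kept.append(line)
    return " ".join(kept)
```

Testing the whole line with `isdigit()` keeps subtitle text that merely starts with a number, such as "3 people arrived.".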
+
+
+@router.post("/v1/pipelines/audio-dub")
+async def pipeline_audio_dub(request: AudioDubPipelineRequest, http_request: Request = None):
+    """Transcribe → (translate) → clone voice → replace audio track."""
+    import os, tempfile, subprocess, base64
+
+    if not request.video and not request.audio:
+        raise HTTPException(status_code=400, detail="Provide video or audio")
+    if not request.voice_name and not request.ref_audio:
+        raise HTTPException(status_code=400, detail="Provide voice_name or ref_audio for cloning")
+
+    from codai.api.video import _decode_b64_or_url, _tmp_write, _whisper_transcribe, _translate_srt
+    from codai.api.voice_clone import _load_voice, _decode_audio, _f5tts_clone
+
+    temps = []
+    steps = []
+
+    try:
+        # Decode input
+        is_video = bool(request.video)
+        raw = _decode_b64_or_url(request.video or request.audio)
+        ext = '.mp4' if is_video else '.wav'
+        in_path = _tmp_write(raw, ext)
+        temps.append(in_path)
+
+        # Step 1: Transcribe
+        srt_path = await asyncio.get_event_loop().run_in_executor(
+            None, _whisper_transcribe, in_path, request.source_lang,
+            request.whisper_model, temps)
+        if not srt_path:
+            raise HTTPException(status_code=500, detail="Transcription failed — Whisper not available")
+
+        with open(srt_path) as f:
+            srt_content = f.read()
+        steps.append({"step": "transcribe", "srt": srt_content})
+
+        # Step 2: Translate (optional)
+        if request.target_lang:
+            srt_path = await asyncio.get_event_loop().run_in_executor(
+                None, _translate_srt, srt_path, request.target_lang, temps)
+            with open(srt_path) as f:
+                srt_content = f.read()
+            steps.append({"step": "translate", "lang": request.target_lang, "srt": srt_content})
+
+        # Extract plain text from SRT: drop cue indices and timing lines, keep cue text
+        plain_text = ' '.join(
+            line.strip() for line in srt_content.split('\n')
+            if line.strip() and not line.strip().isdigit() and '-->' not in line
+        )
+
+        # Step 3: Resolve reference audio for voice cloning
+        ref_audio_path = None
+        ref_text = request.ref_text or ''
+        if request.voice_name:
+            meta = _load_voice(request.voice_name)
+            if not meta:
+                raise HTTPException(status_code=404, detail=f"Voice '{request.voice_name}' not found")
+            ref_audio_path = meta['audio_file']
+            ref_text = ref_text or meta.get('transcript', '')
+        else:
+            audio_bytes, aext = _decode_audio(request.ref_audio)
+            tmp = tempfile.NamedTemporaryFile(suffix=aext, delete=False)
+            tmp.write(audio_bytes)
+            tmp.close()
+            ref_audio_path = tmp.name
+            temps.append(ref_audio_path)
+
+        if not ref_text:
+            raise HTTPException(status_code=400, detail="ref_text required for voice cloning")
+
+        # Step 4: Clone voice
+        try:
+            dubbed_bytes = await asyncio.get_event_loop().run_in_executor(
+                None, _f5tts_clone,
+                ref_audio_path, ref_text, plain_text,
+                request.speed or 1.0, request.seed,
+            )
+        except ImportError:
+            raise HTTPException(status_code=501, detail="f5-tts not installed. Run: pip install f5-tts")
+
+        dubbed_path = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
+        dubbed_path.write(dubbed_bytes)
+        dubbed_path.close()
+        dubbed_path = dubbed_path.name
+        temps.append(dubbed_path)
+        steps.append({"step": "clone_voice"})
+
+        # Step 5: Replace audio / return
+        if is_video:
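+            fd, out_path = tempfile.mkstemp(suffix='_dubbed.mp4')  # mktemp() is deprecated and race-prone
+            os.close(fd)  # ffmpeg reopens the path; -y lets it overwrite the empty file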
+            temps.append(out_path)
+            cmd = ['ffmpeg', '-y', '-i', in_path, '-i', dubbed_path,
+                   '-map', '0:v', '-map', '1:a',
+                   '-c:v', 'copy', '-c:a', 'aac', '-shortest', out_path]
+            r = subprocess.run(cmd, capture_output=True)
+            if r.returncode != 0:
+                raise HTTPException(status_code=500, detail=f"Audio merge failed: {r.stderr.decode()}")
+
+            if request.burn_subtitles:
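+                fd2, sub_out = tempfile.mkstemp(suffix='_sub.mp4')  # mktemp() is deprecated and race-prone
+                os.close(fd2)  # ffmpeg reopens the path; -y lets it overwrite the empty file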
+                temps.append(sub_out)
+                r2 = subprocess.run(
+                    ['ffmpeg', '-y', '-i', out_path, '-vf', f'subtitles={srt_path}',
+                     '-c:a', 'copy', sub_out], capture_output=True)
+                if r2.returncode == 0:
+                    out_path = sub_out
+
+            with open(out_path, 'rb') as f:
+                out_bytes = f.read()
+            out_b64 = base64.b64encode(out_bytes).decode()
+            steps.append({"step": "merge_video"})
+            result_data = [{"b64_mp4": out_b64}]
+        else:
+            out_b64 = base64.b64encode(dubbed_bytes).decode()
+            result_data = [{"b64_wav": out_b64}]
+
+        # Save to file path if configured
+        if http_request:
+            from codai.api.voice_clone import _save_audio_response
+            # reuse save logic for the output
+            pass
+
+        return {
+            "created": int(time.time()),
+            "pipeline": "audio-dub",
+            "steps": steps,
+            "data": result_data,
+        }
+
+    finally:
+        for t in temps:
+            try:
+                os.unlink(t)
+            except Exception:
+                pass
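
The "Extract plain text from SRT" step in the pipeline above can be sketched as a standalone helper. This is a minimal illustration, not the project's implementation: `srt_to_plain_text` and `_TIMING` are hypothetical names. It parses cue blocks explicitly, so cue text that begins with a digit, or is only a number, is kept.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:02,500".
_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_plain_text(srt: str) -> str:
    """Join cue text from an SRT document, dropping indices and timings."""
    parts = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the numeric cue index only when a timing line follows it.
        if len(lines) >= 2 and lines[0].isdigit() and _TIMING.match(lines[1]):
            lines = lines[1:]
        # Drop the timing line itself.
        if lines and _TIMING.match(lines[0]):
            lines = lines[1:]
        parts.extend(lines)
    return " ".join(parts)


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\n3 apples fell\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\n42\n"
)
plain = srt_to_plain_text(sample)
print(plain)
```

Parsing by block costs a few more lines than a per-line filter, but it only discards a numeric line when it is followed by a timing line.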
--- a/codai/api/ratelimit.py
+++ b/codai/api/ratelimit.py
+# CoderAI - OpenAI-compatible API server
+# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+"""Simple in-process fixed-window rate limiter middleware.
+
+Each distinct (client-IP, route-prefix) pair gets its own bucket.
+Limits are configured via RateLimitConfig.  The defaults below are
+intentionally generous; tighten them through the config file or CLI.
+
+Endpoints covered:
+  /v1/chat/completions      — expensive LLM inference
+  /v1/images/               — image generation
+  /v1/audio/                — TTS / STT / audio generation
+  /v1/video/                — video generation
+  /v1/embeddings            — embedding
+  /v1/completions           — legacy completions
+"""
+
+import time
+import threading
+from collections import defaultdict
+from typing import Dict, Tuple
+
+from fastapi import Request, Response
+from fastapi.responses import JSONResponse
+from starlette.middleware.base import BaseHTTPMiddleware
+
+
+# Per-route-prefix defaults: (max_requests, window_seconds)
+_DEFAULT_LIMITS: Dict[str, Tuple[int, int]] = {
+    "/v1/chat/completions": (60, 60),
+    "/v1/completions":      (60, 60),
+    "/v1/images/":          (30, 60),
+    "/v1/audio/":           (60, 60),
+    "/v1/video/":           (10, 60),
+    "/v1/embeddings":       (120, 60),
+}
+
+# API prefixes that count against the request queue
+_QUEUED_PREFIXES = ("/v1/",)
+
+# Global toggle — set to False to disable rate limiting entirely.
+RATE_LIMITING_ENABLED = True
+
+
+class _Bucket:
+    """Fixed-window counter."""
+    __slots__ = ("count", "window_start")
+
+    def __init__(self, now: float):
+        self.count = 0
+        self.window_start = now
+
+
+class RateLimitMiddleware(BaseHTTPMiddleware):
+    """Apply per-IP, per-route-prefix rate limiting to API endpoints."""
+
+    def __init__(self, app, limits: Dict[str, Tuple[int, int]] = None):
+        super().__init__(app)
+        self._limits = limits or _DEFAULT_LIMITS
+        # (client_ip, prefix) → _Bucket
+        self._buckets: Dict[Tuple[str, str], _Bucket] = defaultdict(lambda: _Bucket(time.monotonic()))
+        self._lock = threading.Lock()
+
+    def _get_prefix(self, path: str) -> str:
+        for prefix in self._limits:
+            if path.startswith(prefix):
+                return prefix
+        return ""
+
+    async def dispatch(self, request: Request, call_next):
+        if not RATE_LIMITING_ENABLED:
+            return await call_next(request)
+
+        path = request.url.path
+
+        # Queue-size enforcement for every request under the queued API prefixes
+        if any(path.startswith(p) for p in _QUEUED_PREFIXES):
+            from codai.queue.manager import queue_manager
+            if await queue_manager.is_full():
+                return JSONResponse(
+                    status_code=429,
+                    content={
+                        "error": {
+                            "message": "Server queue is full. Please retry later.",
+                            "type": "rate_limit_error",
+                            "code": 429,
+                        }
+                    },
+                    headers={"Retry-After": "5"},
+                )
+
+        prefix = self._get_prefix(path)
+        if not prefix:
+            return await call_next(request)
+
+        max_req, window = self._limits[prefix]
+        client_ip = (
+            request.headers.get("x-forwarded-for", "").split(",")[0].strip()
+            or (request.client.host if request.client else "unknown")
+        )
+        key = (client_ip, prefix)
+        now = time.monotonic()
+
+        with self._lock:
+            bucket = self._buckets[key]
+            if now - bucket.window_start >= window:
+                bucket.count = 0
+                bucket.window_start = now
+            bucket.count += 1
+            count = bucket.count
+            window_start = bucket.window_start  # snapshot while the lock is held
+
+        remaining = max(0, max_req - count)
+        reset_at = int(time.time() + (window - (now - window_start)))
+
+        if count > max_req:
+            return JSONResponse(
+                status_code=429,
+                content={
+                    "error": {
+                        "message": "Rate limit exceeded. Please slow down.",
+                        "type": "rate_limit_error",
+                        "code": 429,
+                    }
+                },
+                headers={
+                    "X-RateLimit-Limit": str(max_req),
+                    "X-RateLimit-Remaining": "0",
+                    "X-RateLimit-Reset": str(reset_at),
+                    "Retry-After": str(max(1, reset_at - int(time.time()))),  # seconds left in the window
+                },
+            )
+
+        response = await call_next(request)
+        response.headers["X-RateLimit-Limit"] = str(max_req)
+        response.headers["X-RateLimit-Remaining"] = str(remaining)
+        response.headers["X-RateLimit-Reset"] = str(reset_at)
+        return response
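
The counting logic in `dispatch` can be exercised without FastAPI. The sketch below is a hedged, framework-free illustration of a fixed-window counter keyed by (client IP, route prefix). `FixedWindowLimiter`, `hit`, and the injectable `clock` parameter are hypothetical names introduced so the behaviour can be tested deterministically.

```python
import threading
import time
from typing import Callable, Dict, Hashable, Tuple


class FixedWindowLimiter:
    """Allow at most `max_requests` hits per key in each `window` seconds."""

    def __init__(self, max_requests: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        # key -> (window_start, count)
        self._state: Dict[Hashable, Tuple[float, int]] = {}
        self._lock = threading.Lock()

    def hit(self, key: Hashable) -> Tuple[bool, int, float]:
        """Record one request; return (allowed, remaining, seconds_until_reset)."""
        now = self._clock()
        with self._lock:
            start, count = self._state.get(key, (now, 0))
            if now - start >= self.window:
                start, count = now, 0  # window expired: start a fresh one
            count += 1
            self._state[key] = (start, count)
        allowed = count <= self.max_requests
        remaining = max(0, self.max_requests - count)
        return allowed, remaining, self.window - (now - start)


# Drive the limiter with a fake clock so the example is deterministic.
fake_now = [0.0]
limiter = FixedWindowLimiter(max_requests=2, window=60, clock=lambda: fake_now[0])
key = ("203.0.113.7", "/v1/video/")
first = limiter.hit(key)
second = limiter.hit(key)
third = limiter.hit(key)       # over the limit within the same window
fake_now[0] = 61.0
after_reset = limiter.hit(key)  # a new window has started
```

A fixed window is cheap and easy to reason about, but a client can send up to twice the limit in a short burst that straddles a window boundary. A sliding window or token bucket would smooth that out at the cost of more state.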
--- a/codai/api/text.py
+++ b/codai/api/text.py
@@ -20,12 +20,15 @@ Text generation endpoints for the codai API.

 import asyncio
 import json
+import logging
 import time
 import uuid
 from typing import AsyncGenerator, Dict, List, Optional

 from fastapi import APIRouter, HTTPException, Request

+logger = logging.getLogger(__name__)
+
 # Import from codai modules
 from codai.models.manager import ModelManager, WhisperServerManager, MultiModelManager, model_manager, multi_model_manager
 from codai.queue.manager import QueueManager, queue_manager
@@ -119,68 +122,47 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
            if auth_header.startswith('Bearer '):
                api_key = auth_header[7:]  # Extract token after 'Bearer '
        
-        # If still no API key, use a fake key to allow litellm to proceed
-        # litellm will then fail with the actual provider error if needed
        if not api_key:
-            api_key = "fake-key-for-local-testing"
-            print("DEBUG: No API key provided, using fake key for litellm")
+            raise HTTPException(
+                status_code=401,
+                detail="An API key is required for the LiteLLM backend. "
+                       "Provide an 'Authorization: Bearer <key>' header.",
+            )
        
        # Determine the base URL for litellm to connect to
-        # Use the server's host and port for local connections
        api_base = None

-        # Check if model starts with 'ollama:' - use local Ollama
        if request.model and request.model.startswith('ollama:'):
-            # Get the host from the request headers
            client_host = "127.0.0.1"
            if http_request:
                host_header = http_request.headers.get('host', '')
                if host_header:
-                    # Strip port if present
                    if ':' in host_header:
                        client_host = host_header.split(':')[0]
-                        if client_host.replace('.', '').isdigit():
-                            # It's an IP, keep it
-                            pass
-                        else:
-                            # It's a hostname, use localhost
-                            client_host = "127.0.0.1"
                    else:
                        client_host = host_header
-            
-            # Get port from global_args or use default
            port = getattr(global_args, 'port', 11434) if global_args else 11434
            api_base = f"http://{client_host}:{port}/v1"
-            print(f"DEBUG: Using api_base for Ollama: {api_base}")
        else:
-            # For non-Ollama models, use the server's own URL as base
-            # This allows LiteLLM to make requests to the local server
            if http_request:
-                # Get the host from the request headers
                host_header = http_request.headers.get('host', '')
                if host_header:
-                    # Strip port if present to reconstruct clean URL
                    if ':' in host_header:
-                        client_host = host_header.split(':')[0]
-                        # Keep the port from the request for consistency
-                        server_port = host_header.split(':')[1] if len(host_header.split(':')) > 1 else str(getattr(global_args, 'port', 6745))
+                        parts = host_header.split(':')
+                        client_host = parts[0]
+                        server_port = parts[1] if len(parts) > 1 else str(getattr(global_args, 'port', 6745))
                    else:
                        client_host = host_header
                        server_port = str(getattr(global_args, 'port', 6745))
                else:
-                    # Fallback to client host if no Host header
                    client_host = http_request.client.host if http_request.client else "127.0.0.1"
                    server_port = str(getattr(global_args, 'port', 6745))
            else:
-                # Fallback if no http_request
                client_host = "127.0.0.1"
                server_port = str(getattr(global_args, 'port', 6745))
-            
-            # Determine protocol (http or https)
            use_https = getattr(global_args, 'https', False) or getattr(global_args, 'pubkey', None)
            protocol = "https" if use_https else "http"
            api_base = f"{protocol}://{client_host}:{server_port}/v1"
-            print(f"DEBUG: Using api_base for local server: {api_base}")
        
        # Get or create litellm backend
        litellm_backend = get_litellm_backend(
@@ -228,33 +210,21 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
                            stream=True,
                            tool_parser=tool_parser,
                        ):
-                            # Add rate limit headers
-                            headers = {}
-                            if 'usage' in chunk:
-                                headers = litellm_backend.get_rate_limit_headers(
-                                    prompt_tokens=chunk.get('usage', {}).get('prompt_tokens', 0),
-                                    completion_tokens=chunk.get('usage', {}).get('completion_tokens', 0)
-                                )
-                            
-                            # Handle Qwen tool calls if model is Qwen family
                            if 'qwen' in request.model.lower():
                                content = chunk.get('choices', [{}])[0].get('delta', {}).get('content', '')
                                tool_calls = chunk.get('choices', [{}])[0].get('delta', {}).get('tool_calls', [])
-                                
                                if not tool_calls and content:
-                                    # Try to parse tool calls from content
                                    tool_calls = litellm_backend.parse_qwen_tool_calls(content)
                                    if tool_calls:
-                                        # Strip tool tags from content
                                        content = litellm_backend.strip_tool_tags(content)
                                        chunk['choices'][0]['delta']['content'] = content
                                        chunk['choices'][0]['delta']['tool_calls'] = tool_calls
-                            
                            yield f"data: {json.dumps(chunk)}\n\n"
-                        
                        yield "data: [DONE]\n\n"
                    except Exception as e:
+                        # Send error chunk then [DONE] so clients don't hang waiting
                        yield f"data: {json.dumps({'error': {'message': str(e), 'type': 'internal_error'}})}\n\n"
+                        yield "data: [DONE]\n\n"
                
                from fastapi.responses import StreamingResponse
                return StreamingResponse(generate(), media_type="text/event-stream")
@@ -586,10 +556,6 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
        elif not isinstance(m["content"], str):
            messages_dict[i]["content"] = str(m["content"])
    
-    # Debug: print first few messages to see their structure
-    print(f"DEBUG: messages_dict[0] keys: {list(messages_dict[0].keys()) if messages_dict else 'empty'}")
-    if len(messages_dict) > 1:
-        print(f"DEBUG: messages_dict[1] keys: {list(messages_dict[1].keys()) if len(messages_dict) > 1 else 'empty'}")
    
    # Convert tools to dict format if present
    tools_dict = None
@@ -650,10 +616,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
            if get_global_debug():
                print(f"RAW: template_manager.format_for_raw_completion not available")
    
-    # Get resolved model name for response (with coderai/ prefix and proper formatting)
-    # Use multi_model_manager to get the actual loaded models, not the individual model manager
    response_model_name = get_resolved_model_name(requested_model, multi_model_manager)
-    print(f"DEBUG: Requested model: {requested_model}, Resolved model for response: {response_model_name}")
    
    # Handle raw mode - two pass: first capture reasoning, then get final answer
    if use_raw_mode and raw_prompt_for_generation:
@@ -813,7 +776,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
                                )
                            tools_list.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
                        except Exception as e:
-                            print(f"DEBUG: Error converting tool in raw stream: {e}")
+                            logger.debug("Error converting tool in raw stream: %s", e)
                            continue
                    
                    if tools_list:
@@ -1014,7 +977,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
                        )
                    tools_list.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
                except Exception as e:
-                    print(f"DEBUG: Error converting tool in raw mode: {e}, tool type: {type(t)}")
+                    logger.debug("Error converting tool in raw mode: %s (type: %s)", e, type(t))
                    continue
        
        # Step 1: Use ModelParserAdapter to extract tool calls from final_text (NOT generated_text which includes reasoning)
@@ -1040,7 +1003,7 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
                        validated_calls.append(tc)
                
                if len(validated_calls) != len(extracted_tool_calls):
-                    print(f"DEBUG: Filtered out {len(extracted_tool_calls) - len(validated_calls)} invalid tool calls in non-streaming")
+                    logger.debug("Filtered out %d invalid tool calls in non-streaming", len(extracted_tool_calls) - len(validated_calls))
                extracted_tool_calls = validated_calls if validated_calls else None
            
            if extracted_tool_calls:
@@ -1213,7 +1176,6 @@ async def stream_chat_response(
    request_id = f"req-{uuid.uuid4().hex[:8]}"
    
    generated_text = ""
-    print(f"DEBUG: stream_chat_response started, stream=True, tools={tools is not None}")
    
    # Check if model is loaded - if not, notify waiting clients
    # The model manager exists but backend may not be loaded yet in on-demand mode
@@ -1365,9 +1327,6 @@ async def stream_chat_response(
            # Explicitly flush to ensure data is sent immediately
            await asyncio.sleep(0)
        
-        print(f"DEBUG: stream_chat_response completed, {chunk_count} chunks, generated_text length: {len(generated_text)}")
-        if not generated_text.strip():
-            print(f"DEBUG: Warning - no content generated!")
        
        # In debug mode, dump the full generated text
        if get_global_debug():
@@ -1407,7 +1366,7 @@ async def stream_chat_response(
                        )
                    tool_objects.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
                except Exception as e:
-                    print(f"DEBUG: Error converting tool: {e}, tool type: {type(t)}")
+                    logger.debug("Error converting tool: %s (type: %s)", e, type(t))
                    continue
            try:
                tool_calls = tool_parser.extract_tool_calls(generated_text, tool_objects)
@@ -1423,10 +1382,10 @@ async def stream_chat_response(
                        elif isinstance(args, dict):
                            validated_calls.append(tc)
                    if len(validated_calls) != len(tool_calls):
-                        print(f"DEBUG: Filtered out {len(tool_calls) - len(validated_calls)} invalid tool calls in stream_chat_response")
+                        logger.debug("Filtered out %d invalid tool calls in stream_chat_response", len(tool_calls) - len(validated_calls))
                    tool_calls = validated_calls if validated_calls else None
            except Exception as e:
-                print(f"DEBUG: Error extracting tool calls: {e}")
+                logger.debug("Error extracting tool calls: %s", e)
                tool_calls = None
            if tool_calls:
                # In debug mode, dump tool calls
@@ -1628,7 +1587,7 @@ async def generate_chat_response(
                        )
                    tool_objects.append(Tool(type=t.get("type", "function") if isinstance(t, dict) else t.type, function=tool_func))
                except Exception as e:
-                    print(f"DEBUG: Error converting tool: {e}, tool type: {type(t)}")
+                    logger.debug("Error converting tool: %s (type: %s)", e, type(t))
                    continue
            try:
                tool_calls = tool_parser.extract_tool_calls(generated_text, tool_objects)
@@ -1644,10 +1603,10 @@ async def generate_chat_response(
                        elif isinstance(args, dict):
                            validated_calls.append(tc)
                    if len(validated_calls) != len(tool_calls):
-                        print(f"DEBUG: Filtered out {len(tool_calls) - len(validated_calls)} invalid tool calls in generate_chat_response")
+                        logger.debug("Filtered out %d invalid tool calls in generate_chat_response", len(tool_calls) - len(validated_calls))
                    tool_calls = validated_calls if validated_calls else None
            except Exception as e:
-                print(f"DEBUG: Error extracting tool calls: {e}")
+                logger.debug("Error extracting tool calls: %s", e)
                tool_calls = None
            if tool_calls:
                # Always strip tool call format from content
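
The `api_base` construction earlier in this file's diff derives host and port by splitting the `Host` header on `':'`. A bracketed IPv6 literal such as `[::1]:8000` would not survive that split. The sketch below shows one possible alternative using the standard library. `split_host_header` is a hypothetical helper, and the `127.0.0.1` fallback is an assumption that mirrors the defaults in the diff.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_host_header(host_header: str, default_port: int) -> Tuple[str, int]:
    """Split a Host header value into (hostname, port)."""
    # Prefixing "//" makes urlsplit treat the value as a network location,
    # so bracketed IPv6 literals such as "[::1]:8000" are parsed correctly.
    parts = urlsplit("//" + host_header.strip())
    host = parts.hostname or "127.0.0.1"
    try:
        port: Optional[int] = parts.port
    except ValueError:
        port = None  # non-numeric or out-of-range port
    return host, (port if port is not None else default_port)


plain_host = split_host_header("example.com:6745", 80)
ipv6_host = split_host_header("[::1]:8000", 6745)
no_port = split_host_header("localhost", 6745)
```

`urlsplit(...).hostname` returns IPv6 addresses without brackets, so the brackets need to be added back when the value is formatted into a URL.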

--- a/codai/api/transcriptions.py
+++ b/codai/api/transcriptions.py
@@ -23,8 +23,15 @@ import os
 import tempfile

 from fastapi import APIRouter, HTTPException, UploadFile, File, Form
+from fastapi.responses import PlainTextResponse
 from typing import Optional

+# Maximum upload size: 100 MB
+_MAX_AUDIO_BYTES = 100 * 1024 * 1024
+
+# Safe audio extensions (user-supplied extension is NOT trusted for the suffix)
+_SAFE_EXTENSIONS = {'.wav', '.mp3', '.ogg', '.flac', '.m4a', '.webm', '.mp4'}
+
 # Import from codai modules
 from codai.models.manager import multi_model_manager

@@ -39,6 +46,71 @@ def set_global_args(args):
    global_args = args


+# =============================================================================
+# Response formatting helpers
+# =============================================================================
+
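+def _seconds_to_srt_time(s: float) -> str:
+    ms = max(0, round(s * 1000))  # integer ms: the seconds field can never round up to 60
+    h, ms = divmod(ms, 3_600_000)
+    m, ms = divmod(ms, 60_000)
+    return f"{h:02d}:{m:02d}:{ms // 1000:02d},{ms % 1000:03d}"
+
+
+def _seconds_to_vtt_time(s: float) -> str:
+    ms = max(0, round(s * 1000))  # integer ms: the seconds field can never round up to 60
+    h, ms = divmod(ms, 3_600_000)
+    m, ms = divmod(ms, 60_000)
+    return f"{h:02d}:{m:02d}:{ms // 1000:02d}.{ms % 1000:03d}"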
+
+
+def _format_response(fmt: str, text: str, segments: list):
+    """Format a transcription result according to the requested response_format."""
+    fmt = (fmt or "json").lower()
+
+    if fmt == "text":
+        return PlainTextResponse(text)
+
+    if fmt == "srt":
+        lines = []
+        for i, seg in enumerate(segments, 1):
+            start = _seconds_to_srt_time(seg.get("start", 0))
+            end = _seconds_to_srt_time(seg.get("end", 0))
+            lines.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
+        srt_body = "\n".join(lines) if lines else f"1\n00:00:00,000 --> 00:00:00,000\n{text}\n"
+        return PlainTextResponse(srt_body, media_type="text/plain")
+
+    if fmt == "vtt":
+        lines = ["WEBVTT\n"]
+        for seg in segments:
+            start = _seconds_to_vtt_time(seg.get("start", 0))
+            end = _seconds_to_vtt_time(seg.get("end", 0))
+            lines.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
+        if not segments:
+            lines.append(f"00:00:00.000 --> 00:00:00.000\n{text}\n")
+        return PlainTextResponse("\n".join(lines), media_type="text/vtt")
+
+    if fmt == "verbose_json":
+        return {
+            "task": "transcribe",
+            "language": "unknown",
+            "duration": segments[-1].get("end", 0) if segments else 0,
+            "text": text,
+            "segments": [
+                {
+                    "id": i,
+                    "start": s.get("start", 0),
+                    "end": s.get("end", 0),
+                    "text": s.get("text", "").strip(),
+                }
+                for i, s in enumerate(segments)
+            ],
+        }
+
+    # Default: json
+    return {"text": text}
+
+
 # =============================================================================
 # Router and Endpoints
 # =============================================================================
@@ -58,17 +130,37 @@ async def create_transcription(
    """
    Audio transcription endpoint (OpenAI-compatible).
    """
-    # Check if whisper-server is available FIRST
-    if multi_model_manager.whisper_server and multi_model_manager.whisper_server.is_running():
    file_content = await file.read()
-        result = multi_model_manager.whisper_server.transcribe(
-            file_content,
-            language=language,
-            prompt=prompt
-        )
+    if len(file_content) > _MAX_AUDIO_BYTES:
+        raise HTTPException(status_code=413, detail="Audio file too large (max 100 MB)")
+
+    # Check if the requested model is a whisper-server instance
+    wsm = multi_model_manager.whisper_servers.get(model)
+    if wsm is None and multi_model_manager.whisper_server is not None:
+        # Legacy single-instance fallback: use it if no specific match
+        if not multi_model_manager.whisper_servers:
+            wsm = multi_model_manager.whisper_server
+
+    if wsm is not None:
+        ws_key = f"audio:{model}" if model in multi_model_manager.whisper_servers else "audio:whisper-server"
+
+        # Let the VRAM manager evict other models if needed
+        multi_model_manager.request_model(requested_model=model, model_type="audio")
+
+        # Start the subprocess if it isn't running (on-demand)
+        if not wsm.is_running():
+            wsm.start(getattr(wsm, '_model_path', None), gpu_device=getattr(wsm, '_gpu_device', 0))
+            if wsm.is_running():
+                multi_model_manager.models[ws_key] = wsm
+                multi_model_manager.active_in_vram = ws_key
+                multi_model_manager.models_in_vram.add(ws_key)
+
+        if wsm.is_running():
+            result = wsm.transcribe(file_content, language=language, prompt=prompt)
            if "error" in result:
                raise HTTPException(status_code=500, detail=result["error"])
-        return {"text": result.get("text", "")}
+            return _format_response(response_format, result.get("text", ""), [])
+        # Fall through to Python backends if subprocess failed to start

    # Use the manager to resolve the model and manage VRAM
    model_info = multi_model_manager.request_model(
@@ -90,11 +182,13 @@ async def create_transcription(
            detail="Audio transcription not configured. Use --audio-model or --whisper-server."
        )

-    # Read the uploaded file
-    file_content = await file.read()
+    # Determine a safe file extension from the upload's filename; anything outside
+    # the allow-list falls back to '.wav' so arbitrary suffixes never reach the temp file.
+    raw_ext = os.path.splitext(file.filename or '')[1].lower()
+    safe_ext = raw_ext if raw_ext in _SAFE_EXTENSIONS else '.wav'

    # Save to temp file (needed for some backends)
-    with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.filename)[1]) as tmp:
+    with tempfile.NamedTemporaryFile(delete=False, suffix=safe_ext) as tmp:
        tmp.write(file_content)
        tmp_path = tmp.name
    
@@ -104,41 +198,27 @@ async def create_transcription(
            from faster_whisper import WhisperModel

            if whisper_model is None:
-                print(f"Loading faster-whisper model: {model_name}")
-                
-                # Determine compute type - always use int8 for CPU
-                compute_type = "int8"
-                
-                # Load the model
                whisper_model = WhisperModel(
                    model_name,
-                    device="cpu",  # Always use CPU - faster-whisper CUDA doesn't work with AMD
-                    compute_type=compute_type,
+                    device="cpu",
+                    compute_type="int8",
                )
-                
-                # Cache the model
                multi_model_manager.add_model(model_key, whisper_model)
                multi_model_manager.current_model_key = model_key
-                print(f"Loaded faster-whisper model: {model_name}")

-            # Run transcription
-            segments, info = whisper_model.transcribe(
+            raw_segments, _ = whisper_model.transcribe(
                tmp_path,
                language=language,
                initial_prompt=prompt,
                temperature=temperature,
            )
-            
-            # Collect all segments
-            text_parts = []
-            for segment in segments:
-                text_parts.append(segment.text)
-            
-            full_text = "".join(text_parts)
-            
-            return {
-                "text": full_text.strip()
-            }
+            # Materialise the generator so we have all segment data
+            segments = [
+                {"start": s.start, "end": s.end, "text": s.text}
+                for s in raw_segments
+            ]
+            full_text = "".join(s["text"] for s in segments)
+            return _format_response(response_format, full_text.strip(), segments)

        except ImportError:
            pass
@@ -148,41 +228,26 @@ async def create_transcription(
            import whispercpp

            if whisper_model is None:
-                print(f"Loading whispercpp model: {model_name}")
-                
-                # Check if it's a built-in model name
-                if model_name in ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large']:
-                    # It's a built-in model name
-                    whisper_model = whispercpp.Whisper.from_pretrained(model_name)
-                else:
-                    # It's a path to a GGUF file
                whisper_model = whispercpp.Whisper.from_pretrained(model_name)
-                
-                # Cache the model
                multi_model_manager.add_model(model_key, whisper_model)
                multi_model_manager.current_model_key = model_key
-                print(f"Loaded whispercpp model: {model_name}")

-            # Run transcription
            result = whisper_model.transcribe(tmp_path)

-            # Extract text from result
            text = ""
            if hasattr(result, 'text'):
                text = result.text
            elif isinstance(result, dict):
                text = result.get('text', '')
            elif isinstance(result, list):
-                # Some versions return a list of segments
                for segment in result:
                    if hasattr(segment, 'text'):
                        text += segment.text
                    elif isinstance(segment, dict):
                        text += segment.get('text', '')

-            return {
-                "text": text.strip()
-            }
+            # whispercpp does not readily expose per-segment timestamps, so pass an empty segment list
+            return _format_response(response_format, text.strip(), [])

        except ImportError as e:
            raise HTTPException(

--- a/codai/api/video.py
+++ b/codai/api/video.py
@@ -263,7 +263,39 @@ def _apply_camera_motion(kw: dict, camera_motion: str):
        kw['camera_motion'] = camera_motion


-def _apply_character_refs(kw: dict, character_references: List[str], strength: float):
+def _resolve_character_inputs(request) -> tuple[List[str], List[str]]:
+    """Return (flat_image_list, name_list) from any combination of request fields."""
+    images: List[str] = []
+    names: List[str] = []
+
+    # 1. Expand named saved profiles
+    if request.character_profiles:
+        try:
+            from codai.api.characters import resolve_character_profiles
+            images += resolve_character_profiles(request.character_profiles)
+            names += list(request.character_profiles)
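+        except Exception as e:
+            import logging
+            logging.getLogger(__name__).warning("Could not resolve character profiles: %s", e)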
+
+    # 2. Named character slots [{name, images:[...]}, ...]
+    if request.characters:
+        for slot in request.characters:
+            slot_imgs = slot.get('images') or []
+            images += slot_imgs
+            if slot.get('name'):
+                names.append(slot['name'])
+
+    # 3. Legacy flat list
+    if request.character_references:
+        images += list(request.character_references)
+        if request.character_names:
+            names += list(request.character_names)
+
+    return images, names
+
+
+def _apply_character_refs(kw: dict, character_references: List[str], strength: float,
+                           names: Optional[List[str]] = None):
    """Apply character reference images to pipeline kwargs."""
    if not character_references:
        return
@@ -291,8 +323,13 @@ def _generate_video(pipe, request: VideoGenerationRequest):

    _apply_camera_motion(kw, request.camera_motion)

-    if request.character_references:
-        _apply_character_refs(kw, request.character_references, request.character_strength or 0.8)
+    char_images, char_names = _resolve_character_inputs(request)
+    if char_images:
+        _apply_character_refs(kw, char_images, request.character_strength or 0.8, char_names)
+        # Prepend character names to prompt for better conditioning
+        if char_names and kw.get('prompt'):
+            names_hint = ', '.join(char_names)
+            kw['prompt'] = f"{names_hint}. {kw['prompt']}"

    init_src = request.init_image or request.image

@@ -359,35 +396,49 @@ def _ffmpeg_upscale(path: str, factor: int, temps: list) -> str:
    scale = f"scale=iw*{factor}:ih*{factor}:flags=lanczos"
    cmd = ['ffmpeg', '-y', '-i', path, '-vf', scale, '-c:a', 'copy', out]
    r = subprocess.run(cmd, capture_output=True)
-    if r.returncode == 0:
-        return out
+    if r.returncode != 0:
+        import logging
+        logging.getLogger(__name__).warning(
+            "ffmpeg upscale failed (rc=%d): %s", r.returncode, r.stderr.decode(errors='replace')
+        )
        return path  # fallback to original if ffmpeg fails
+    return out


 def _rife_interpolate(path: str, multiplier: int, temps: list) -> str:
    out = tempfile.mktemp(suffix='_rife.mp4')
    temps.append(out)
-    # Try rife-ncnn-vulkan binary if available
-    import shutil
+    import logging
+    import shutil
+    _log = logging.getLogger(__name__)
    if shutil.which('rife-ncnn-vulkan'):
        frames_dir = tempfile.mkdtemp()
        out_dir = tempfile.mkdtemp()
        temps += [frames_dir, out_dir]
-        subprocess.run(['ffmpeg', '-y', '-i', path, f'{frames_dir}/%08d.png'],
+        r = subprocess.run(['ffmpeg', '-y', '-i', path, f'{frames_dir}/%08d.png'],
                           capture_output=True)
-        subprocess.run(['rife-ncnn-vulkan', '-i', frames_dir, '-o', out_dir,
-                        '-m', f'rife-v4'], capture_output=True)
-        subprocess.run(['ffmpeg', '-y', '-r', str(multiplier * 8), '-i',
+        if r.returncode != 0:
+            _log.warning("ffmpeg frame extraction failed: %s", r.stderr.decode(errors='replace'))
+        else:
+            r = subprocess.run(['rife-ncnn-vulkan', '-i', frames_dir, '-o', out_dir,
+                                '-m', 'rife-v4'], capture_output=True)
+            if r.returncode != 0:
+                _log.warning("rife-ncnn-vulkan failed: %s", r.stderr.decode(errors='replace'))
+            else:
+                r = subprocess.run(['ffmpeg', '-y', '-r', str(multiplier * 8), '-i',
                                    f'{out_dir}/%08d.png', '-c:v', 'libx264', out],
                                   capture_output=True)
-        if os.path.exists(out):
+                if r.returncode != 0:
+                    _log.warning("ffmpeg reassembly failed: %s", r.stderr.decode(errors='replace'))
+                elif os.path.exists(out):
                    return out
    # Simple ffmpeg minterpolate fallback
-    fps_expr = f"fps=fps={multiplier}*source_fps"
    cmd = ['ffmpeg', '-y', '-i', path, '-filter:v',
           f'minterpolate=fps={multiplier * 8}', '-c:a', 'copy', out]
    r = subprocess.run(cmd, capture_output=True)
-    return out if r.returncode == 0 else path
+    if r.returncode != 0:
+        _log.warning("ffmpeg minterpolate failed: %s", r.stderr.decode(errors='replace'))
+        return path
+    return out
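Both helpers above now follow the same pattern: run an external tool, log its stderr when the return code is non-zero, and fall back to the input path. A minimal sketch of that pattern as one reusable function, assuming a hypothetical helper name `run_or_fallback` that does not exist in the codebase:

```python
import logging
import subprocess

_log = logging.getLogger(__name__)


def run_or_fallback(cmd: list[str], out: str, fallback: str, label: str) -> str:
    """Run cmd; return `out` on success, otherwise log stderr and return `fallback`."""
    r = subprocess.run(cmd, capture_output=True)
    if r.returncode != 0:
        _log.warning(
            "%s failed (rc=%d): %s",
            label, r.returncode, r.stderr.decode(errors="replace"),
        )
        return fallback
    return out
```

Note that `subprocess.run` raises `FileNotFoundError` when the executable is missing rather than returning a non-zero code, which is why the RIFE branch is guarded by `shutil.which` before anything is run.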


 def _add_audio_to_video(path: str, request: VideoGenerationRequest,

--- a/codai/api/voice_clone.py
+++ b/codai/api/voice_clone.py
+"""
+Voice cloning endpoints.
+
+POST /v1/audio/clone          — synthesize speech in a cloned voice
+GET  /v1/audio/voices         — list saved voice profiles
+POST /v1/audio/voices         — save a named voice profile (ref audio + transcript)
+DELETE /v1/audio/voices/{name} — delete a voice profile
+"""
+
+import asyncio
+import base64
+import io
+import json
+import os
+import tempfile
+import time
+from typing import Optional
+
+from fastapi import APIRouter, HTTPException, Request, UploadFile, File, Form
+from pydantic import BaseModel, ConfigDict
+
+router = APIRouter()
+
+global_args = None
+global_file_path = None
+
+# Directory where voice profiles are stored
+_VOICES_DIR: Optional[str] = None
+
+
+def set_global_args(args):
+    global global_args, _VOICES_DIR
+    global_args = args
+    # Store voice profiles in a 'voices' directory alongside output files,
+    # or under ~/.coderai when no file path is configured
+    base = getattr(args, 'file_path', None)
+    if not base:
+        root = os.path.expanduser('~/.coderai')
+    elif os.path.isdir(base):
+        root = base
+    else:
+        root = os.path.dirname(base)
+    _VOICES_DIR = os.path.join(root, 'voices')
+    os.makedirs(_VOICES_DIR, exist_ok=True)
+
+
+def set_global_file_path(path):
+    global global_file_path
+    global_file_path = path
+
+
+def _voices_dir() -> str:
+    if _VOICES_DIR:
+        return _VOICES_DIR
+    d = os.path.expanduser('~/.coderai/voices')
+    os.makedirs(d, exist_ok=True)
+    return d
+
+
+def _voice_path(name: str) -> str:
+    return os.path.join(_voices_dir(), name)
+
+
+def _list_voices() -> list:
+    d = _voices_dir()
+    voices = []
+    for entry in os.scandir(d):
+        if entry.is_dir():
+            meta_path = os.path.join(entry.path, 'meta.json')
+            if os.path.exists(meta_path):
+                with open(meta_path) as f:
+                    meta = json.load(f)
+                voices.append(meta)
+    return sorted(voices, key=lambda v: v.get('created_at', 0))
+
+
+def _save_voice(name: str, audio_bytes: bytes, audio_ext: str, transcript: str, description: str = '') -> dict:
+    vdir = _voice_path(name)
+    os.makedirs(vdir, exist_ok=True)
+    audio_file = os.path.join(vdir, f'ref{audio_ext}')
+    with open(audio_file, 'wb') as f:
+        f.write(audio_bytes)
+    meta = {
+        'name': name,
+        'description': description,
+        'transcript': transcript,
+        'audio_file': audio_file,
+        'audio_ext': audio_ext,
+        'created_at': int(time.time()),
+    }
+    with open(os.path.join(vdir, 'meta.json'), 'w') as f:
+        json.dump(meta, f)
+    return meta
+
+
+def _load_voice(name: str) -> Optional[dict]:
+    meta_path = os.path.join(_voice_path(name), 'meta.json')
+    if not os.path.exists(meta_path):
+        return None
+    with open(meta_path) as f:
+        return json.load(f)
+
+
+def _decode_audio(data: str) -> tuple[bytes, str]:
+    """Decode base64 audio data, return (bytes, ext)."""
+    if data.startswith('data:'):
+        mime, b64 = data.split(',', 1)
+        ext = '.' + mime.split('/')[1].split(';')[0]
+        return base64.b64decode(b64), ext
+    return base64.b64decode(data), '.wav'
+
+
+def _f5tts_clone(ref_audio_path: str, ref_text: str, gen_text: str,
+                  speed: float = 1.0, seed: Optional[int] = None) -> bytes:
+    """Run F5-TTS voice cloning, return WAV bytes."""
+    from f5_tts.api import F5TTS
+    import soundfile as sf
+    import numpy as np
+
+    device = None
+    if global_args:
+        import torch
+        if torch.cuda.is_available():
+            device = 'cuda'
+
+    tts = F5TTS(device=device)
+    wav, sr, _ = tts.infer(
+        ref_file=ref_audio_path,
+        ref_text=ref_text,
+        gen_text=gen_text,
+        speed=speed,
+        seed=seed,
+        show_info=lambda x: None,
+        progress=lambda x, **kw: x,
+    )
+
+    buf = io.BytesIO()
+    sf.write(buf, wav, sr, format='WAV')
+    return buf.getvalue()
+
+
+def _save_audio_response(audio_bytes: bytes, http_request: Request) -> dict:
+    import uuid
+    filename = f"{uuid.uuid4().hex}.wav"
+    if global_file_path:
+        os.makedirs(global_file_path, exist_ok=True)
+        fpath = os.path.join(global_file_path, filename)
+        with open(fpath, 'wb') as f:
+            f.write(audio_bytes)
+        host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
+        if ':' in host:
+            parts = host.split(':')
+            if len(parts) == 2 and parts[1].isdigit():
+                host = parts[0]
+        use_https = getattr(global_args, 'https', False) if global_args else False
+        proto = 'https' if use_https else 'http'
+        port = getattr(global_args, 'port', 8000) if global_args else 8000
+        return {"url": f"{proto}://{host}:{port}/v1/files/{filename}"}
+    return {"b64_wav": base64.b64encode(audio_bytes).decode()}
+
+
+# ---------------------------------------------------------------------------
+# Voice profile management
+# ---------------------------------------------------------------------------
+
+@router.get("/v1/audio/voices")
+async def list_voices():
+    """List all saved voice profiles."""
+    return {"voices": _list_voices()}
+
+
+@router.post("/v1/audio/voices")
+async def create_voice(
+    name: str = Form(...),
+    transcript: str = Form(...),
+    description: str = Form(''),
+    audio: UploadFile = File(...),
+):
+    """Save a named voice profile from a reference audio file + transcript."""
+    if not name.replace('-', '').replace('_', '').isalnum():
+        raise HTTPException(status_code=400, detail="Voice name must be alphanumeric (hyphens/underscores allowed)")
+
+    audio_bytes = await audio.read()
+    ext = os.path.splitext(audio.filename or '')[1] or '.wav'
+
+    # Validate audio is readable
+    try:
+        import soundfile as sf
+        sf.info(io.BytesIO(audio_bytes))
+    except Exception as e:
+        raise HTTPException(status_code=400, detail=f"Invalid audio file: {e}")
+
+    meta = _save_voice(name, audio_bytes, ext, transcript, description)
+    return {"created": True, "voice": meta}
+
+
+@router.delete("/v1/audio/voices/{name}")
+async def delete_voice(name: str):
+    """Delete a saved voice profile."""
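+    import shutil
+    # Reject names that could escape the voices directory (e.g. "..")
+    if not name.replace('-', '').replace('_', '').isalnum():
+        raise HTTPException(status_code=400, detail="Voice name must be alphanumeric (hyphens/underscores allowed)")
+    vdir = _voice_path(name)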
+    if not os.path.exists(vdir):
+        raise HTTPException(status_code=404, detail=f"Voice '{name}' not found")
+    shutil.rmtree(vdir)
+    return {"deleted": True, "name": name}
+
+
+# ---------------------------------------------------------------------------
+# Voice cloning TTS
+# ---------------------------------------------------------------------------
+
+class VoiceCloneRequest(BaseModel):
+    text: str                               # text to synthesize
+    voice_name: Optional[str] = None        # use a saved voice profile
+    ref_audio: Optional[str] = None         # base64 reference audio (if not using saved voice)
+    ref_text: Optional[str] = None          # transcript of ref_audio
+    speed: Optional[float] = 1.0
+    seed: Optional[int] = None
+    response_format: Optional[str] = "url"
+    model_config = ConfigDict(extra="allow")
+
+
+@router.post("/v1/audio/clone")
+async def clone_voice(request: VoiceCloneRequest, http_request: Request = None):
+    """
+    Synthesize speech in a cloned voice using F5-TTS.
+
+    Provide either:
+    - voice_name: name of a saved voice profile
+    - ref_audio (base64) + ref_text: inline reference audio
+    """
+    # Resolve reference audio
+    ref_audio_path = None
+    ref_text = request.ref_text or ''
+    temps = []
+
+    try:
+        if request.voice_name:
+            meta = _load_voice(request.voice_name)
+            if not meta:
+                raise HTTPException(status_code=404, detail=f"Voice '{request.voice_name}' not found")
+            ref_audio_path = meta['audio_file']
+            ref_text = ref_text or meta.get('transcript', '')
+        elif request.ref_audio:
+            audio_bytes, ext = _decode_audio(request.ref_audio)
+            tmp = tempfile.NamedTemporaryFile(suffix=ext, delete=False)
+            tmp.write(audio_bytes)
+            tmp.close()
+            ref_audio_path = tmp.name
+            temps.append(ref_audio_path)
+        else:
+            raise HTTPException(status_code=400, detail="Provide voice_name or ref_audio")
+
+        if not ref_text:
+            raise HTTPException(status_code=400, detail="ref_text (transcript of reference audio) is required for voice cloning")
+
+        try:
+            audio_bytes = await asyncio.get_running_loop().run_in_executor(
+                None, _f5tts_clone,
+                ref_audio_path, ref_text, request.text,
+                request.speed or 1.0, request.seed,
+            )
+        except ImportError:
+            raise HTTPException(status_code=501, detail="f5-tts not installed. Run: pip install f5-tts")
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Voice cloning failed: {e}")
+
+        result = _save_audio_response(audio_bytes, http_request)
+        return {"created": int(time.time()), "data": [result]}
+
+    finally:
+        for t in temps:
+            try:
+                os.unlink(t)
+            except Exception:
+                pass
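For reviewers who want to exercise `POST /v1/audio/clone` with inline reference audio, the request body can be assembled with the standard library alone. The sketch below is an illustration, not part of the commit: `build_clone_payload` is a hypothetical client-side helper, and the field names are taken from `VoiceCloneRequest` above. The audio is sent as a `data:` URL so that `_decode_audio` can derive the file extension from the MIME type.

```python
import base64
import json


def build_clone_payload(text: str, wav_bytes: bytes, ref_text: str,
                        speed: float = 1.0) -> str:
    """Build a JSON body for POST /v1/audio/clone using inline reference audio."""
    data_url = "data:audio/wav;base64," + base64.b64encode(wav_bytes).decode("ascii")
    return json.dumps({
        "text": text,            # text to synthesize
        "ref_audio": data_url,   # reference audio as a base64 data URL
        "ref_text": ref_text,    # transcript of the reference audio
        "speed": speed,
    })
```

The endpoint answers 400 when `ref_text` is empty and no saved profile supplies a transcript, so a client should always send it alongside `ref_audio`.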
--- a/codai/api/voice_convert.py
+++ b/codai/api/voice_convert.py
+"""
+Voice conversion endpoint — converts timbre while preserving pitch, melody and expression.
+Unlike TTS-based dubbing, this works correctly for singing and music.
+
+POST /v1/audio/convert   — convert voice timbre in audio (speech or singing)
+"""
+
+import asyncio
+import base64
+import io
+import os
+import tempfile
+import time
+from typing import Optional
+
+import numpy as np
+import soundfile as sf
+from fastapi import APIRouter, HTTPException, Request
+from pydantic import BaseModel, ConfigDict
+
+router = APIRouter()
+
+global_args = None
+global_file_path = None
+
+_wrapper = None   # SeedVCWrapper singleton
+
+
+def set_global_args(args):
+    global global_args
+    global_args = args
+
+
+def set_global_file_path(path):
+    global global_file_path
+    global_file_path = path
+
+
+def _get_wrapper():
+    global _wrapper
+    if _wrapper is None:
+        from seed_vc.seed_vc_wrapper import SeedVCWrapper
+        _wrapper = SeedVCWrapper()
+    return _wrapper
+
+
+def _decode_audio_to_file(data: str, suffix: str = '.wav') -> str:
+    if data.startswith('data:'):
+        _, b64 = data.split(',', 1)
+        raw = base64.b64decode(b64)
+    else:
+        raw = base64.b64decode(data)
+    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
+    tmp.write(raw)
+    tmp.close()
+    return tmp.name
+
+
+def _save_response(audio_np: np.ndarray, sr: int, http_request) -> dict:
+    buf = io.BytesIO()
+    sf.write(buf, audio_np, sr, format='WAV')
+    wav_bytes = buf.getvalue()
+    import uuid
+    filename = f'{uuid.uuid4().hex}_converted.wav'
+    if global_file_path:
+        os.makedirs(global_file_path, exist_ok=True)
+        fpath = os.path.join(global_file_path, filename)
+        with open(fpath, 'wb') as f:
+            f.write(wav_bytes)
+        host = http_request.headers.get('host', '127.0.0.1') if http_request else '127.0.0.1'
+        if ':' in host:
+            parts = host.split(':')
+            if len(parts) == 2 and parts[1].isdigit():
+                host = parts[0]
+        proto = 'https' if getattr(global_args, 'https', False) else 'http'
+        port = getattr(global_args, 'port', 8000) if global_args else 8000
+        return {'url': f'{proto}://{host}:{port}/v1/files/{filename}'}
+    return {'b64_wav': base64.b64encode(wav_bytes).decode()}
+
+
+class VoiceConvertRequest(BaseModel):
+    """
+    Convert the timbre of source_audio to match target_voice,
+    while preserving pitch, melody, rhythm and expression.
+
+    Use f0_condition=True for singing/music (slower but pitch-accurate).
+    Use f0_condition=False for speech (faster).
+    """
+    source_audio: str                       # base64 audio to convert (the performance)
+    target_voice: Optional[str] = None      # base64 reference audio for target timbre
+    voice_name: Optional[str] = None        # saved voice profile name
+
+    f0_condition: Optional[bool] = False    # True = singing/music mode (preserves pitch)
+    pitch_shift: Optional[int] = 0         # semitones to shift after conversion
+    diffusion_steps: Optional[int] = 10    # quality vs speed (10–30)
+    length_adjust: Optional[float] = 1.0
+    inference_cfg_rate: Optional[float] = 0.7
+
+    response_format: Optional[str] = 'url'
+    model_config = ConfigDict(extra='allow')
+
+
+@router.post('/v1/audio/convert')
+async def convert_voice(request: VoiceConvertRequest, http_request: Request = None):
+    """
+    Voice conversion: preserves pitch/melody/expression, changes only timbre.
+    Set f0_condition=True for singing and music.
+    """
+    target_path = None
+    temps = []
+    try:
+        if request.voice_name:
+            from codai.api.voice_clone import _load_voice
+            meta = _load_voice(request.voice_name)
+            if not meta:
+                raise HTTPException(status_code=404, detail=f"Voice '{request.voice_name}' not found")
+            target_path = meta['audio_file']
+        elif request.target_voice:
+            target_path = _decode_audio_to_file(request.target_voice)
+            temps.append(target_path)
+        else:
+            raise HTTPException(status_code=400, detail='Provide voice_name or target_voice')
+
+        source_path = _decode_audio_to_file(request.source_audio)
+        temps.append(source_path)
+
+        try:
+            wrapper = _get_wrapper()
+        except ImportError:
+            raise HTTPException(status_code=501,
+                detail='seed-vc not installed. Run: pip install seed-vc')
+
+        def _run():
+            return wrapper.convert_voice(
+                source=source_path,
+                target=target_path,
+                diffusion_steps=request.diffusion_steps or 10,
+                length_adjust=request.length_adjust or 1.0,
+                inference_cfg_rate=request.inference_cfg_rate or 0.7,
+                f0_condition=bool(request.f0_condition),
+                pitch_shift=request.pitch_shift or 0,
+                stream_output=False,
+            )
+
+        try:
+            audio_out = await asyncio.get_running_loop().run_in_executor(None, _run)
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f'Voice conversion failed: {e}')
+
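+        # Output sample rate is assumed from the mode: 44.1 kHz with F0 conditioning, 22.05 kHz otherwise
+        sr = 44100 if request.f0_condition else 22050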
+        if isinstance(audio_out, tuple):
+            audio_out = audio_out[0]
+
+        result = _save_response(np.array(audio_out).flatten(), sr, http_request)
+        return {'created': int(time.time()), 'data': [result]}
+
+    finally:
+        for t in temps:
+            try:
+                os.unlink(t)
+            except Exception:
+                pass
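Both new endpoints track temporary files in a `temps` list and unlink them in a `finally` block. The same idea could be expressed as a context manager. The sketch below is one possible shape, with `temp_audio_files` being a hypothetical name that is not in the codebase:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_audio_files():
    """Yield a writer that stores bytes in temp files, all removed on exit."""
    paths: list[str] = []

    def write(data: bytes, suffix: str = ".wav") -> str:
        tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
        try:
            tmp.write(data)
        finally:
            tmp.close()
        paths.append(tmp.name)
        return tmp.name

    try:
        yield write
    finally:
        for p in paths:
            # Best-effort cleanup, mirroring the endpoints' finally blocks.
            with contextlib.suppress(OSError):
                os.unlink(p)
```

A handler would then wrap its body in `with temp_audio_files() as write:` and call `write(raw_bytes)` wherever it currently creates a `NamedTemporaryFile` by hand.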
--- a/codai/config.py
+++ b/codai/config.py
@@ -30,6 +30,7 @@ class ServerConfig:
    https: bool = False
    https_key_path: Optional[str] = None
    https_cert_path: Optional[str] = None
+    queue_max_size: int = 6


 @dataclass
@@ -128,10 +129,12 @@ class ConfigManager:
        self.config_path = self.config_dir / "config.json"
        self.models_path = self.config_dir / "models.json"
        self.auth_path = self.config_dir / "auth.json"
+        self.pipelines_path = self.config_dir / "pipelines.json"
        
        self.config: Optional[Config] = None
        self.models_data: Dict[str, Any] = {}
        self.auth_data: Dict[str, Any] = {}
+        self.pipelines_data: list = []
    
    def ensure_config_dir(self):
        """Create configuration directory if it doesn't exist."""
@@ -196,19 +199,12 @@ class ConfigManager:
        
        # Create default auth.json
        if not self.auth_path.exists():
-            try:
-                from argon2 import PasswordHasher
-                ph = PasswordHasher()
-                default_admin_hash = ph.hash("admin")
-            except ImportError:
            from codai.admin.auth import hash_password
-                default_admin_hash = hash_password("admin")
-            
            default_auth = {
                "users": [{
                    "id": 1,
                    "username": "admin",
-                    "password_hash": default_admin_hash,
+                    "password_hash": hash_password("admin"),
                    "role": "admin",
                    "created_at": "2026-05-03T00:00:00Z",
                    "must_change_password": True
@@ -219,8 +215,8 @@ class ConfigManager:
            with open(self.auth_path, 'w') as f:
                json.dump(default_auth, f, indent=2)
            print(f"Created default auth config: {self.auth_path}")
-            print("\nDefault credentials: admin / admin")
-            print("You will be prompted to change the password on first login.\n")
+            print("\nDefault credentials: admin / admin")
+            print("IMPORTANT: Change this password immediately after first login.\n")
    
    def load(self) -> Config:
        """Load configuration from files.
@@ -229,7 +225,6 @@ class ConfigManager:
            Config object with loaded settings
        """
        # Create defaults if config directory is empty or doesn't exist
-        if not self.config_dir.exists() or not any(self.config_dir.iterdir()):
        self.create_default_configs()
        
        # Load config.json
@@ -287,6 +282,13 @@ class ConfigManager:
                "sessions": {}
            }

+        # Load pipelines.json
+        if self.pipelines_path.exists():
+            with open(self.pipelines_path, 'r') as f:
+                self.pipelines_data = json.load(f)
+        else:
+            self.pipelines_data = []
+        
        return self.config
    
    def save_config(self):
@@ -298,7 +300,8 @@ class ConfigManager:
                "port": self.config.server.port,
                "https": self.config.server.https,
                "https_key_path": self.config.server.https_key_path,
-                "https_cert_path": self.config.server.https_cert_path
+                "https_cert_path": self.config.server.https_cert_path,
+                "queue_max_size": self.config.server.queue_max_size,
            },
            "backend": {
                "type": self.config.backend.type,
@@ -367,6 +370,11 @@ class ConfigManager:
        with open(self.auth_path, 'w') as f:
            json.dump(self.auth_data, f, indent=2)

+    def save_pipelines(self):
+        """Save pipelines.json to disk."""
+        with open(self.pipelines_path, 'w') as f:
+            json.dump(self.pipelines_data, f, indent=2)
+    
    def reload(self):
        """Reload all configuration files."""
        return self.load()
\ No newline at end of file
--- a/codai/main.py
+++ b/codai/main.py
@@ -368,7 +368,21 @@ def main():
    audio_models = models_config.get("audio_models", [])
    for m in audio_models:
        mid = _model_id(m)
-        if mid:
+        if not mid:
+            continue
+        backend = m.get("backend", "") if isinstance(m, dict) else ""
+        if backend == "whisper-server":
+            # Register as a whisper-server instance
+            cfg = _model_cfg(m, "audio")
+            multi_model_manager.register_whisper_server(
+                model_id=mid,
+                server_path=m.get("server_path", config.whisper.server_path or ""),
+                model_path=m.get("model_path") or None,
+                port=int(m.get("port", config.whisper.server_port)),
+                gpu_device=int(m.get("gpu_device", config.vulkan.device_id)),
+                config=cfg,
+            )
+        else:
            multi_model_manager.set_audio_model(mid, config=_model_cfg(m, "audio"))

    # Image models
@@ -446,7 +460,18 @@ def main():
                    print(f"  Loaded: {mid}")
                else:
                    print(f"  Warning: {mid} failed to load")
-            # image/audio/vision/tts pre-loading is handled by their respective
+            elif mtype == "audio" and mid in multi_model_manager.whisper_servers:
+                wsm = multi_model_manager.whisper_servers[mid]
+                wsm.start(wsm._model_path, gpu_device=wsm._gpu_device)
+                if wsm.is_running():
+                    ws_key = f"audio:{mid}"
+                    multi_model_manager.models[ws_key] = wsm
+                    multi_model_manager.active_in_vram = ws_key
+                    multi_model_manager.models_in_vram.add(ws_key)
+                    print(f"  whisper-server started: {mid}")
+                else:
+                    print(f"  Warning: whisper-server '{mid}' failed to start")
+            # image/vision/tts pre-loading is handled by their respective
            # API modules on first request; we just log intent here.
            else:
                print(f"  Note: pre-loading for {mtype} models happens on first request")
@@ -550,6 +575,27 @@ def main():
    if global_file_path:
        set_audiogen_file_path(global_file_path)

+    # Set voice clone module global args
+    from codai.api.voice_clone import set_global_args as set_vc_global_args, set_global_file_path as set_vc_file_path
+    set_vc_global_args(global_args)
+    if global_file_path:
+        set_vc_file_path(global_file_path)
+
+    from codai.api.voice_convert import set_global_args as set_vconv_global_args, set_global_file_path as set_vconv_file_path
+    set_vconv_global_args(global_args)
+    if global_file_path:
+        set_vconv_file_path(global_file_path)
+
+    # Set faceswap module global args
+    from codai.api.faceswap import set_global_args as set_fs_global_args, set_global_file_path as set_fs_file_path
+    set_fs_global_args(global_args)
+    if global_file_path:
+        set_fs_file_path(global_file_path)
+
+    # Set character profiles module global args
+    from codai.api.characters import set_global_args as set_chars_global_args
+    set_chars_global_args(global_args)
+
    # Set embeddings module global args
    from codai.api.embeddings import set_global_args as set_embed_global_args
    set_embed_global_args(global_args)
@@ -585,6 +631,10 @@ def main():


    
+    # Apply queue max size from config
+    from codai.queue.manager import queue_manager
+    queue_manager.max_size = config.server.queue_max_size
+
    # Start the server
    import uvicorn
    print(f"\nStarting server on http://{config.server.host}:{config.server.port}")

--- a/codai/models/manager.py
+++ b/codai/models/manager.py
@@ -389,6 +389,11 @@ class WhisperServerManager:
            "url": self.base_url
        }

+    def cleanup(self):
+        """Stop the subprocess — called by the VRAM eviction/unload machinery."""
+        print("whisper-server: evicted from VRAM, stopping subprocess")
+        self.stop()
+

 class MultiModelManager:
    """
@@ -412,9 +417,11 @@ class MultiModelManager:
        self.active_in_vram: Optional[str] = None  # most-recently-used model key
        self.models_in_vram: set = set()  # all models currently in VRAM
        self.model_aliases: Dict[str, str] = {}
-        self.whisper_server: Optional[WhisperServerManager] = None
+        self.whisper_server: Optional[WhisperServerManager] = None  # legacy single-instance compat
+        self.whisper_servers: Dict[str, WhisperServerManager] = {}  # id -> manager
        self.model_backend_types: Dict[str, str] = {}
        self.tool_breaker = FuzzyToolBreaker(threshold=3)  # Circuit breaker for repetitive tool calls
+        self._load_lock = threading.Lock()  # Prevents duplicate on-demand model loads
    
    @property
    def image_model(self) -> Optional[str]:
@@ -432,12 +439,17 @@ class MultiModelManager:
                print(f"Warning: Error cleaning up model {key}: {e}")
        self.models.clear()
        
-        # Cleanup whisper server
-        if self.whisper_server:
+        # Cleanup whisper server(s)
+        for wsm in self.whisper_servers.values():
            try:
-                self.whisper_server.stop()
+                wsm.stop()
            except Exception as e:
                print(f"Warning: Error cleaning up whisper server: {e}")
+        if self.whisper_server and self.whisper_server not in self.whisper_servers.values():
+            try:
+                self.whisper_server.stop()
+            except Exception as e:
+                print(f"Warning: Error cleaning up legacy whisper server: {e}")
        
        # Clear all model lists
        self.default_model = None
@@ -520,30 +532,30 @@ class MultiModelManager:
            print(f"Model '{model_name}' cached as: {resolved_model}")
    
    def _load_default_model(self):
-        """Load the default model on demand."""
+        """Load the default model on demand (thread-safe)."""
        if not self.default_model:
            return None

-        # Check if already loaded
+        # Fast path: already loaded (checked without lock for performance)
+        if self.default_model in self.models:
+            return self.models[self.default_model]
+
+        with self._load_lock:
+            # Re-check inside the lock to avoid duplicate loads from concurrent requests
            if self.default_model in self.models:
                return self.models[self.default_model]

-        # Get config and backend type
            config = self.config.get(self.default_model, {})
            backend_type = self.model_backend_types.get(self.default_model, "auto")

-        # Get global args for additional parameters
            try:
                from codai.api.state import get_global_args
                global_args = get_global_args()
-        except:
+            except Exception:
                global_args = None

-        # Create new model manager and load the model
            model_manager = ModelManager()
-        
            try:
-            # Build kwargs from config
                kwargs = {}
                if 'ctx' in config:
                    kwargs['ctx'] = config['ctx']
@@ -569,40 +581,35 @@ class MultiModelManager:

                print(f"Loading default model on demand: {self.default_model}")
                model_manager.load_model(self.default_model, backend_type=backend_type, **kwargs)
-            
-            # Add to models dict
                self.models[self.default_model] = model_manager
                self.current_model_key = self.default_model
-            
                print(f"Model loaded successfully: {self.default_model}")
                return model_manager
-            
            except Exception as e:
                print(f"Error loading model {self.default_model}: {e}")
                return None
    
    def _load_model_by_name(self, model_name: str):
-        """Load a model by name on demand."""
-        # Check if already loaded
+        """Load a model by name on demand (thread-safe)."""
+        if model_name in self.models:
+            return self.models[model_name]
+
+        with self._load_lock:
+            # Re-check inside lock to prevent duplicate loads
            if model_name in self.models:
                return self.models[model_name]

-        # Check if it's registered in config
            config = self.config.get(model_name, {})
            backend_type = self.model_backend_types.get(model_name, "auto")

-        # Get global args for additional parameters
            try:
                from codai.api.state import get_global_args
                global_args = get_global_args()
-        except:
+            except Exception:
                global_args = None

-        # Create new model manager and load the model
            model_manager = ModelManager()
-        
            try:
-            # Build kwargs from config
                kwargs = {}
                if 'ctx' in config:
                    kwargs['ctx'] = config['ctx']
@@ -628,14 +635,10 @@ class MultiModelManager:

                print(f"Loading model on demand: {model_name}")
                model_manager.load_model(model_name, backend_type=backend_type, **kwargs)
-            
-            # Add to models dict
                self.models[model_name] = model_manager
                self.current_model_key = model_name
-            
                print(f"Model loaded successfully: {model_name}")
                return model_manager
-            
            except Exception as e:
                print(f"Error loading model {model_name}: {e}")
                return None
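Review note: both loaders above now use double-checked locking around `self._load_lock`. The sketch below shows the pattern in isolation. `LazyModelCache`, `slow_loader`, and `load_count` are hypothetical names used only for illustration and are not part of codai.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LazyModelCache:
    """Hypothetical stand-in for the on-demand loading in MultiModelManager."""

    def __init__(self, loader):
        self._loader = loader            # callable that performs the slow load
        self._models = {}
        self._load_lock = threading.Lock()
        self.load_count = 0              # counts how many real loads happened

    def get(self, name):
        # Fast path: skip the lock when the model is already cached.
        model = self._models.get(name)
        if model is not None:
            return model
        with self._load_lock:
            # Re-check: another thread may have finished loading while we waited.
            model = self._models.get(name)
            if model is None:
                model = self._loader(name)
                self.load_count += 1
                self._models[name] = model
            return model


def slow_loader(name):
    time.sleep(0.05)  # simulate an expensive model load
    return f"loaded:{name}"


cache = LazyModelCache(slow_loader)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.get, ["demo-model"] * 8))
```

Because a single lock guards every load, as in the diff, loads of different models are serialized as well. A per-model lock would avoid that at the cost of extra bookkeeping.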
@@ -655,6 +658,25 @@ class MultiModelManager:
            self.config[f"audio:{resolved_model}"] = self.config.pop(f"audio:{model_name}")
            print(f"Audio model '{model_name}' cached as: {resolved_model}")

+    def register_whisper_server(self, model_id: str, server_path: str, model_path: Optional[str] = None,
+                                 port: int = 8744, gpu_device: int = 0, config: Optional[Dict] = None):
+        """Register a whisper-server instance as an audio model."""
+        wsm = WhisperServerManager(server_path=server_path, port=port)
+        wsm._model_path = model_path
+        wsm._gpu_device = gpu_device
+        self.whisper_servers[model_id] = wsm
+        # Keep legacy single-instance reference pointing to the first one registered
+        if self.whisper_server is None:
+            self.whisper_server = wsm
+        # Register as allowed audio model with its config
+        cfg = dict(config or {})  # copy so setdefault() below does not mutate the caller's dict
+        cfg.setdefault("load_mode", "on-request")
+        if model_id not in self.audio_models:
+            self.audio_models.append(model_id)
+        self.config[f"audio:{model_id}"] = cfg
+        print(f"Registered whisper-server audio model: {model_id} (server: {server_path})")
+        return wsm
+    
    def set_tts_model(self, model_name: str, config: Dict = None):
        """Set the text-to-speech model and download/cache it if needed."""
        self.tts_model = model_name
@@ -805,6 +827,43 @@ class MultiModelManager:

        return allowed

+    def get_registered_model_type(self, name: str) -> Optional[str]:
+        """
+        Return the type a model is registered under ("text", "image", "audio",
+        "tts", "vision", "video", "audio_gen", "embedding"), or None if unknown.
+        Short-name (filename) matching is used so full paths resolve correctly.
+        """
+        def _matches(registered: str) -> bool:
+            if name == registered:
+                return True
+            n_short = name.split("/")[-1]
+            r_short = registered.split("/")[-1]
+            return n_short == r_short
+
+        if self.default_model and _matches(self.default_model):
+            return "text"
+        for m in self.image_models:
+            if _matches(m):
+                return "image"
+        for m in self.audio_models:
+            if _matches(m):
+                return "audio"
+        if self.tts_model and _matches(self.tts_model):
+            return "tts"
+        for m in self.vision_models:
+            if _matches(m):
+                return "vision"
+        for m in self.video_models:
+            if _matches(m):
+                return "video"
+        for m in self.audio_gen_models:
+            if _matches(m):
+                return "audio_gen"
+        for m in self.embedding_models:
+            if _matches(m):
+                return "embedding"
+        return None
+
    def is_allowed_model(self, requested_or_resolved: str, model_type: str = None) -> bool:
        """
        Check if a model name (raw request value *or* resolved name) is one of
@@ -823,6 +882,15 @@ class MultiModelManager:
        if not requested_or_resolved:
            return False

+        # If a model_type is specified, reject models registered under a
+        # different type (e.g. an image GGUF requested via /v1/chat/completions).
+        if model_type:
+            registered_type = self.get_registered_model_type(requested_or_resolved)
+            if registered_type is not None and registered_type != model_type:
+                # "vision" models are acceptable for "text" endpoints (multimodal)
+                if not (model_type == "text" and registered_type == "vision"):
+                    return False
+
        # Quick check against the full set of allowed identifiers
        allowed = self.get_all_allowed_identifiers()
        if requested_or_resolved in allowed:
@@ -1365,9 +1433,26 @@ class MultiModelManager:
        # This prevents API callers from requesting arbitrary models that were not
        # specified on the command line (or registered as aliases).
        if not self.is_allowed_model(resolved_name, model_type):
-            # Also try the original requested_model value (before alias resolution)
-            # in case the caller used a valid alias that resolved to something we
-            # didn't recognise above (shouldn't happen, but be safe).
+            # Check if the model exists but is registered under a different type
+            registered_type = self.get_registered_model_type(resolved_name)
+            if registered_type is not None and registered_type != model_type:
+                endpoint_hint = {
+                    "image": "POST /v1/images/generations",
+                    "audio": "POST /v1/audio/transcriptions",
+                    "tts": "POST /v1/audio/speech",
+                    "video": "POST /v1/videos/generations",
+                }.get(registered_type, f"the {registered_type} endpoint")
+                print(f"Model type mismatch: '{resolved_name}' is a {registered_type} model, "
+                      f"requested via {model_type} endpoint")
+                return {
+                    'model_key': None,
+                    'model_name': None,
+                    'model_object': None,
+                    'config': {},
+                    'already_loaded': False,
+                    'error': (f"Model '{resolved_name}' is a {registered_type} model and cannot be used "
+                              f"for {model_type} generation. Use {endpoint_hint} instead."),
+                }
            allowed_ids = sorted(self.get_all_allowed_identifiers())
            print(f"Model validation failed: '{resolved_name}' is not an allowed model. "
                  f"Allowed models: {allowed_ids}")
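Review note: the type check added to `is_allowed_model` can be read as the small decision function sketched below. It covers only the type step, not the allow-list lookup that follows it in the real method. The `registry` dict and the model names are made-up illustrations; the real manager keeps separate per-type attributes such as `image_models` and `vision_models`.

```python
from typing import Dict, List, Optional


def short_name(identifier: str) -> str:
    """Return the last path component, so full paths and bare names compare equal."""
    return identifier.split("/")[-1]


def registered_type(name: str, registry: Dict[str, List[str]]) -> Optional[str]:
    """Return the first type whose registered names match `name`, else None."""
    for model_type, names in registry.items():
        if any(name == n or short_name(name) == short_name(n) for n in names):
            return model_type
    return None


def passes_type_check(name: str, requested_type: str,
                      registry: Dict[str, List[str]]) -> bool:
    """True when the type step would NOT reject the request."""
    found = registered_type(name, registry)
    if found is None or found == requested_type:
        return True  # unknown models are left to the later allow-list check
    # Vision models are accepted on text endpoints (multimodal chat).
    return requested_type == "text" and found == "vision"


# Illustrative registry only; these model names are not from the source.
registry = {
    "text": ["org/chat-model.gguf"],
    "image": ["/models/sd-image.gguf"],
    "vision": ["org/vision-model"],
}
```

With this registry, requesting `sd-image.gguf` through a text endpoint is rejected, while `org/vision-model` is let through.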

--- a/codai/openai/__init__.py
+++ b/codai/openai/__init__.py
+# codai.openai — optional LiteLLM integration layer
--- a/codai/openai/litellm.py
+++ b/codai/openai/litellm.py
+# CoderAI - OpenAI-compatible API server
+# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+"""LiteLLM backend wrapper for codai.
+
+This module wraps the litellm library so that the text endpoint can forward
+requests to any model provider supported by LiteLLM (Ollama, OpenAI, Anthropic,
+etc.) while still returning responses in the standard OpenAI format.
+"""
+
+import re
+import time
+import uuid
+from typing import Any, AsyncGenerator, Dict, List, Optional
+
+try:
+    import litellm
+    litellm.drop_params = True  # silently drop unsupported params
+    LITELLM_AVAILABLE = True
+except ImportError:
+    LITELLM_AVAILABLE = False
+
+
+class LiteLLMBackend:
+    """Wraps litellm.acompletion with the interface expected by codai's text endpoint."""
+
+    def __init__(
+        self,
+        model: str,
+        api_key: str,
+        api_base: Optional[str],
+        context_window: int = 8192,
+        model_manager=None,
+    ):
+        self.model = model
+        self.api_key = api_key
+        self.api_base = api_base
+        self.context_window = context_window
+        self.model_manager = model_manager
+
+    def _litellm_model(self, model: str) -> str:
+        """Return the model string in the format litellm expects."""
+        if model.startswith('ollama:'):
+            # litellm expects "ollama/<name>", so rewrite the "ollama:" prefix
+            return 'ollama/' + model[len('ollama:'):]
+        return model
+
+    async def chat_completion(
+        self,
+        messages: List[Dict],
+        model: str,
+        temperature: Optional[float] = 1.0,
+        top_p: Optional[float] = 1.0,
+        max_tokens: Optional[int] = None,
+        stop=None,
+        tools=None,
+        tool_choice=None,
+        stream: bool = False,
+        tool_parser=None,
+    ):
+        """Call litellm.acompletion and return either a full response dict or
+        an async generator of chunk dicts (when stream=True).
+        """
+        if not LITELLM_AVAILABLE:
+            raise RuntimeError("litellm is not installed. Run: pip install litellm")
+
+        kwargs: Dict[str, Any] = {
+            "model": self._litellm_model(model),
+            "messages": messages,
+            "api_key": self.api_key,
+            "stream": stream,
+        }
+        if self.api_base:
+            kwargs["api_base"] = self.api_base
+        if temperature is not None:
+            kwargs["temperature"] = temperature
+        if top_p is not None:
+            kwargs["top_p"] = top_p
+        if max_tokens is not None:
+            kwargs["max_tokens"] = max_tokens
+        if stop:
+            kwargs["stop"] = stop if isinstance(stop, list) else [stop]
+        if tools:
+            kwargs["tools"] = tools
+        if tool_choice:
+            kwargs["tool_choice"] = tool_choice
+
+        if stream:
+            return self._stream(kwargs)
+        else:
+            response = await litellm.acompletion(**kwargs)
+            return response.model_dump() if hasattr(response, 'model_dump') else dict(response)
+
+    async def _stream(self, kwargs: Dict) -> AsyncGenerator[Dict, None]:
+        response = await litellm.acompletion(**kwargs)
+        async for chunk in response:
+            yield chunk.model_dump() if hasattr(chunk, 'model_dump') else dict(chunk)
+
+    def get_rate_limit_headers(self, prompt_tokens: int, completion_tokens: int) -> Dict:
+        """Build OpenAI-style rate-limit headers; the request limits are hardcoded values."""
+        return {
+            "x-ratelimit-limit-requests": "1000",
+            "x-ratelimit-remaining-requests": "999",
+            "x-ratelimit-limit-tokens": str(self.context_window),
+            "x-ratelimit-remaining-tokens": str(
+                max(0, self.context_window - prompt_tokens - completion_tokens)
+            ),
+        }
+
+    # ------------------------------------------------------------------
+    # Qwen-specific helpers (tool calls embedded in <tool_call>…</tool_call>)
+    # ------------------------------------------------------------------
+
+    _QWEN_TOOL_PATTERN = re.compile(
+        r'<tool_call>\s*(\{.*?\})\s*</tool_call>', re.DOTALL
+    )
+    _QWEN_TAG_PATTERN = re.compile(
+        r'<tool_call>.*?</tool_call>', re.DOTALL
+    )
+
+    def parse_qwen_tool_calls(self, content: str) -> List[Dict]:
+        """Extract tool calls embedded as <tool_call>{…}</tool_call> tags."""
+        import json
+        calls = []
+        for m in self._QWEN_TOOL_PATTERN.finditer(content):
+            try:
+                data = json.loads(m.group(1))
+                calls.append({
+                    "id": f"call_{uuid.uuid4().hex[:8]}",
+                    "type": "function",
+                    "function": {
+                        "name": data.get("name", ""),
+                        "arguments": json.dumps(data.get("arguments", {})),
+                    },
+                })
+            except (json.JSONDecodeError, KeyError):
+                continue
+        return calls
+
+    def strip_tool_tags(self, content: str) -> str:
+        """Remove <tool_call>…</tool_call> blocks from content."""
+        return self._QWEN_TAG_PATTERN.sub('', content).strip()
+
+
+def get_litellm_backend(
+    model: str,
+    api_key: str,
+    api_base: Optional[str] = None,
+    context_window: int = 8192,
+    model_manager=None,
+) -> LiteLLMBackend:
+    """Return a LiteLLMBackend instance for the given model."""
+    return LiteLLMBackend(
+        model=model,
+        api_key=api_key,
+        api_base=api_base,
+        context_window=context_window,
+        model_manager=model_manager,
+    )
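Review note: the Qwen helpers can be exercised without litellm installed. The sketch below copies the two regular expressions from the diff into standalone functions. The sample text and the `get_weather` tool are invented for illustration.

```python
import json
import re
import uuid

# Same patterns as _QWEN_TOOL_PATTERN and _QWEN_TAG_PATTERN in the diff.
TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
TOOL_TAG = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def parse_tool_calls(content: str) -> list:
    """Convert embedded <tool_call> JSON into OpenAI-style tool_call dicts."""
    calls = []
    for match in TOOL_CALL.finditer(content):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the response
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": data.get("name", ""),
                "arguments": json.dumps(data.get("arguments", {})),
            },
        })
    return calls


def strip_tool_tags(content: str) -> str:
    """Remove every <tool_call> block, valid or not, from the visible text."""
    return TOOL_TAG.sub("", content).strip()


# Invented sample: one valid call and one malformed block.
sample = (
    "Checking the weather.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Rome"}}</tool_call>\n'
    "<tool_call>{not valid json}</tool_call>"
)
calls = parse_tool_calls(sample)
text = strip_tool_tags(sample)
```

The malformed block is dropped from `calls` but still stripped from `text`, so invalid tool output never reaches the user as raw tags.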
--- a/codai/pydantic/videorequest.py
+++ b/codai/pydantic/videorequest.py
@@ -57,9 +57,14 @@ class VideoGenerationRequest(BaseModel):
    camera_motion: Optional[str] = None     # zoom-in | zoom-out | pan-left | pan-right | tilt-up | tilt-down | rotate

    # ── Character consistency ─────────────────────────────────────────────
-    character_references: Optional[List[str]] = None  # list of base64/URL reference images
+    # Each entry: {"name": "Alice", "images": ["b64...", ...]}
+    characters: Optional[List[dict]] = None
+    # Legacy flat list of base64/URL reference images (still accepted)
+    character_references: Optional[List[str]] = None
    character_strength: Optional[float] = 0.8
    character_names: Optional[List[str]] = None       # optional names per reference
+    # Named saved profiles to load (resolved server-side)
+    character_profiles: Optional[List[str]] = None

    # ── Audio generation / manipulation ──────────────────────────────────
    add_audio: Optional[bool] = False
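Review note: the request now accepts both the structured `characters` list and the legacy `character_references` / `character_names` pair. The diff does not show how the server merges them. The sketch below is one possible normalization, with a hypothetical helper name and a hypothetical fallback naming scheme.

```python
from typing import Dict, List, Optional


def normalize_characters(
    characters: Optional[List[Dict]] = None,
    character_references: Optional[List[str]] = None,
    character_names: Optional[List[str]] = None,
) -> List[Dict]:
    """Hypothetical helper: fold the legacy flat fields into the new shape."""
    # Copy structured entries so the request payload is not mutated.
    result = [dict(c) for c in (characters or [])]
    names = character_names or []
    for i, image in enumerate(character_references or []):
        # Assumption: unnamed legacy references get a generated placeholder name.
        name = names[i] if i < len(names) else f"character_{i + 1}"
        result.append({"name": name, "images": [image]})
    return result


merged = normalize_characters(
    characters=[{"name": "Alice", "images": ["b64-a"]}],
    character_references=["b64-b", "b64-c"],
    character_names=["Bob"],
)
```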

--- a/codai/queue/manager.py
+++ b/codai/queue/manager.py
@@ -33,6 +33,12 @@ class QueueManager:
        self.model_loading: bool = False
        self.model_name: Optional[str] = None
        self.lock = asyncio.Lock()
+        self.max_size: int = 6
+
+    async def is_full(self) -> bool:
+        """Return True if the queue has reached max_size."""
+        async with self.lock:
+            return len(self.waiting_requests) >= self.max_size
    
    async def add_waiting(self, request_id: str) -> None:
        """Add a request to the waiting queue."""

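Review note: `is_full()` and `add_waiting()` each take the lock separately, so a caller that checks and then adds can still overshoot `max_size` under concurrency. The sketch below shows a combined check-and-add. `BoundedWaitQueue` and `try_add` are hypothetical, and `waiting_requests` is assumed to be a list.

```python
import asyncio
from typing import List


class BoundedWaitQueue:
    """Hypothetical reduction of QueueManager to its capacity logic."""

    def __init__(self, max_size: int = 6):
        self.waiting_requests: List[str] = []  # assumed to be a list
        self.max_size = max_size
        self.lock = asyncio.Lock()

    async def is_full(self) -> bool:
        async with self.lock:
            return len(self.waiting_requests) >= self.max_size

    async def try_add(self, request_id: str) -> bool:
        """Check capacity and enqueue while holding the lock once."""
        async with self.lock:
            if len(self.waiting_requests) >= self.max_size:
                return False
            self.waiting_requests.append(request_id)
            return True


async def demo():
    queue = BoundedWaitQueue(max_size=2)
    accepted = await asyncio.gather(
        *(queue.try_add(f"req-{i}") for i in range(4))
    )
    return accepted, await queue.is_full(), len(queue.waiting_requests)


accepted, full, waiting = asyncio.run(demo())
```

A caller could map a `False` result to an HTTP 429 or 503 response. Which status the server actually uses is not shown in this diff.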
--- a/videogen @ 04778e17
+++ b/videogen @ 04778e17
-Subproject commit 04778e172a9a83d0778f566045f995828c6c3556
--- a/requirements-nvidia.txt
+++ b/requirements-nvidia.txt
@@ -32,6 +32,16 @@ realesrgan>=0.3.0
 basicsr>=1.4.2
 timm>=0.9.0

+# Voice cloning (F5-TTS zero-shot voice cloning)
+f5-tts>=1.1.0
+
+# Voice conversion / singing voice conversion (Seed-VC — preserves pitch/melody)
+seed-vc>=0.4.0
+
+# Face swap (insightface INSwapper — downloads inswapper_128.onnx on first use)
+insightface>=0.7.3
+onnxruntime-gpu>=1.20.0        # GPU-accelerated ONNX runtime for insightface
+
 # Optional: for better performance with NVIDIA GPUs
 bitsandbytes>=0.41.0
 sentencepiece>=0.1.99

--- a/requirements-vulkan.txt
+++ b/requirements-vulkan.txt
@@ -18,3 +18,10 @@ huggingface-hub>=0.19.0
 # Optional: Audio transcription without PyTorch (whispercpp)
 # Note: faster-whisper requires PyTorch, but whispercpp works without it
 whispercpp>=0.0.17  # For GGUF-based Whisper transcription without PyTorch
+
+# Voice cloning (F5-TTS zero-shot voice cloning)
+f5-tts>=1.1.0
+
+# Face swap (insightface INSwapper — downloads inswapper_128.onnx on first use)
+insightface>=0.7.3
+onnxruntime>=1.20.0            # CPU ONNX runtime (use onnxruntime-gpu for GPU acceleration)
--- a/requirements.txt
+++ b/requirements.txt
@@ -75,6 +75,16 @@ timm>=0.9.0                    # vision model backbones (depth/segment endpoints
 #   pip install audiocraft
 # AudioLDM2 is available via diffusers (already listed above)

+# Voice cloning (F5-TTS zero-shot voice cloning)
+f5-tts>=1.1.0
+
+# Voice conversion / singing voice conversion (Seed-VC — preserves pitch/melody)
+seed-vc>=0.4.0
+
+# Face swap (insightface INSwapper — downloads inswapper_128.onnx on first use)
+insightface>=0.7.3
+onnxruntime-gpu>=1.20.0        # GPU-accelerated ONNX runtime for insightface
+
 # Optional: for better performance
 # bitsandbytes>=0.41.0  # for 4-bit/8-bit quantization
 # sentencepiece>=0.1.99  # for some tokenizers