# VideoGen - Universal Video Generation Toolkit

**Copyleft © 2026 Stefy <stefy@nexlab.net>**

A comprehensive, GPU-accelerated toolkit for video and image generation. It supports Text-to-Video (T2V), Image-to-Video (I2V), Text-to-Image (T2I), Image-to-Image (I2I), Video-to-Video (V2V), Video-to-Image (V2I), and 2D-to-3D conversion, along with audio synthesis, audio synchronization, lip sync, dubbing, and translation.

---

## Features

### Video Generation
- **Text-to-Video (T2V)**: Generate videos from text prompts
- **Image-to-Video (I2V)**: Animate static images
- **Text-to-Image (T2I)**: Generate high-quality images
- **Image-to-Image (I2I)**: Transform existing images
- **Video-to-Video (V2V)**: Style transfer and filters for videos
- **Video-to-Image (V2I)**: Extract frames and keyframes from videos

### Video Processing
- **Video Upscaling**: AI-powered video upscaling (ESRGAN, Real-ESRGAN, SwinIR)
- **Video Filters**: Grayscale, sepia, blur, sharpen, contrast, speed, slow-mo, reverse, fade, denoise, stabilize
- **Video Concatenation**: Join multiple videos
- **Frame Extraction**: Extract single frames, keyframes, or all frames

### 2D-to-3D Conversion
- **3D Side-by-Side (SBS)**: Convert 2D videos to 3D SBS format for VR headsets and 3D TVs
- **3D Anaglyph**: Convert to red/cyan anaglyph format for 3D glasses
- **VR 360**: Convert 2D videos to VR 360 equirectangular format
- **Depth Estimation**: AI-powered depth map generation

### Audio Capabilities
- **Text-to-Speech (TTS)**: Multiple voices via Bark and Edge-TTS
- **Music Generation**: MusicGen integration for background music
- **Audio Sync**: Match audio duration to video (stretch, trim, pad, loop)
- **Lip Sync**: Wav2Lip and SadTalker integration
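
The four audio sync strategies differ in how they reconcile the audio and video durations. The sketch below shows the arithmetic behind each one. It assumes a hypothetical helper, `plan_audio_sync`, which is not part of VideoGen, and the returned field names are invented for illustration.

```python
import math


def plan_audio_sync(audio_s: float, video_s: float, mode: str) -> dict:
    """Describe how to fit an audio clip to a video (hypothetical helper).

    audio_s and video_s are durations in seconds.
    """
    if audio_s <= 0 or video_s <= 0:
        raise ValueError("durations must be positive")
    if mode == "stretch":
        # A tempo factor above 1 speeds the audio up; below 1 slows it down.
        return {"mode": mode, "tempo": audio_s / video_s}
    if mode == "trim":
        # Seconds to cut from the end of the audio (0 if it is already short).
        return {"mode": mode, "cut_s": max(0.0, audio_s - video_s)}
    if mode == "pad":
        # Seconds of silence to append (0 if the audio is already long enough).
        return {"mode": mode, "silence_s": max(0.0, video_s - audio_s)}
    if mode == "loop":
        # Number of plays needed to cover the video; the last may be cut short.
        return {"mode": mode, "repeats": math.ceil(video_s / audio_s)}
    raise ValueError(f"unknown mode: {mode}")
```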

### Video Dubbing & Translation
- **Video Dubbing**: Translate and dub videos while preserving original voice
- **Voice Cloning**: Preserve speaker's voice in translated video
- **Automatic Subtitles**: Generate subtitles using Whisper
- **Subtitle Translation**: Translate subtitles to 20+ languages
- **Subtitle Burning**: Burn subtitles directly into video

### Model Support
- **Small Models** (<16GB VRAM): Wan 1.3B, Zeroscope, ModelScope
- **Medium Models** (16-30GB VRAM): Wan 14B, CogVideoX, Mochi
- **Large Models** (30-50GB VRAM): Allegro, HunyuanVideo
- **Huge Models** (50GB+ VRAM): Open-Sora, Step-Video, Lumina
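
To make the tier boundaries concrete, here is a minimal sketch that classifies a model's VRAM requirement into the four tiers listed above. The function name `vram_tier` and the treatment of exact boundary values (for example, counting exactly 16 GB as medium) are assumptions for illustration, not VideoGen's actual selection logic.

```python
def vram_tier(vram_gb: float) -> str:
    """Map a VRAM requirement in GB to the tier names used in this README.

    Boundary handling is an assumption: each tier includes its lower bound.
    """
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    if vram_gb < 16:
        return "small"
    if vram_gb < 30:
        return "medium"
    if vram_gb < 50:
        return "large"
    return "huge"
```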

### Character Consistency
- **Character Profiles**: Save and reuse character references across generations
- **IP-Adapter**: Image prompt adapter for consistent character generation
- **InstantID**: Face identity preservation for consistent faces
- **Reference Images**: Use multiple reference images for character consistency

### Smart Features
- **Auto Mode**: Automatic model selection and configuration
- **NSFW Detection**: Automatic content classification
- **Prompt Splitting**: Intelligent I2V prompt separation
- **Time Estimation**: Hardware-aware generation time prediction
- **Multi-GPU**: Distributed generation across multiple GPUs
- **Auto-Disable**: Models that fail 3 times are auto-disabled
- **Memory Management**: Automatic chunking for long videos and low VRAM
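
The auto-disable rule can be sketched as a simple failure counter. This is a hypothetical illustration: the class name `FailureTracker` is invented here, and the README does not say whether failures are counted consecutively or cumulatively, so the sketch assumes cumulative counting.

```python
from collections import Counter

MAX_FAILURES = 3  # threshold stated in the feature list above


class FailureTracker:
    """Hypothetical tracker; not VideoGen's actual implementation."""

    def __init__(self, max_failures: int = MAX_FAILURES) -> None:
        self.max_failures = max_failures
        self.failures: Counter = Counter()

    def record_failure(self, model: str) -> None:
        """Count one more failed generation for the given model."""
        self.failures[model] += 1

    def is_disabled(self, model: str) -> bool:
        """Return True once the model has reached the failure threshold."""
        return self.failures[model] >= self.max_failures
```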

### User Interfaces
- **Command Line**: Full-featured CLI with all options
- **Web Interface**: Modern web UI with real-time progress updates
- **MCP Server**: Model Context Protocol wrapper for AI agents

---

## Installation

### Core Dependencies
```bash
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 --break-system-packages
pip install git+https://github.com/huggingface/diffusers.git --break-system-packages
pip install git+https://github.com/huggingface/transformers.git --break-system-packages
pip install --upgrade accelerate xformers spandrel psutil ffmpeg-python ftfy --break-system-packages
```

### Audio Features (Optional)
```bash
pip install scipy soundfile librosa --break-system-packages
pip install git+https://github.com/suno-ai/bark.git --break-system-packages
pip install edge-tts --break-system-packages
pip install audiocraft
```

### Lip Sync (Optional)
```bash
pip install opencv-python face-recognition dlib --break-system-packages
git clone https://github.com/Rudrabha/Wav2Lip.git
```

### Web Interface
```bash
pip install flask flask-cors flask-socketio eventlet
```

### MCP Server (For AI Agents)
```bash
pip install mcp
```

---

## Quick Start

### First-Time Setup

**IMPORTANT**: Before using VideoGen, update the model database:

```bash
python3 videogen --update-models
```

This fetches the latest model list from HuggingFace and populates the local database.
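
A wrapper script could check whether the local database is present before generating anything. The sketch below assumes the database path given in the Configuration section (`~/.config/videogen/models.json`). The helper name `needs_model_update` is hypothetical, and treating an empty file as "needs update" is an assumption.

```python
from pathlib import Path

# Path taken from the Configuration section of this README.
DEFAULT_DB = Path.home() / ".config" / "videogen" / "models.json"


def needs_model_update(db_path: Path = DEFAULT_DB) -> bool:
    """Return True when the database file is missing or empty (hypothetical check)."""
    return not db_path.is_file() or db_path.stat().st_size == 0
```

If this returns `True`, run `python3 videogen --update-models` before the first generation.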

### Basic Usage

```bash
# Simple video generation
python3 videogen --model wan_1.3b_t2v --prompt "a cat playing piano" --output cat_piano

# Auto mode - let the script decide everything
python3 videogen --auto --prompt "a beautiful sunset over the ocean"

# Generate with audio
python3 videogen --model wan_14b_t2v --prompt "epic battle scene" \
  --generate_audio --audio_type music --sync_audio --output battle
```
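
The same commands can be driven from Python. The following is a minimal sketch, not an official API: `build_command` and `generate` are hypothetical helpers that only assemble the `--auto`, `--model`, `--prompt`, and `--output` flags shown above and assume the script is run from the repository directory.

```python
import subprocess
from typing import List, Optional


def build_command(prompt: str, model: Optional[str] = None,
                  output: Optional[str] = None) -> List[str]:
    """Assemble a videogen command line; falls back to --auto without a model."""
    cmd = ["python3", "videogen"]
    if model is None:
        cmd.append("--auto")
    else:
        cmd += ["--model", model]
    cmd += ["--prompt", prompt]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def generate(prompt: str, model: Optional[str] = None,
             output: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run videogen and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(prompt, model, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.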

### Image-to-Video

```bash
# Animate an existing image
python3 videogen --model svd_xt_1.1 --image my_photo.jpg \
  --prompt "add subtle motion" --output animated

# I2V with auto-generated image
python3 videogen --image_to_video --model svd_xt_1.1 \
  --image_model flux_dev --prompt "cinematic portrait" \
  --prompt_animation "gentle head movement" --output portrait
```

### With Lip Sync

```bash
python3 videogen --image_to_video --model svd_xt_1.1 \
  --image_model flux_dev --prompt "person speaking" \
  --generate_audio --audio_type tts \
  --audio_text "Hello, welcome to my channel" \
  --lip_sync --output speaker
```

### Character Consistency

VideoGen supports character consistency across multiple generations using IP-Adapter and InstantID.

```bash
# Create a character profile from reference images
python3 videogen --create-character my_character \
  --character-images ref1.jpg ref2.jpg ref3.jpg \
  --character-desc "A young woman with red hair"

# List saved character profiles
python3 videogen --list-characters

# Generate with character consistency
python3 videogen --model flux_dev \
  --character my_character \
  --prompt "my_character walking in a park" \
  --output character_park

# Use IP-Adapter directly with reference images
python3 videogen --model sdxl_base \
  --ipadapter --ipadapter-scale 0.8 \
  --reference-images ref1.jpg ref2.jpg \
  --prompt "a person reading a book" \
  --output reading

# Use InstantID for face consistency
python3 videogen --model sdxl_base \
  --ipadapter --instantid \
  --reference-images face_ref.jpg \
  --prompt "portrait of a person smiling" \
  --output portrait
```

---

## AI Agent Integration

### MCP Server

VideoGen includes an MCP (Model Context Protocol) server for seamless integration with AI agents like Claude:

```bash
# Start the MCP server
python3 videogen_mcp_server.py
```

Add the server to the Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "videogen": {
      "command": "python3",
      "args": ["/path/to/videogen_mcp_server.py"]
    }
  }
}
```
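
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that with the standard library. The helper name `add_videogen_server` is hypothetical, and the script path is a placeholder you must replace.

```python
import json
from pathlib import Path


def add_videogen_server(config_path: Path, server_script: str) -> dict:
    """Add or replace the "videogen" entry while keeping other servers."""
    config = {}
    if config_path.is_file():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Create the "mcpServers" object if the config does not have one yet.
    servers = config.setdefault("mcpServers", {})
    servers["videogen"] = {"command": "python3", "args": [server_script]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the existing config before running a script like this, since it rewrites the file in place.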

### Available MCP Tools

| Tool | Description |
|------|-------------|
| `videogen_generate` | Generate video with auto mode |
| `videogen_generate_video` | Text-to-Video generation |
| `videogen_generate_image` | Text-to-Image generation |
| `videogen_animate_image` | Image-to-Video animation |
| `videogen_transform_image` | Image-to-Image transformation |
| `videogen_generate_with_audio` | Video with TTS or music |
| `videogen_list_models` | List available models |
| `videogen_show_model` | Show model details |
| `videogen_update_models` | Update model database |
| `videogen_search_models` | Search HuggingFace |
| `videogen_add_model` | Add model to database |
| `videogen_list_tts_voices` | List TTS voices |

### Skill Documentation

See [SKILL.md](SKILL.md) for a comprehensive AI agent integration guide, including:
- Quick reference commands
- Common use cases
- Model selection guide
- Error handling
- Programmatic usage examples

---

## Web Interface

VideoGen includes a modern web interface for easy access to all features.

### Starting the Web Server

```bash
python3 webapp.py --port 5000 --host 0.0.0.0
```

Then open `http://localhost:5000` in your browser.
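
If the page does not load, it can help to confirm that something is listening on the port. The following sketch uses only the standard library; `is_listening` is a hypothetical helper, and the default port simply mirrors the `--port 5000` example above.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5000,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `True` result only shows that the port accepts connections, not that the process behind it is the VideoGen web app.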

### Web Interface Features

- **All Generation Modes**: T2V, I2V, T2I, I2I, V2V, Upscale, Dubbing, Subtitles
- **Real-time Progress**: Live progress updates with output log streaming
- **File Upload/Download**: Upload images, videos, audio; download generated content
- **Model Selection**: Browse and select from all available models
- **Job Management**: View, cancel, retry, and track all generation jobs
- **Gallery**: Browse and download all generated files
- **Responsive Design**: Works on desktop and mobile devices

### Web Interface Screenshot

![VideoGen Web Interface](screenshot.png)

The web interface provides:
1. **Generate Tab**: Main generation form with all options
2. **Jobs Tab**: Real-time job monitoring with progress bars
3. **Gallery Tab**: Browse and download generated content
4. **Settings Tab**: Configuration and about information

---

## Documentation

- **[EXAMPLES.md](EXAMPLES.md)**: Comprehensive command-line examples for all features
- **[SKILL.md](SKILL.md)**: AI agent integration guide
- **Built-in help**: `python3 videogen --help`
- **Model list**: `python3 videogen --model-list`
- **TTS voices**: `python3 videogen --tts-list`

---

## Model Management

```bash
# Update model database (run this first!)
python3 videogen --update-models

# List available models
python3 videogen --model-list

# List models by VRAM requirement
python3 videogen --model-list --low-vram    # ≤16GB
python3 videogen --model-list --high-vram   # >30GB
python3 videogen --model-list --huge-vram   # >55GB

# Search HuggingFace for models
python3 videogen --search-models "video generation"

# Add a model
python3 videogen --add-model stabilityai/stable-video-diffusion-img2vid-xt-1.1 --name svd_xt

# Show model details
python3 videogen --show-model 1
```

---

## VRAM Management

```bash
# Limit VRAM usage
python3 videogen --model wan_14b_t2v --prompt "test" --vram_limit 16

# Offloading strategies
python3 videogen --model wan_14b_t2v --prompt "test" --offload_strategy sequential

# Low RAM mode
python3 videogen --model wan_14b_t2v --prompt "test" --low_ram_mode
```

---

## Distributed Generation

```bash
# Multi-GPU distributed generation
python3 videogen --model hunyuanvideo --prompt "epic scene" \
  --length 30 --distribute --vram_limit 20
```

---

## Configuration

The model database is stored in `~/.config/videogen/models.json`.

Set environment variables:
```bash
export HF_TOKEN=your_token_here        # For gated models
export HF_HOME=/path/to/cache          # Custom cache directory
export CUDA_VISIBLE_DEVICES=0,1        # GPU selection
```
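
A script that wraps VideoGen may want to inspect these variables before launching a job. The sketch below parses `CUDA_VISIBLE_DEVICES`; the helper name `visible_gpus` is hypothetical. Entries are kept as strings because CUDA also accepts device UUIDs in this variable, not only integer indices.

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Parse CUDA_VISIBLE_DEVICES into a list of device identifiers.

    Returns None when the variable is unset (CUDA then exposes all GPUs)
    and an empty list when it is set but empty (no GPUs are exposed).
    """
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]
```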

---

## Project Structure

```
videogen/
├── videogen              # Main script
├── webapp.py             # Web interface server
├── templates/            # HTML templates
│   └── index.html        # Main web UI
├── static/               # Static assets
│   ├── css/style.css     # Styles
│   └── js/app.js         # JavaScript
├── videogen_mcp_server.py # MCP server for AI agents
├── README.md             # This file
├── EXAMPLES.md           # Comprehensive examples
├── SKILL.md              # AI agent integration guide
├── LICENSE.md            # GPLv3 License
└── requirements.txt      # Python dependencies
```

---

## License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

See [LICENSE.md](LICENSE.md) for the full license text.

---

## Copyleft

**VideoGen - Universal Video Generation Toolkit**
Copyright © 2026 Stefy <stefy@nexlab.net>

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

---

## Contributing

Contributions are welcome! Please feel free to submit pull requests.

---

## Support

For issues and questions, please open an issue on the repository or contact stefy@nexlab.net.