# VideoGen AI Agent Skill

**Copyleft © 2026 Stefy <stefy@nexlab.net>**

This skill enables AI agents to use the VideoGen toolkit for video, image, and audio generation.

---

## Overview

VideoGen is a universal video generation toolkit that supports:
- **Text-to-Video (T2V)**: Generate videos from text prompts
- **Image-to-Video (I2V)**: Animate static images
- **Text-to-Image (T2I)**: Generate images from text
- **Image-to-Image (I2I)**: Transform existing images
- **Video-to-Video (V2V)**: Style transfer and filters for videos
- **Video-to-Image (V2I)**: Extract frames and keyframes from videos
- **2D-to-3D Conversion**: Convert 2D videos to 3D SBS, anaglyph, or VR 360
- **Video Upscaling**: AI-powered video upscaling
- **Audio Generation**: TTS and music generation
- **Lip Sync**: Synchronize lip movements with audio
- **Video Dubbing**: Translate and dub videos with voice preservation
- **Subtitle Generation**: Create and translate subtitles automatically

---

## ⚠️ IMPORTANT: First-Time Setup

Before using VideoGen, you MUST update the model database:

```bash
python3 videogen --update-models
```

This command:
- Fetches the latest model list from HuggingFace
- Populates the local model database
- Enables model discovery and selection

**Run this command once before using any other VideoGen features.**
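
An agent that is invoked repeatedly may want to avoid re-running the update on every call. The sketch below is one possible way to do that, assuming a marker file is an acceptable record that the update has already succeeded. The `ensure_models_updated` helper and the marker path are hypothetical and not part of VideoGen.

```python
import subprocess
from pathlib import Path


def ensure_models_updated(marker, run=subprocess.run):
    """Run `--update-models` once and remember that it succeeded.

    `marker` is a hypothetical file that this helper creates itself;
    VideoGen knows nothing about it. `run` is injectable so the logic
    can be exercised without VideoGen installed.
    """
    marker = Path(marker)
    if marker.exists():
        return False  # an earlier call already updated the database
    result = run(["python3", "videogen", "--update-models"])
    if result.returncode != 0:
        raise RuntimeError("videogen --update-models failed")
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Calling `ensure_models_updated("state/models_updated.flag")` before the first generation command returns `True` when it actually ran the update and `False` on later calls.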

---

## Quick Reference for AI Agents

### Basic Commands

```bash
# ⚠️ FIRST: Update model database (required on first use)
python3 videogen --update-models

# Simple video generation
python3 videogen --model <model_name> --prompt "<prompt>" --output <output_name>

# Auto mode (recommended for AI agents)
python3 videogen --auto --prompt "<prompt>" --output <output_name>

# List available models
python3 videogen --model-list

# Show model details
python3 videogen --show-model <model_id_or_name>
```

### Generation Types

| Type | Command Pattern | Use Case |
|------|-----------------|----------|
| T2V | `--model t2v_model --prompt "..."` | Generate video from text |
| I2V | `--image_to_video --model i2v_model --prompt "..."` | Animate an image |
| T2I | `--model t2i_model --prompt "..." --output image.png` | Generate image |
| I2I | `--image-to-image --image input.png --prompt "..."` | Transform image |
| V2V | `--video input.mp4 --video-to-video --prompt "..."` | Style transfer on video |
| V2I | `--video input.mp4 --extract-keyframes` | Extract frames from video |
| 3D | `--video input.mp4 --convert-3d-sbs` | Convert 2D to 3D |

### Model Management

#### List and Manage Models

```bash
# List all available models (with auto mode status)
python3 videogen --model-list

# Filter models
python3 videogen --model-list --t2v-only      # Text-to-Video models
python3 videogen --model-list --i2v-only      # Image-to-Video models
python3 videogen --model-list --t2i-only      # Text-to-Image models
python3 videogen --model-list --v2v-only      # Video-to-Video models
python3 videogen --model-list --v2i-only      # Video-to-Image models
python3 videogen --model-list --3d-only       # 2D-to-3D models
python3 videogen --model-list --tts-only      # TTS models
python3 videogen --model-list --audio-only    # Audio models
python3 videogen --model-list --low-vram      # ≤16GB VRAM
python3 videogen --model-list --high-vram     # >30GB VRAM
python3 videogen --model-list --huge-vram     # >55GB VRAM
python3 videogen --model-list --nsfw-friendly

# Batch output (for scripts)
python3 videogen --model-list --model-list-batch

# Disable a model from auto selection
python3 videogen --disable-model <ID_or_name>

# Enable a model for auto selection
python3 videogen --enable-model <ID_or_name>
```
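
For scripted use, the filter flags above can be wrapped in a small command builder. This is a minimal sketch: the short keys such as `low_vram` are hypothetical names chosen for illustration, and it assumes that a filter flag can be combined with `--model-list-batch`.

```python
# Keys are hypothetical short names; values are the flags listed above.
MODEL_LIST_FILTERS = {
    "t2v": "--t2v-only",
    "i2v": "--i2v-only",
    "t2i": "--t2i-only",
    "v2v": "--v2v-only",
    "v2i": "--v2i-only",
    "3d": "--3d-only",
    "tts": "--tts-only",
    "audio": "--audio-only",
    "low_vram": "--low-vram",
    "high_vram": "--high-vram",
    "huge_vram": "--huge-vram",
    "nsfw": "--nsfw-friendly",
}


def model_list_cmd(filter_name=None, batch=False):
    """Build the argv list for `--model-list` with an optional filter."""
    cmd = ["python3", "videogen", "--model-list"]
    if filter_name is not None:
        if filter_name not in MODEL_LIST_FILTERS:
            raise ValueError(f"unknown filter: {filter_name!r}")
        cmd.append(MODEL_LIST_FILTERS[filter_name])
    if batch:
        # Assumption: batch output can be combined with a filter flag.
        cmd.append("--model-list-batch")
    return cmd
```

Rejecting unknown filter names up front keeps a typo from silently listing every model.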

#### Cache Management

```bash
# List cached models
python3 videogen --list-cached-models

# Remove specific cached model
python3 videogen --remove-cached-model <model_id>

# Clear entire cache
python3 videogen --clear-cache
```

#### VRAM Management

```bash
# Allow models larger than VRAM using system RAM (implies sequential offload)
python3 videogen --auto --prompt "..." --allow-bigger-models

# Specify offload strategy explicitly
python3 videogen --auto --prompt "..." --offload_strategy sequential --vram_limit 16

# NEW: Balanced strategy - maximizes VRAM usage, only offloads if necessary
python3 videogen --auto --prompt "..." --offload_strategy balanced

# Limit VRAM usage
python3 videogen --model wan_14b_t2v --prompt "..." --vram_limit 16

# Low RAM mode
python3 videogen --model wan_14b_t2v --prompt "..." --low_ram_mode
```

#### Output Options

```bash
# Specify output directory for batch processing
python3 videogen --model wan_14b_t2v --prompt "..." --output-dir /path/to/output

# Auto-confirm prompts (useful for scripts)
python3 videogen --model wan_14b_t2v --prompt "..." --yes
```

### Auto Mode

Auto mode is the easiest way for AI agents to generate content:

```bash
# First, ensure models are updated
python3 videogen --update-models

# Then use auto mode
python3 videogen --auto --prompt "<description>"
```

Auto mode automatically:
1. Detects the generation type from the prompt
2. Detects NSFW content
3. Selects the best model for available VRAM
4. Configures all settings
5. Prints the command line for reproduction

---

## Common Use Cases

### 1. Generate a Simple Video

```bash
python3 videogen --auto --prompt "a cat playing piano" --output cat_piano
```

### 2. Generate a Video with Narration

```bash
python3 videogen --auto --prompt "a woman speaking to camera" \
  --generate_audio --audio_type tts \
  --audio_text "Welcome to my channel" \
  --output speaker
```

### 3. Animate an Existing Image

```bash
python3 videogen --model svd_xt_1.1 --image photo.jpg \
  --prompt "add subtle motion" --output animated
```

### 4. Generate an Image

```bash
python3 videogen --model flux_dev --prompt "beautiful landscape" \
  --output landscape.png
```

### 5. Transform an Image

```bash
python3 videogen --model flux_dev --image-to-image \
  --image photo.jpg --prompt "make it look like a painting" \
  --output painted.png
```

### 6. Generate Video with Music

```bash
python3 videogen --model wan_14b_t2v --prompt "epic battle scene" \
  --generate_audio --audio_type music \
  --audio_text "epic orchestral music" \
  --sync_audio --output battle
```

### 7. Generate Video with Lip Sync

```bash
python3 videogen --image_to_video --model svd_xt_1.1 \
  --image_model flux_dev --prompt "person speaking" \
  --generate_audio --audio_type tts \
  --audio_text "Hello world" \
  --lip_sync --output speaker
```

### 8. Video-to-Video Style Transfer

```bash
# Apply style transfer to a video
python3 videogen --video input.mp4 --video-to-video \
  --prompt "make it look like a watercolor painting" \
  --v2v-strength 0.7 --output styled
```

### 9. Apply Video Filter

```bash
# Apply grayscale filter
python3 videogen --video input.mp4 --video-filter grayscale --output gray

# Apply slow motion
python3 videogen --video input.mp4 --video-filter slow \
  --filter-params "factor=0.5" --output slowmo

# Apply blur
python3 videogen --video input.mp4 --video-filter blur \
  --filter-params "radius=10" --output blurred
```

### 10. Extract Frames from Video

```bash
# Extract keyframes
python3 videogen --video input.mp4 --extract-keyframes --frames-dir keyframes

# Extract single frame at timestamp
python3 videogen --video input.mp4 --extract-frame --timestamp 5.5 --output frame.png

# Extract frames, capped at 100 by --v2v-max-frames
python3 videogen --video input.mp4 --extract-frames --v2v-max-frames 100 --frames-dir all_frames
```

### 11. Create Video Collage

```bash
# Create 4x4 thumbnail grid
python3 videogen --video input.mp4 --video-collage \
  --collage-grid 4x4 --output collage.png
```

### 12. Upscale Video

```bash
# 2x upscale using ffmpeg
python3 videogen --video input.mp4 --upscale-video \
  --upscale-factor 2.0 --output upscaled

# 4x upscale using AI (ESRGAN)
python3 videogen --video input.mp4 --upscale-video \
  --upscale-factor 4.0 --upscale-method esrgan --output upscaled_4k
```

### 13. Convert 2D to 3D

```bash
# Convert to side-by-side 3D for VR
python3 videogen --video input.mp4 --convert-3d-sbs \
  --depth-method ai --output 3d_sbs

# Convert to anaglyph 3D (red/cyan glasses)
python3 videogen --video input.mp4 --convert-3d-anaglyph --output anaglyph

# Convert to VR 360 format
python3 videogen --video input.mp4 --convert-vr --output vr360
```

### 14. Concatenate Videos

```bash
# Join multiple videos
python3 videogen --concat-videos video1.mp4 video2.mp4 video3.mp4 --output joined
```

---

## Model Selection Guide

### By VRAM

| VRAM | Recommended Models |
|------|-------------------|
| <16GB | `wan_1.3b_t2v`, `zeroscope_v2_576w` |
| 16-30GB | `wan_14b_t2v`, `cogvideox_2b`, `mochi_1_preview` |
| 30-50GB | `allegro`, `hunyuanvideo` |
| 50GB+ | `open_sora_1_2`, `step_video_t2v` |
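
The VRAM table can be turned into a simple lookup. The sketch below copies the rows as they appear above. Because the table's ranges touch at 16, 30, and 50 GB, it assumes that a boundary value belongs to the higher tier. `recommend_models` is a hypothetical helper, not a VideoGen function.

```python
# (exclusive upper bound in GB, models) rows copied from the table above.
VRAM_TIERS = [
    (16, ["wan_1.3b_t2v", "zeroscope_v2_576w"]),
    (30, ["wan_14b_t2v", "cogvideox_2b", "mochi_1_preview"]),
    (50, ["allegro", "hunyuanvideo"]),
    (float("inf"), ["open_sora_1_2", "step_video_t2v"]),
]


def recommend_models(vram_gb):
    """Return the table's recommended models for the given VRAM size."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    for upper_bound, models in VRAM_TIERS:
        # Assumption: a boundary value (16, 30, 50) falls in the higher tier.
        if vram_gb < upper_bound:
            return list(models)
```

For example, `recommend_models(24)` returns the 16-30GB row. Models from lower tiers should also fit and can be tried when speed matters more than quality.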

### By Task

| Task | Recommended Models |
|------|-------------------|
| Fast T2V | `wan_1.3b_t2v` |
| Quality T2V | `wan_14b_t2v`, `mochi_1_preview` |
| I2V | `svd_xt_1.1`, `wan_14b_i2v` |
| T2I | `flux_dev`, `sdxl_base` |
| Anime | `pony_v6` |

---

## Video Dubbing & Translation

VideoGen supports video dubbing with voice preservation and automatic subtitle generation.

### Transcribe Video Audio

```bash
# Transcribe audio from video
python3 videogen --video input.mp4 --transcribe

# Use larger Whisper model for better accuracy
python3 videogen --video input.mp4 --transcribe --whisper-model large

# Specify source language
python3 videogen --video input.mp4 --transcribe --source-lang en

# Audio chunking strategies for long videos
python3 videogen --video input.mp4 --transcribe --audio-chunk overlap     # 60s chunks with 2s overlap
python3 videogen --video input.mp4 --transcribe --audio-chunk word-boundary  # Split at word boundaries
python3 videogen --video input.mp4 --transcribe --audio-chunk vad           # Skip silence with VAD
```

### Create Subtitles

```bash
# Create SRT subtitles from video
python3 videogen --video input.mp4 --create-subtitles

# Create translated subtitles
python3 videogen --video input.mp4 --create-subtitles --translate-subtitles --target-lang es

# Burn subtitles into video
python3 videogen --video input.mp4 --create-subtitles --burn-subtitles --output subtitled

# Burn translated subtitles
python3 videogen --video input.mp4 --create-subtitles --translate-subtitles --target-lang fr --burn-subtitles --output french_subtitled
```

### Dub Video with Translation

```bash
# Translate and dub video (preserves voice)
python3 videogen --video input.mp4 --dub-video --target-lang es --output spanish_dubbed

# Dub without voice cloning (use standard TTS)
python3 videogen --video input.mp4 --dub-video --target-lang fr --no-voice-clone --output french_dubbed

# Specify TTS voice for dubbing
python3 videogen --video input.mp4 --dub-video --target-lang de --tts_voice edge_male_de --output german_dubbed
```

### Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | es | Spanish |
| fr | French | de | German |
| it | Italian | pt | Portuguese |
| ru | Russian | zh | Chinese |
| ja | Japanese | ko | Korean |
| ar | Arabic | hi | Hindi |
| nl | Dutch | pl | Polish |
| tr | Turkish | vi | Vietnamese |
| th | Thai | id | Indonesian |
| sv | Swedish | uk | Ukrainian |

---

## Character Consistency

VideoGen supports character consistency across multiple generations using IP-Adapter, InstantID, and Character Profiles.

### Create Character Profile

```bash
# Create a character profile from reference images
python3 videogen --create-character alice \
  --character-images ref1.jpg ref2.jpg ref3.jpg \
  --character-desc "young woman with blue eyes and blonde hair"

# List all saved character profiles
python3 videogen --list-characters

# Show details of a character profile
python3 videogen --show-character alice

# Delete a character profile
python3 videogen --delete-character alice
```

### Generate with Character

```bash
# Generate image with character consistency
python3 videogen --model flux_dev \
  --character alice \
  --prompt "alice walking in a park" \
  --output alice_park.png

# Generate video with character (I2V)
python3 videogen --image_to_video --model svd_xt_1.1 \
  --image_model flux_dev \
  --character alice \
  --prompt "alice smiling at camera" \
  --prompt_animation "subtle head movement" \
  --output alice_animated
```

### IP-Adapter Direct Usage

```bash
# Use IP-Adapter with reference images directly
python3 videogen --model flux_dev \
  --ipadapter --ipadapter-scale 0.8 \
  --reference-images ref1.jpg ref2.jpg \
  --prompt "the person in a business suit" \
  --output business.png

# Use InstantID for face identity
python3 videogen --model flux_dev \
  --ipadapter --instantid \
  --reference-images face_ref.jpg \
  --prompt "portrait of the person smiling" \
  --output portrait.png
```

### Character Consistency Tips

1. **Use multiple reference images** (3-5) for better consistency
2. **IP-Adapter scale**: 0.7-0.9 for good balance (higher = more similar)
3. **InstantID** is better for face identity preservation
4. **Character profiles** are reusable across sessions
5. **Combine IP-Adapter + InstantID** for best results
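
Tips 1 and 2 can be checked mechanically before a command is launched. The sketch below returns human-readable warnings instead of raising, since values outside the suggested ranges may still be intentional. `character_setup_warnings` is a hypothetical helper.

```python
def character_setup_warnings(reference_images, ipadapter_scale=0.8):
    """Compare a planned setup against the tips above.

    Returns a list of warning strings; an empty list means the setup
    matches the suggested ranges.
    """
    warnings = []
    count = len(reference_images)
    if not 3 <= count <= 5:
        warnings.append(f"{count} reference image(s) given; the tips suggest 3-5")
    if not 0.7 <= ipadapter_scale <= 0.9:
        warnings.append(
            f"--ipadapter-scale {ipadapter_scale} is outside the suggested 0.7-0.9 range"
        )
    return warnings
```

An agent could show these warnings to the user, or log them, before running `--create-character` or `--ipadapter`.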

---

## Output Files

VideoGen creates these output files:

| File | Description |
|------|-------------|
| `<output>.mp4` | Generated video |
| `<output>.png` | Generated image (if T2I mode) |
| `<output>_init.png` | Initial image (I2V mode) |
| `<output>_tts.wav` | TTS audio |
| `<output>_music.wav` | Generated music |
| `<output>_synced.mp4` | Audio-synced video |
| `<output>_lipsync.mp4` | Lip-synced video |
| `<output>_upscaled.mp4` | Upscaled video |
| `<output>_<lang>.srt` | Subtitle file |
| `<output>_dubbed.mp4` | Dubbed video |

---

## Error Handling

### Common Errors

1. **Model not found**
   - Run `python3 videogen --update-models` to update the database
   - Use `--model-list` to see available models

2. **Out of memory**
   - Use a smaller model
   - Add `--offload_strategy sequential` (for very large models)
   - Use `--offload_strategy balanced` (recommended - maximizes VRAM usage)
   - Reduce resolution with `--width` and `--height`

3. **Pipeline class not found**
   - Update diffusers: `pip install --upgrade git+https://github.com/huggingface/diffusers.git`
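
For the out-of-memory case, an agent can prepare a list of progressively more conservative retries. The sketch below only builds the alternative commands; it does not try to detect an out-of-memory failure, because the exact error text VideoGen prints is not documented here. The retry order (balanced, then sequential, then a smaller model) is an assumption, and `oom_fallbacks` is a hypothetical helper.

```python
def oom_fallbacks(cmd, smaller_model=None):
    """Return retry variants of `cmd` for use after an out-of-memory error.

    Assumes `cmd` does not already contain `--offload_strategy`.
    """
    cmd = list(cmd)  # never mutate the caller's list
    variants = [
        cmd + ["--offload_strategy", "balanced"],
        cmd + ["--offload_strategy", "sequential"],
    ]
    if smaller_model and "--model" in cmd:
        index = cmd.index("--model") + 1
        if index < len(cmd):
            swapped = list(cmd)
            swapped[index] = smaller_model  # replace the model name only
            variants.append(swapped)
    return variants
```

Each variant can be tried in turn until one exits successfully. Reducing `--width` and `--height` remains a further manual option.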

---

## AI Agent Integration Tips

### 1. Always Use Auto Mode First

Auto mode handles most cases automatically:
```bash
python3 videogen --auto --prompt "<user's request>"
```

### 2. Parse the Output

Auto mode prints:
- Detection results (type, NSFW, motion)
- Selected model
- Full command line for reproduction

### 3. Handle User Preferences

If the user specifies:
- **Quality**: Use larger models like `wan_14b_t2v`, `mochi_1_preview`
- **Speed**: Use smaller models like `wan_1.3b_t2v`
- **Resolution**: Set `--width` and `--height`
- **Duration**: Set `--length`
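
These preferences map directly onto flags. The sketch below shows one way to translate them, picking the first model named for each priority. `preference_args` and its `priority` parameter are hypothetical, and the returned arguments are meant to be appended to a command that does not already select a model.

```python
def preference_args(priority=None, width=None, height=None, length=None):
    """Translate user preferences into VideoGen arguments."""
    args = []
    if priority == "quality":
        args += ["--model", "wan_14b_t2v"]
    elif priority == "speed":
        args += ["--model", "wan_1.3b_t2v"]
    elif priority is not None:
        raise ValueError("priority must be 'quality', 'speed', or None")
    if width is not None:
        args += ["--width", str(width)]
    if height is not None:
        args += ["--height", str(height)]
    if length is not None:
        args += ["--length", str(length)]
    return args
```

For example, `["python3", "videogen", "--prompt", "..."] + preference_args(priority="speed", length=10)` yields a fast, short generation.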

### 4. NSFW Content

Auto mode detects NSFW content automatically. For explicit requests:
```bash
python3 videogen --model <model> --prompt "<prompt>" --no_filter
```

### 5. Color Correction

Some models may produce videos with incorrect colors. Two manual fixes are available:

**Color inversion (luminosity)** - Dark areas appear light, light areas appear dark:
```bash
python3 videogen --model wan2_2_i2v_general_nsfw_lora --image input.png --prompt "..." --invert_colors
```

**BGR/RGB channel swap** - Colors have wrong tint (red appears blue, etc.):
```bash
python3 videogen --model wan2_2_i2v_general_nsfw_lora --image input.png --prompt "..." --swap_bgr
```

**Note:** There is no standard way to automatically detect RGB vs BGR output from diffusion models. The VAE output format is model-specific and not declared in metadata. Users must visually inspect output and apply corrections as needed.

### 6. Reproducibility

Always capture and store the seed for reproducibility:
```bash
python3 videogen --model <model> --prompt "<prompt>" --seed 42
```
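
When the user does not supply a seed, the agent can choose one itself so that it is known before the run starts. This is a minimal sketch; the 32-bit range is an assumption, since the seed range VideoGen accepts is not stated here, and `with_seed` is a hypothetical helper.

```python
import random


def with_seed(cmd, seed=None):
    """Append `--seed` to `cmd` and return (new_cmd, seed).

    If no seed is given, one is drawn at random so it can be stored
    alongside the output and reused later.
    """
    if seed is None:
        seed = random.randrange(2**32)  # assumed range, not confirmed
    return list(cmd) + ["--seed", str(seed)], seed
```

Storing the returned seed next to the output file name is enough to repeat the generation with the same model and prompt.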

---

## Example Workflows

### Workflow 1: Content Creator

```bash
# Step 1: Generate intro video with narration
python3 videogen --auto --prompt "professional intro animation with logo" \
  --generate_audio --audio_type tts \
  --audio_text "Welcome to my channel" \
  --output intro

# Step 2: Generate main content
python3 videogen --auto --prompt "tutorial scene with presenter" \
  --length 30 --output main_content

# Step 3: Generate outro
python3 videogen --auto --prompt "end screen with subscribe button" \
  --output outro
```
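
The three clips from this workflow can then be joined with `--concat-videos`. The sketch below builds that command, assuming each step produced a video named `<output>.mp4` as described in the Output Files table. `concat_cmd` is a hypothetical helper.

```python
def concat_cmd(output_names, joined):
    """Build the argv list that joins `<name>.mp4` clips in order."""
    if len(output_names) < 2:
        raise ValueError("need at least two clips to concatenate")
    clips = [f"{name}.mp4" for name in output_names]
    return ["python3", "videogen", "--concat-videos", *clips, "--output", joined]
```

For this workflow the call would be `concat_cmd(["intro", "main_content", "outro"], "episode")`.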

### Workflow 2: Social Media

```bash
# Generate short vertical video
python3 videogen --model wan_14b_t2v \
  --prompt "trending dance move" \
  --width 720 --height 1280 \
  --length 10 --fps 30 \
  --output social_clip
```

### Workflow 3: Music Video

```bash
# Generate video with synchronized music
python3 videogen --model wan_14b_t2v \
  --prompt "cinematic music video scene" \
  --generate_audio --audio_type music \
  --audio_text "upbeat pop song with catchy melody" \
  --music_model large \
  --sync_audio \
  --output music_video
```

---

## Programmatic Usage

### Python Integration

```python
import subprocess
import json

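def generate_video(prompt, output="output", model=None, auto=True):
    """Generate a video using VideoGen.

    Returns (stdout, returncode). With auto=True, auto mode picks the
    model and `model` is ignored; with auto=False, `model` is required.
    """
    cmd = ["python3", "videogen"]

    if auto:
        cmd.append("--auto")
    elif model:
        cmd.extend(["--model", model])
    else:
        raise ValueError("pass a model name when auto=False")

    cmd.extend(["--prompt", prompt, "--output", output])

    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout, result.returncode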

def list_models(filter_type=None):
    """List available models"""
    cmd = ["python3", "videogen", "--model-list"]
    if filter_type == "i2v":
        cmd.append("--i2v-only")
    elif filter_type == "low_vram":
        cmd.append("--low-vram")
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

def get_model_info(model_id):
    """Get details for a specific model"""
    cmd = ["python3", "videogen", "--show-model", str(model_id)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

def update_models():
    """Update model database from HuggingFace"""
    cmd = ["python3", "videogen", "--update-models"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

### JSON Output (Future Feature)

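The `--json` flag shown here is a proposed feature, not something VideoGen supports yet. If it were added, machine-readable output could be parsed along these lines (the keys `output_file`, `seed`, and `model` are placeholders for a possible future format):

```python
# Future: assumes a --json flag for machine-readable output (not implemented yet)
import json
import subprocess

prompt = "a cat playing piano"
cmd = ["python3", "videogen", "--auto", "--prompt", prompt, "--json"]
result = subprocess.run(cmd, capture_output=True, text=True)
data = json.loads(result.stdout)
print(f"Video saved to: {data['output_file']}")
print(f"Seed: {data['seed']}")
print(f"Model: {data['model']}")
```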

---

## MCP Server Integration

VideoGen provides an MCP (Model Context Protocol) server for programmatic access:

```bash
# Start the MCP server
python3 videogen_mcp_server.py
```

### MCP Tools Available

The MCP server exposes the following tools:

| Tool | Description |
|------|-------------|
| `videogen_list_models` | List available models (supports filter and batch parameters) |
| `videogen_show_model` | Show model details by ID or name |
| `videogen_generate_video` | Generate video from text (T2V) |
| `videogen_generate_image` | Generate image from text (T2I) |
| `videogen_animate_image` | Animate image (I2V) |
| `videogen_transform_image` | Transform image (I2I) |
| `videogen_generate_with_audio` | Generate video with audio |
| `videogen_transcribe_video` | Transcribe video audio |
| `videogen_create_subtitles` | Create subtitles |
| `videogen_dub_video` | Dub/translate video |
| `videogen_search_models` | Search HuggingFace for models |
| `videogen_add_model` | Add custom model |
| `videogen_update_models` | Update model database |

### MCP Parameters

The tools also accept these newer parameters, shown here as JSON argument objects:

```jsonc
// List models with batch output
{"filter": "t2v", "batch": true}

// Generate with output directory
{"model": "wan_14b_t2v", "prompt": "...", "output_dir": "/path/to/output", "yes": true}

// Transcribe with audio chunking ("audio_chunk" may also be "word-boundary" or "vad")
{"video": "input.mp4", "audio_chunk": "overlap"}
```
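
In MCP, tools are invoked with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. The sketch below only builds such a message for `videogen_list_models`. It does not perform the `initialize` handshake or handle transport, which a real client (typically an MCP client library) would do first. `mcp_tool_call` is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids


def mcp_tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = mcp_tool_call("videogen_list_models", {"filter": "t2v", "batch": True})
print(json.dumps(request))
```

Python's `True` is serialized as JSON `true` by `json.dumps`, so the arguments match the parameter examples in this section.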

---

## Best Practices for AI Agents

1. **Update models first** - Run `--update-models` before first use
2. **Start with auto mode** - It handles most cases well
3. **Check VRAM** - Use `--model-list --low-vram` for limited hardware
4. **Set seeds** - For reproducibility, always specify `--seed`
5. **Handle errors gracefully** - Check return codes and stderr
6. **Store command lines** - Auto mode prints reproducible commands
7. **Respect user preferences** - Quality vs speed, resolution, duration
8. **Use appropriate models** - Match model size to task complexity
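
Practice 5 can be sketched as a thin wrapper around `subprocess.run`. It assumes VideoGen signals failure with a non-zero exit code, which is not confirmed here. `VideoGenError` and `run_checked` are hypothetical names.

```python
import subprocess


class VideoGenError(RuntimeError):
    """Raised when a VideoGen command exits with a non-zero status."""


def run_checked(cmd, timeout=None):
    """Run `cmd`, returning stdout or raising with the captured stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Assumption: a non-zero exit code means the generation failed.
        raise VideoGenError(f"exit code {result.returncode}: {result.stderr.strip()}")
    return result.stdout
```

A caller can catch `VideoGenError`, inspect the message, and decide whether to retry with different settings or report the problem to the user.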

---

## Troubleshooting

### Command Fails

```bash
# First: Update model database
python3 videogen --update-models

# Check if model exists
python3 videogen --model-list | grep <model_name>

# Check dependencies
pip install -r requirements.txt
```

### No Models Available

```bash
# This means the model database hasn't been initialized
python3 videogen --update-models

# Wait for it to complete (may take a few minutes)
# Then verify
python3 videogen --model-list
```

### Slow Generation

```bash
# Use smaller model
python3 videogen --model wan_1.3b_t2v --prompt "test"

# Reduce length
python3 videogen --model wan_14b_t2v --prompt "test" --length 3
```

### Quality Issues

```bash
# Use larger model
python3 videogen --model wan_14b_t2v --prompt "detailed scene"

# Increase steps (for image models)
python3 videogen --model flux_dev --prompt "test" --image-steps 50

# Use upscale
python3 videogen --model wan_14b_t2v --prompt "test" --upscale
```

---

## Support

For issues and questions:
- Email: stefy@nexlab.net
- Repository: git.nexlab.net:nexlab/videogen.git
- Documentation: See EXAMPLES.md for comprehensive examples