# VideoGen AI Agent Skill

**Copyleft © 2026 Stefy <stefy@nexlab.net>**

This skill enables AI agents to use the VideoGen toolkit for video, image, and audio generation.

---

## Overview

VideoGen is a universal video generation toolkit that supports:
- **Text-to-Video (T2V)**: Generate videos from text prompts
- **Image-to-Video (I2V)**: Animate static images
- **Text-to-Image (T2I)**: Generate images from text
- **Image-to-Image (I2I)**: Transform existing images
- **Video-to-Video (V2V)**: Style transfer and filters for videos
- **Video-to-Image (V2I)**: Extract frames and keyframes from videos
- **2D-to-3D Conversion**: Convert 2D videos to side-by-side (SBS) 3D, anaglyph, or VR 360
- **Video Upscaling**: AI-powered video upscaling
- **Audio Generation**: Text-to-speech (TTS) and music generation
- **Lip Sync**: Synchronize lip movements with audio

---

## ⚠️ IMPORTANT: First-Time Setup

Before using VideoGen, you MUST update the model database:

```bash
python3 videogen --update-models
```

This command:
- Fetches the latest model list from HuggingFace
- Populates the local model database
- Enables model discovery and selection

**Run this command once before using any other VideoGen features.**
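
An agent that starts many sessions may want to avoid repeating the update each time. The following is a minimal sketch, assuming a caller-chosen marker file records that the update has already succeeded. The marker file and the `ensure_models_updated` helper are illustrative and not part of VideoGen.

```python
import subprocess
from pathlib import Path


def ensure_models_updated(marker, run=subprocess.run):
    """Run `--update-models` once and record success in a marker file.

    `marker` is a hypothetical path chosen by the caller; VideoGen itself
    does not know about it. Returns True if the update was run.
    """
    marker = Path(marker)
    if marker.exists():
        return False  # a previous call already updated the database

    result = run(["python3", "videogen", "--update-models"])
    if result.returncode != 0:
        raise RuntimeError("model database update failed")

    # Only mark the update as done after the command succeeded
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Calling `ensure_models_updated(".videogen_models_updated")` at the start of a session then costs only a file check after the first successful run.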

---

## Quick Reference for AI Agents

### Basic Commands

```bash
# ⚠️ FIRST: Update model database (required on first use)
python3 videogen --update-models

# Simple video generation
python3 videogen --model <model_name> --prompt "<prompt>" --output <output_name>

# Auto mode (recommended for AI agents)
python3 videogen --auto --prompt "<prompt>" --output <output_name>

# List available models
python3 videogen --model-list

# Show model details
python3 videogen --show-model <model_id_or_name>
```

### Generation Types

| Type | Command Pattern | Use Case |
|------|-----------------|----------|
| T2V | `--model t2v_model --prompt "..."` | Generate video from text |
| I2V | `--image_to_video --model i2v_model --prompt "..."` | Animate an image |
| T2I | `--model t2i_model --prompt "..." --output image.png` | Generate image |
| I2I | `--image-to-image --image input.png --prompt "..."` | Transform image |
| V2V | `--video input.mp4 --video-to-video --prompt "..."` | Style transfer on video |
| V2I | `--video input.mp4 --extract-keyframes` | Extract frames from video |
| 3D | `--video input.mp4 --convert-3d-sbs` | Convert 2D to 3D |
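
The command patterns in this table can be assembled programmatically. The sketch below copies the flags from the table. `build_command` and `PATTERNS` are hypothetical names, and treating `--image` as optional for I2V is an assumption.

```python
# Flag patterns are copied from the Generation Types table.
# build_command is a hypothetical helper, not part of VideoGen.
PATTERNS = {
    # type: (mode flags, input-file flag, input required?, prompt required?)
    "t2v": ([], None, False, True),
    "i2v": (["--image_to_video"], "--image", False, True),
    "t2i": ([], None, False, True),
    "i2i": (["--image-to-image"], "--image", True, True),
    "v2v": (["--video-to-video"], "--video", True, True),
    "v2i": (["--extract-keyframes"], "--video", True, False),
    "3d": (["--convert-3d-sbs"], "--video", True, False),
}


def build_command(gen_type, prompt=None, output=None, model=None, source=None):
    """Return an argv list for one generation type without running it."""
    if gen_type not in PATTERNS:
        raise ValueError(f"unknown generation type: {gen_type!r}")
    mode_flags, source_flag, source_required, prompt_required = PATTERNS[gen_type]
    if source_required and not source:
        raise ValueError(f"{gen_type} needs an input file")
    if prompt_required and not prompt:
        raise ValueError(f"{gen_type} needs a prompt")

    cmd = ["python3", "videogen"]
    if source and source_flag:
        cmd += [source_flag, source]
    cmd += mode_flags
    if model:
        cmd += ["--model", model]
    if prompt:
        cmd += ["--prompt", prompt]
    if output:
        cmd += ["--output", output]
    return cmd
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with user-supplied prompts.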

### Model Filters

```bash
# List models by type
python3 videogen --model-list --t2v-only      # Text-to-Video models
python3 videogen --model-list --i2v-only      # Image-to-Video models
python3 videogen --model-list --t2i-only      # Text-to-Image models
python3 videogen --model-list --v2v-only      # Video-to-Video models
python3 videogen --model-list --v2i-only      # Video-to-Image models
python3 videogen --model-list --3d-only       # 2D-to-3D models
python3 videogen --model-list --tts-only      # TTS models
python3 videogen --model-list --audio-only    # Audio models

# List by VRAM requirement
python3 videogen --model-list --low-vram      # ≤16GB VRAM
python3 videogen --model-list --high-vram    # >30GB VRAM
python3 videogen --model-list --huge-vram    # >55GB VRAM

# List NSFW-friendly models
python3 videogen --model-list --nsfw-friendly
```

### Auto Mode

Auto mode is the easiest way for AI agents to generate content:

```bash
# First, ensure models are updated
python3 videogen --update-models

# Then use auto mode
python3 videogen --auto --prompt "<description>"
```

Auto mode automatically:
1. Detects the generation type from the prompt
2. Detects NSFW content
3. Selects the best model for available VRAM
4. Configures all settings
5. Prints the command line for reproduction

---

## Common Use Cases

### 1. Generate a Simple Video

```bash
python3 videogen --auto --prompt "a cat playing piano" --output cat_piano
```

### 2. Generate a Video with Narration

```bash
python3 videogen --auto --prompt "a woman speaking to camera" \
  --generate_audio --audio_type tts \
  --audio_text "Welcome to my channel" \
  --output speaker
```

### 3. Animate an Existing Image

```bash
python3 videogen --model svd_xt_1.1 --image photo.jpg \
  --prompt "add subtle motion" --output animated
```

### 4. Generate an Image

```bash
python3 videogen --model flux_dev --prompt "beautiful landscape" \
  --output landscape.png
```

### 5. Transform an Image

```bash
python3 videogen --model flux_dev --image-to-image \
  --image photo.jpg --prompt "make it look like a painting" \
  --output painted.png
```

### 6. Generate Video with Music

```bash
python3 videogen --model wan_14b_t2v --prompt "epic battle scene" \
  --generate_audio --audio_type music \
  --audio_text "epic orchestral music" \
  --sync_audio --output battle
```

### 7. Generate Video with Lip Sync

```bash
python3 videogen --image_to_video --model svd_xt_1.1 \
  --image_model flux_dev --prompt "person speaking" \
  --generate_audio --audio_type tts \
  --audio_text "Hello world" \
  --lip_sync --output speaker
```

### 8. Video-to-Video Style Transfer

```bash
# Apply style transfer to a video
python3 videogen --video input.mp4 --video-to-video \
  --prompt "make it look like a watercolor painting" \
  --v2v-strength 0.7 --output styled
```

### 9. Apply Video Filter

```bash
# Apply grayscale filter
python3 videogen --video input.mp4 --video-filter grayscale --output gray

# Apply slow motion
python3 videogen --video input.mp4 --video-filter slow \
  --filter-params "factor=0.5" --output slowmo

# Apply blur
python3 videogen --video input.mp4 --video-filter blur \
  --filter-params "radius=10" --output blurred
```

### 10. Extract Frames from Video

```bash
# Extract keyframes
python3 videogen --video input.mp4 --extract-keyframes --frames-dir keyframes

# Extract single frame at timestamp
python3 videogen --video input.mp4 --extract-frame --timestamp 5.5 --output frame.png

# Extract frames (here limited to 100 via --v2v-max-frames)
python3 videogen --video input.mp4 --extract-frames --v2v-max-frames 100 --frames-dir all_frames
```

### 11. Create Video Collage

```bash
# Create 4x4 thumbnail grid
python3 videogen --video input.mp4 --video-collage \
  --collage-grid 4x4 --output collage.png
```

### 12. Upscale Video

```bash
# 2x upscale using ffmpeg
python3 videogen --video input.mp4 --upscale-video \
  --upscale-factor 2.0 --output upscaled

# 4x upscale using AI (ESRGAN)
python3 videogen --video input.mp4 --upscale-video \
  --upscale-factor 4.0 --upscale-method esrgan --output upscaled_4k
```

### 13. Convert 2D to 3D

```bash
# Convert to side-by-side 3D for VR
python3 videogen --video input.mp4 --convert-3d-sbs \
  --depth-method ai --output 3d_sbs

# Convert to anaglyph 3D (red/cyan glasses)
python3 videogen --video input.mp4 --convert-3d-anaglyph --output anaglyph

# Convert to VR 360 format
python3 videogen --video input.mp4 --convert-vr --output vr360
```

### 14. Concatenate Videos

```bash
# Join multiple videos
python3 videogen --concat-videos video1.mp4 video2.mp4 video3.mp4 --output joined
```

---

## Model Selection Guide

### By VRAM

| VRAM | Recommended Models |
|------|-------------------|
| <16GB | `wan_1.3b_t2v`, `zeroscope_v2_576w` |
| 16-30GB | `wan_14b_t2v`, `cogvideox_2b`, `mochi_1_preview` |
| 30-50GB | `allegro`, `hunyuanvideo` |
| 50GB+ | `open_sora_1_2`, `step_video_t2v` |
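
As an illustration, the table above can be turned into a lookup. This sketch assumes that a boundary value (for example exactly 16 GB) belongs to the higher tier, which the table does not specify. `recommend_models` is a hypothetical helper.

```python
# Tiers copied from the "By VRAM" table. Upper bounds are treated as
# exclusive, which is an assumption: the table does not say which tier
# a boundary value belongs to.
VRAM_TIERS = [
    (16, ["wan_1.3b_t2v", "zeroscope_v2_576w"]),
    (30, ["wan_14b_t2v", "cogvideox_2b", "mochi_1_preview"]),
    (50, ["allegro", "hunyuanvideo"]),
    (float("inf"), ["open_sora_1_2", "step_video_t2v"]),
]


def recommend_models(vram_gb):
    """Return the recommended model names for the given VRAM in GB."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    for upper_bound, models in VRAM_TIERS:
        if vram_gb < upper_bound:
            return list(models)
```

On a CUDA machine with PyTorch installed, `torch.cuda.get_device_properties(0).total_memory / 1024**3` is one way to obtain the value to pass in.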

### By Task

| Task | Recommended Models |
|------|-------------------|
| Fast T2V | `wan_1.3b_t2v` |
| Quality T2V | `wan_14b_t2v`, `mochi_1_preview` |
| I2V | `svd_xt_1.1`, `wan_14b_i2v` |
| T2I | `flux_dev`, `sdxl_base` |
| Anime | `pony_v6` |

---

## Output Files

Depending on the options used, VideoGen creates some of these output files:

| File | Description |
|------|-------------|
| `<output>.mp4` | Generated video |
| `<output>.png` | Generated image (if T2I mode) |
| `<output>_init.png` | Initial image (I2V mode) |
| `<output>_tts.wav` | TTS audio |
| `<output>_music.wav` | Generated music |
| `<output>_synced.mp4` | Audio-synced video |
| `<output>_lipsync.mp4` | Lip-synced video |
| `<output>_upscaled.mp4` | Upscaled video |
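
After a run, an agent can check which of these files exist. The sketch below assumes `--output` was given as a base name without an extension and that files are written to a known directory. Both helper names are hypothetical.

```python
from pathlib import Path

# Suffixes copied from the Output Files table
OUTPUT_SUFFIXES = {
    "video": ".mp4",
    "image": ".png",
    "init_image": "_init.png",
    "tts": "_tts.wav",
    "music": "_music.wav",
    "synced": "_synced.mp4",
    "lipsync": "_lipsync.mp4",
    "upscaled": "_upscaled.mp4",
}


def expected_outputs(output, directory="."):
    """Map each output kind to the path it would have for this base name."""
    base = Path(directory)
    return {
        label: base / f"{output}{suffix}"
        for label, suffix in OUTPUT_SUFFIXES.items()
    }


def existing_outputs(output, directory="."):
    """Return only the expected outputs that are present on disk."""
    return {
        label: path
        for label, path in expected_outputs(output, directory).items()
        if path.is_file()
    }
```

For example, `existing_outputs("speaker")` after the narration use case would be expected to include the `video` and `tts` entries if those files were written to the current directory.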

---

## Error Handling

### Common Errors

1. **Model not found**
   - Run `python3 videogen --update-models` to update the database
   - Use `--model-list` to see available models

2. **Out of memory**
   - Use a smaller model
   - Add `--offload_strategy sequential`
   - Reduce resolution with `--width` and `--height`

3. **Pipeline class not found**
   - Update diffusers: `pip install --upgrade git+https://github.com/huggingface/diffusers.git`
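
The out-of-memory remedy can be automated as a single retry. This is only a sketch: it assumes that an out-of-memory failure leaves the text "out of memory" in stderr, which is not confirmed for VideoGen. `plan_oom_retry` is a hypothetical helper.

```python
def plan_oom_retry(cmd, stderr):
    """Return a retry command with sequential offloading, or None.

    Assumption: an out-of-memory failure shows up as the phrase
    "out of memory" in stderr. The real message may differ.
    """
    if "out of memory" not in stderr.lower():
        return None  # not an error this helper knows how to handle
    if "--offload_strategy" in cmd:
        return None  # offloading was already requested; do not loop
    return list(cmd) + ["--offload_strategy", "sequential"]
```

If the retry also fails, the remaining options from the list above are a smaller model or a lower `--width` and `--height`.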

---

## AI Agent Integration Tips

### 1. Always Use Auto Mode First

Auto mode handles most cases automatically:
```bash
python3 videogen --auto --prompt "<user's request>"
```

### 2. Parse the Output

Auto mode prints:
- Detection results (type, NSFW, motion)
- Selected model
- Full command line for reproduction
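
The exact layout of auto mode's output is not documented here, so the following is only a sketch. It assumes the reproduction command is printed on a single line containing `python3`, `videogen`, and `--model`. The helper name and the sample output in the usage note are hypothetical.

```python
import shlex


def extract_repro_command(stdout):
    """Return the reproduction command as an argv list, or None.

    Assumption: the command appears on one line that contains
    "python3", "videogen" and "--model". Adjust the matching once the
    real output format is known.
    """
    for line in stdout.splitlines():
        if "videogen" in line and "--model" in line:
            start = line.find("python3")
            if start == -1:
                continue
            # shlex.split keeps quoted prompts together as one argument
            return shlex.split(line[start:])
    return None
```

Given a line such as `Command: python3 videogen --model wan_1.3b_t2v --prompt "a cat" --seed 42`, the function returns the arguments starting at `python3`, with the quoted prompt kept as one item.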

### 3. Handle User Preferences

If the user specifies:
- **Quality**: Use larger models like `wan_14b_t2v`, `mochi_1_preview`
- **Speed**: Use smaller models like `wan_1.3b_t2v`
- **Resolution**: Set `--width` and `--height`
- **Duration**: Set `--length`
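
These preferences map naturally onto command-line flags. The sketch below uses the model names and flags from the list above. The `preference_flags` helper and its `priority` parameter are hypothetical. Whether `--model` can be combined with `--auto` is not stated here, so the model choice is intended for commands that omit `--auto`.

```python
def preference_flags(priority=None, width=None, height=None, length=None):
    """Translate user preferences into VideoGen flags.

    `priority` is a hypothetical parameter: "quality", "speed", or None.
    """
    flags = []
    if priority == "quality":
        flags += ["--model", "wan_14b_t2v"]
    elif priority == "speed":
        flags += ["--model", "wan_1.3b_t2v"]
    elif priority is not None:
        raise ValueError(f"unknown priority: {priority!r}")

    # Only emit the flags the user actually asked for
    for name, value in (("--width", width), ("--height", height), ("--length", length)):
        if value is not None:
            flags += [name, str(value)]
    return flags
```

The result can be appended to a base command such as `["python3", "videogen", "--prompt", prompt]`.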

### 4. NSFW Content

Auto mode detects NSFW content automatically. For explicit requests:
```bash
python3 videogen --model <model> --prompt "<prompt>" --no_filter
```

### 5. Reproducibility

Pass an explicit `--seed` and store it alongside the output so the run can be repeated:
```bash
python3 videogen --model <model> --prompt "<prompt>" --seed 42
```
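
One way to keep seeds is to choose the seed in the agent and write it to a small JSON record next to the output. This is a sketch under assumptions: the 32-bit seed range and the record layout are choices made here, not VideoGen requirements.

```python
import json
import random


def with_seed(cmd, seed=None):
    """Return (command with --seed appended, seed used).

    If no seed is given, one is drawn from a 32-bit range. That range
    is an assumption; VideoGen's accepted range is not documented here.
    """
    if seed is None:
        seed = random.randrange(2**32)
    return list(cmd) + ["--seed", str(seed)], seed


def save_run_record(path, cmd, seed):
    """Write the command and seed to a JSON file for later reproduction."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"command": cmd, "seed": seed}, handle, indent=2)
```

A typical call would be `cmd, seed = with_seed(base_cmd)` followed by `save_run_record("cat_piano.json", cmd, seed)` once the run succeeds.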

---

## Example Workflows

### Workflow 1: Content Creator

```bash
# Step 1: Generate intro video with narration
python3 videogen --auto --prompt "professional intro animation with logo" \
  --generate_audio --audio_type tts \
  --audio_text "Welcome to my channel" \
  --output intro

# Step 2: Generate main content
python3 videogen --auto --prompt "tutorial scene with presenter" \
  --length 30 --output main_content

# Step 3: Generate outro
python3 videogen --auto --prompt "end screen with subscribe button" \
  --output outro
```
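
The three clips from this workflow can then be joined with `--concat-videos`. The sketch below assumes each step wrote `<output>.mp4`, as listed under Output Files. `concat_command` is a hypothetical helper that only builds the command.

```python
def concat_command(clip_names, output):
    """Build a --concat-videos command from output base names.

    Assumption: each clip was written as "<name>.mp4".
    """
    if len(clip_names) < 2:
        raise ValueError("need at least two clips to concatenate")
    files = [f"{name}.mp4" for name in clip_names]
    return ["python3", "videogen", "--concat-videos", *files, "--output", output]
```

For the workflow above, `concat_command(["intro", "main_content", "outro"], "episode")` produces a command that joins the clips in order.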

### Workflow 2: Social Media

```bash
# Generate short vertical video
python3 videogen --model wan_14b_t2v \
  --prompt "trending dance move" \
  --width 720 --height 1280 \
  --length 10 --fps 30 \
  --output social_clip
```

### Workflow 3: Music Video

```bash
# Generate video with synchronized music
python3 videogen --model wan_14b_t2v \
  --prompt "cinematic music video scene" \
  --generate_audio --audio_type music \
  --audio_text "upbeat pop song with catchy melody" \
  --music_model large \
  --sync_audio \
  --output music_video
```

---

## Programmatic Usage

### Python Integration

```python
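import subprocess
import json

# Filter names mapped to the --model-list flags listed under "Model Filters"
MODEL_FILTERS = {
    "t2v": "--t2v-only",
    "i2v": "--i2v-only",
    "t2i": "--t2i-only",
    "v2v": "--v2v-only",
    "v2i": "--v2i-only",
    "3d": "--3d-only",
    "tts": "--tts-only",
    "audio": "--audio-only",
    "low_vram": "--low-vram",
    "high_vram": "--high-vram",
    "huge_vram": "--huge-vram",
    "nsfw": "--nsfw-friendly",
}

def generate_video(prompt, output="output", model=None, auto=True):
    """Generate a video using VideoGen; returns (stdout, returncode)"""
    cmd = ["python3", "videogen"]

    if auto:
        cmd.append("--auto")
    elif model:
        cmd.extend(["--model", model])
    else:
        # Neither auto mode nor a model was requested
        raise ValueError("pass a model name or set auto=True")

    cmd.extend(["--prompt", prompt, "--output", output])

    # A non-zero return code signals failure; details are in result.stderr
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout, result.returncode

def list_models(filter_type=None):
    """List available models, optionally narrowed by one MODEL_FILTERS key"""
    cmd = ["python3", "videogen", "--model-list"]
    if filter_type is not None:
        if filter_type not in MODEL_FILTERS:
            raise ValueError(f"unknown filter: {filter_type!r}")
        cmd.append(MODEL_FILTERS[filter_type])

    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

def get_model_info(model_id):
    """Get details for a specific model (ID or name)"""
    cmd = ["python3", "videogen", "--show-model", str(model_id)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

def update_models():
    """Update model database from HuggingFace"""
    cmd = ["python3", "videogen", "--update-models"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout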
```

### JSON Output (Future Feature)

VideoGen does not currently emit JSON. A `--json` flag, if added, would make the output easier to parse:

```python
# Future: --json is not implemented yet; this shows how it could be consumed
import json
import subprocess

prompt = "a cat playing piano"
cmd = ["python3", "videogen", "--auto", "--prompt", prompt, "--json"]
result = subprocess.run(cmd, capture_output=True, text=True)
data = json.loads(result.stdout)
# The keys below are illustrative; the eventual schema may differ
print(f"Video saved to: {data['output_file']}")
print(f"Seed: {data['seed']}")
print(f"Model: {data['model']}")
```

---

## Best Practices for AI Agents

1. **Update models first** - Run `--update-models` before first use
2. **Start with auto mode** - It handles most cases well
3. **Check VRAM** - Use `--model-list --low-vram` for limited hardware
4. **Set seeds** - For reproducibility, always specify `--seed`
5. **Handle errors gracefully** - Check return codes and stderr
6. **Store command lines** - Auto mode prints reproducible commands
7. **Respect user preferences** - Quality vs speed, resolution, duration
8. **Use appropriate models** - Match model size to task complexity

---

## Troubleshooting

### Command Fails

```bash
# First: Update model database
python3 videogen --update-models

# Check if model exists
python3 videogen --model-list | grep <model_name>

# Check dependencies
pip install -r requirements.txt
```

### No Models Available

```bash
# This means the model database hasn't been initialized
python3 videogen --update-models

# Wait for it to complete (may take a few minutes)
# Then verify
python3 videogen --model-list
```

### Slow Generation

```bash
# Use smaller model
python3 videogen --model wan_1.3b_t2v --prompt "test"

# Reduce length
python3 videogen --model wan_14b_t2v --prompt "test" --length 3
```

### Quality Issues

```bash
# Use larger model
python3 videogen --model wan_14b_t2v --prompt "detailed scene"

# Increase steps (for image models)
python3 videogen --model flux_dev --prompt "test" --image-steps 50

# Use upscale
python3 videogen --model wan_14b_t2v --prompt "test" --upscale
```

---

## Support

For issues and questions:
- Email: stefy@nexlab.net
- Repository: git.nexlab.net:nexlab/videogen.git
- Documentation: See EXAMPLES.md for comprehensive examples