Add audio generation, auto mode, MCP server, and comprehensive documentation

Features:
- Audio generation: TTS via Bark/Edge-TTS, music via MusicGen
- Audio sync: stretch, trim, pad, loop modes
- Lip sync: Wav2Lip and SadTalker integration
- Auto mode: automatic model selection with NSFW detection
- MCP server: AI agent integration via Model Context Protocol
- Model management: external config, search, validation
- T2I/I2I support: static image and image-to-image generation
- Time estimation: detailed timing breakdown for each step

Documentation:
- README.md: comprehensive installation and usage guide
- EXAMPLES.md: 100+ command-line examples
- SKILL.md: AI agent integration guide
- LICENSE.md: GPLv3 license

Copyleft © 2026 Stefy <stefy@nexlab.net>
# VideoGen - Universal Video Generation Toolkit
**Copyleft © 2026 Stefy <stefy@nexlab.net>**
A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Video (T2V), Image-to-Video (I2V), Text-to-Image (T2I), and Image-to-Image (I2I) generation with audio synthesis, synchronization, and lip-sync capabilities.
---
## Features
### Video Generation
- **Text-to-Video (T2V)**: Generate videos from text prompts
- **Image-to-Video (I2V)**: Animate static images
- **Text-to-Image (T2I)**: Generate high-quality images
- **Image-to-Image (I2I)**: Transform existing images
### Audio Capabilities
- **Text-to-Speech (TTS)**: Multiple voices via Bark and Edge-TTS
- **Music Generation**: MusicGen integration for background music
- **Audio Sync**: Match audio duration to video (stretch, trim, pad, loop)
- **Lip Sync**: Wav2Lip and SadTalker integration
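The four sync modes above can be sketched in plain Python. This is an illustrative model of the length logic only: a real implementation would operate on waveform arrays (e.g. via librosa/soundfile), and `stretch` would time-stretch without changing pitch rather than naively resampling.

```python
def sync_audio(samples, target_len, mode="pad"):
    """Return a sample list whose length matches target_len.

    Illustrative sketch of the four sync modes; not videogen's internals.
    """
    n = len(samples)
    if n == target_len:
        return list(samples)
    if mode == "trim":
        # cut the tail to fit the video duration
        return list(samples[:target_len])
    if mode == "pad":
        # extend with silence (zeros) up to the target
        return list(samples[:target_len]) + [0.0] * max(0, target_len - n)
    if mode == "loop":
        # repeat the clip until the target length is reached
        out = []
        while len(out) < target_len:
            out.extend(samples)
        return out[:target_len]
    if mode == "stretch":
        # naive nearest-neighbour resample; real code would time-stretch
        # without changing pitch (e.g. librosa.effects.time_stretch)
        return [samples[int(i * n / target_len)] for i in range(target_len)]
    raise ValueError(f"unknown mode: {mode}")
```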
### Model Support
- **Small Models** (<16GB VRAM): Wan 1.3B, Zeroscope, ModelScope
- **Medium Models** (16-30GB VRAM): Wan 14B, CogVideoX, Mochi
- **Large Models** (30-50GB VRAM): Allegro, HunyuanVideo
- **Huge Models** (50GB+ VRAM): Open-Sora, Step-Video, Lumina
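A minimal sketch of how an available-VRAM figure maps onto these tiers. The thresholds are copied from the list above; the function itself is illustrative, not videogen's actual selection logic.

```python
def tier_for_vram(vram_gb):
    """Map available VRAM (GB) to the model tiers listed above.

    Thresholds follow the README table; this is a sketch, not
    videogen's internal model-selection code.
    """
    if vram_gb >= 50:
        return "huge"    # Open-Sora, Step-Video, Lumina
    if vram_gb >= 30:
        return "large"   # Allegro, HunyuanVideo
    if vram_gb >= 16:
        return "medium"  # Wan 14B, CogVideoX, Mochi
    return "small"       # Wan 1.3B, Zeroscope, ModelScope
```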
### Smart Features
- **Auto Mode**: Automatic model selection and configuration
- **NSFW Detection**: Automatic content classification
- **Prompt Splitting**: Intelligent I2V prompt separation
- **Time Estimation**: Predict generation time before starting
- **Multi-GPU**: Distributed generation across multiple GPUs
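As a toy model of the time-estimation feature: generation time scales roughly with frame count times denoising steps. The rate constant below is a made-up placeholder, not a measured figure from videogen.

```python
def estimate_seconds(num_frames, steps, sec_per_step_per_frame=0.05):
    """Rough generation-time estimate: frames x steps x per-unit cost.

    The default rate is a placeholder for illustration; real estimates
    depend on the model, resolution, and GPU.
    """
    return num_frames * steps * sec_per_step_per_frame
```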
### AI Integration
- **MCP Server**: Model Context Protocol wrapper for AI agents
- **Skill Documentation**: Comprehensive AI agent integration guide
---
## Installation
### Core Dependencies
```bash
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 --break-system-packages
pip install git+https://github.com/huggingface/diffusers.git --break-system-packages
pip install git+https://github.com/huggingface/transformers.git --break-system-packages
pip install --upgrade accelerate xformers spandrel psutil ffmpeg-python ftfy --break-system-packages
```
### Audio Features (Optional)
```bash
pip install scipy soundfile librosa --break-system-packages
pip install git+https://github.com/suno-ai/bark.git --break-system-packages
pip install edge-tts --break-system-packages
pip install audiocraft
```
### Lip Sync (Optional)
```bash
pip install opencv-python face-recognition dlib --break-system-packages
git clone https://github.com/Rudrabha/Wav2Lip.git
```
### MCP Server (For AI Agents)
```bash
pip install mcp
```
---
## Quick Start
### First-Time Setup
**IMPORTANT**: Before using VideoGen, update the model database:
```bash
python3 videogen --update-models
```
This fetches the latest model list from HuggingFace and populates the local database.
### Basic Usage
```bash
# Simple video generation
python3 videogen --model wan_1.3b_t2v --prompt "a cat playing piano" --output cat_piano
# Auto mode - let the script decide everything
python3 videogen --auto --prompt "a beautiful sunset over the ocean"
# Generate with audio
python3 videogen --model wan_14b_t2v --prompt "epic battle scene" \
    --generate_audio --audio_type music --sync_audio --output battle
```
### Image-to-Video
```bash
# Animate an existing image
python3 videogen --model svd_xt_1.1 --image my_photo.jpg \
    --prompt "add subtle motion" --output animated
# I2V with auto-generated image
python3 videogen --image_to_video --model svd_xt_1.1 \
    --image_model flux_dev --prompt "cinematic portrait" \
    --prompt_animation "gentle head movement" --output portrait
```
### With Lip Sync
```bash
python3 videogen --image_to_video --model svd_xt_1.1 \
    --image_model flux_dev --prompt "person speaking" \
    --generate_audio --audio_type tts \
    --audio_text "Hello, welcome to my channel" \
    --lip_sync --output speaker
```
---
## AI Agent Integration
### MCP Server
VideoGen includes an MCP (Model Context Protocol) server for seamless integration with AI agents like Claude:
```bash
# Start the MCP server
python3 videogen_mcp_server.py
```
Add to Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
  "mcpServers": {
    "videogen": {
      "command": "python3",
      "args": ["/path/to/videogen_mcp_server.py"]
    }
  }
}
```
### Available MCP Tools
| Tool | Description |
|------|-------------|
| `videogen_generate` | Generate video with auto mode |
| `videogen_generate_video` | Text-to-Video generation |
| `videogen_generate_image` | Text-to-Image generation |
| `videogen_animate_image` | Image-to-Video animation |
| `videogen_transform_image` | Image-to-Image transformation |
| `videogen_generate_with_audio` | Video with TTS or music |
| `videogen_list_models` | List available models |
| `videogen_show_model` | Show model details |
| `videogen_update_models` | Update model database |
| `videogen_search_models` | Search HuggingFace |
| `videogen_add_model` | Add model to database |
| `videogen_list_tts_voices` | List TTS voices |
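Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` messages. A sketch of such a request follows; the tool name comes from the table above, but the `prompt` argument name is illustrative and the server's actual argument schema may differ.

```python
import json

# Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
# The tool name is from the table above; the argument name "prompt" is an
# assumption about the schema, shown for illustration only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "videogen_generate",
        "arguments": {
            "prompt": "a cat playing piano",  # hypothetical parameter name
        },
    },
}

wire = json.dumps(request)  # what actually travels over stdio
```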
### Skill Documentation
See [SKILL.md](SKILL.md) for a comprehensive AI agent integration guide, including:
- Quick reference commands
- Common use cases
- Model selection guide
- Error handling
- Programmatic usage examples
---
## Documentation
- **[EXAMPLES.md](EXAMPLES.md)**: Comprehensive command-line examples for all features
- **[SKILL.md](SKILL.md)**: AI agent integration guide
- **Built-in help**: `python3 videogen --help`
- **Model list**: `python3 videogen --model-list`
- **TTS voices**: `python3 videogen --tts-list`
---
## Model Management
```bash
# Update model database (run this first!)
python3 videogen --update-models
# List available models
python3 videogen --model-list
# List models by VRAM requirement
python3 videogen --model-list --low-vram   # <16GB
python3 videogen --model-list --high-vram  # >30GB
python3 videogen --model-list --huge-vram  # >50GB
# Search HuggingFace for models
python3 videogen --search-models "video generation"
# Add a model
python3 videogen --add-model stabilityai/stable-video-diffusion-img2vid-xt-1.1 --name svd_xt
# Show model details
python3 videogen --show-model 1
```
---
## VRAM Management
```bash
# Limit VRAM usage
python3 videogen --model wan_14b_t2v --prompt "test" --vram_limit 16
# Offloading strategies
python3 videogen --model wan_14b_t2v --prompt "test" --offload_strategy sequential
# Low RAM mode
python3 videogen --model wan_14b_t2v --prompt "test" --low_ram_mode
```
---
## Distributed Generation
```bash
# Multi-GPU distributed generation
python3 videogen --model hunyuanvideo --prompt "epic scene" \
    --length 30 --distribute --vram_limit 20
```
---
## Configuration
Models are stored in `~/.config/videogen/models.json`
Set environment variables:
```bash
export HF_TOKEN=your_token_here # For gated models
export HF_HOME=/path/to/cache # Custom cache directory
export CUDA_VISIBLE_DEVICES=0,1 # GPU selection
```
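A hedged sketch of reading the model database from its documented location and filtering by a VRAM budget. The `vram_gb` field name is an assumption about the `models.json` schema, not a documented key.

```python
import json
from pathlib import Path

def load_models(path=None, vram_limit=None):
    """Load the model database and optionally filter by VRAM budget.

    Assumes models.json holds a list of objects with a "vram_gb" field;
    that schema is an illustration, not videogen's documented format.
    """
    path = Path(path or Path.home() / ".config" / "videogen" / "models.json")
    models = json.loads(path.read_text())
    if vram_limit is not None:
        models = [m for m in models if m.get("vram_gb", 0) <= vram_limit]
    return models
```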
---
## Project Structure
```
videogen/
├── videogen # Main script
├── videogen_mcp_server.py # MCP server for AI agents
├── README.md # This file
├── EXAMPLES.md # Comprehensive examples
├── SKILL.md # AI agent integration guide
├── LICENSE.md # GPLv3 License
└── requirements.txt # Python dependencies
```
---
## License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
See [LICENSE.md](LICENSE.md) for the full license text.
---
## Copyleft
**VideoGen - Universal Video Generation Toolkit**
Copyright © 2026 Stefy <stefy@nexlab.net>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
---
## Contributing
Contributions are welcome! Please feel free to submit pull requests.
---
## Support
For issues and questions, please open an issue on the repository or contact stefy@nexlab.net.
# VideoGen - Universal Video Generation Toolkit
# Copyleft © 2026 Stefy <stefy@nexlab.net>
# Core Dependencies (Required)
torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
diffusers>=0.30.0
transformers>=4.35.0
accelerate>=0.24.0
xformers>=0.0.22
spandrel>=0.1.0
psutil>=5.9.0
ffmpeg-python>=0.2.0
ftfy>=6.1.0
Pillow>=10.0.0
safetensors>=0.4.0
huggingface-hub>=0.19.0
# Audio Dependencies (Optional - for TTS and music generation)
scipy>=1.11.0
soundfile>=0.12.0
librosa>=0.10.0
edge-tts>=6.1.0
# bark # Install with: pip install git+https://github.com/suno-ai/bark.git
# audiocraft # Install with: pip install audiocraft
# Lip Sync Dependencies (Optional)
opencv-python>=4.8.0
face-recognition>=1.3.0
# dlib # Install with: pip install dlib (requires cmake)
# Model Management
requests>=2.31.0
urllib3>=2.0.0
# Progress and UI
tqdm>=4.66.0
rich>=13.0.0
# Configuration
pydantic>=2.0.0
# Distributed Processing
# accelerate # Already listed above
# Optional: NSFW Classification
# onnxruntime>=1.16.0