Add video dubbing, translation, and subtitle features

Features Added:
- Video dubbing with voice preservation (--dub-video)
- Automatic subtitle generation (--create-subtitles)
- Subtitle translation (--translate-subtitles)
- Burn subtitles into video (--burn-subtitles)
- Audio transcription using Whisper (--transcribe)
- Text translation using MarianMT models

New Command-Line Arguments:
- --transcribe: Transcribe audio from video
- --whisper-model: Select Whisper model size (tiny/base/small/medium/large)
- --source-lang: Source language code
- --target-lang: Target language code for translation
- --create-subtitles: Create SRT subtitles from video
- --translate-subtitles: Translate subtitles to target language
- --burn-subtitles: Burn subtitles into video
- --subtitle-style: Customize subtitle appearance
- --dub-video: Translate and dub video with voice preservation
- --voice-clone/--no-voice-clone: Enable/disable voice cloning

MCP Server Updates:
- Added videogen_transcribe_video tool
- Added videogen_create_subtitles tool
- Added videogen_dub_video tool
- Added videogen_translate_text tool

Documentation Updates:
- Updated SKILL.md with dubbing/translation section
- Updated EXAMPLES.md with comprehensive examples
- Updated requirements.txt with openai-whisper dependency

Supported Languages:
English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Vietnamese, Thai, Indonesian, Swedish, Ukrainian
parent 1c01f5b7
@@ -17,13 +17,15 @@ This document contains comprehensive examples for using the VideoGen toolkit, co
9. [2D-to-3D Conversion](#2d-to-3d-conversion)
10. [Audio Generation](#audio-generation)
11. [Lip Sync](#lip-sync)
12. [Character Consistency](#character-consistency)
13. [Distributed Multi-GPU](#distributed-multi-gpu)
14. [Model Management](#model-management)
15. [VRAM Management](#vram-management)
16. [Upscaling](#upscaling)
17. [NSFW Content](#nsfw-content)
18. [Advanced Combinations](#advanced-combinations)
12. [Video Dubbing & Translation](#video-dubbing--translation)
13. [Subtitle Generation](#subtitle-generation)
14. [Character Consistency](#character-consistency)
15. [Distributed Multi-GPU](#distributed-multi-gpu)
16. [Model Management](#model-management)
17. [VRAM Management](#vram-management)
18. [Upscaling](#upscaling)
19. [NSFW Content](#nsfw-content)
20. [Advanced Combinations](#advanced-combinations)
---
@@ -841,6 +843,123 @@ python3 videogen --image_to_video --model svd_xt_1.1 \
---
## Video Dubbing & Translation
Video dubbing translates a video's speech into another language while preserving the original speaker's voice characteristics via voice cloning.
### Transcribe Video Audio
```bash
# Basic transcription
python3 videogen --video interview.mp4 --transcribe
# Use larger Whisper model for better accuracy
python3 videogen --video interview.mp4 --transcribe --whisper-model large
# Specify source language
python3 videogen --video spanish_video.mp4 --transcribe --source-lang es
```
### Dub Video with Translation
```bash
# Translate and dub video (preserves voice with voice cloning)
python3 videogen --video english_video.mp4 --dub-video --target-lang es --output spanish_version
# Dub to French
python3 videogen --video english_video.mp4 --dub-video --target-lang fr --output french_version
# Dub to German without voice cloning (use standard TTS)
python3 videogen --video english_video.mp4 --dub-video --target-lang de --no-voice-clone --output german_version
# Dub with specific TTS voice
python3 videogen --video english_video.mp4 --dub-video --target-lang ja --tts_voice edge_female_ja --output japanese_version
# Dub with larger Whisper model for better transcription
python3 videogen --video interview.mp4 --dub-video --target-lang zh --whisper-model medium --output chinese_version
```
### Multi-Language Dubbing Pipeline
```bash
# Create multiple language versions
for lang in es fr de ja; do
python3 videogen --video original.mp4 --dub-video --target-lang $lang --output "version_$lang"
done
```
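The same multi-language loop can be driven from Python when you need error handling per language. A minimal sketch, assuming only the CLI flags shown in this section (`--dub-video`, `--target-lang`, `--output`); the helper names are illustrative, not part of the toolkit:

```python
import subprocess

def build_dub_command(video, target_lang, output):
    """Build the videogen dubbing command line for one target language."""
    return [
        "python3", "videogen",
        "--video", video,
        "--dub-video",
        "--target-lang", target_lang,
        "--output", output,
    ]

def dub_all(video, langs):
    """Dub a video into each language in turn (sequential, single GPU)."""
    for lang in langs:
        cmd = build_dub_command(video, lang, f"version_{lang}")
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a dub fails

# Build (but don't run) the Spanish command:
print(build_dub_command("original.mp4", "es", "version_es"))
```

Running `dub_all("original.mp4", ["es", "fr", "de", "ja"])` reproduces the shell loop above, with `check=True` stopping the batch on the first failure instead of silently continuing.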
---
## Subtitle Generation
Generate and translate subtitles automatically from video audio.
### Create Subtitles
```bash
# Create SRT subtitles from video
python3 videogen --video interview.mp4 --create-subtitles
# Output: interview_en.srt (or detected language code)
```
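Under the hood, Whisper produces timestamped segments (dicts with `start`, `end`, and `text` keys) that are then rendered as SRT cue blocks. A minimal sketch of that conversion, assuming Whisper-shaped segments; these helpers are illustrative, not the toolkit's actual API:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render Whisper-style segments as numbered SRT cue blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello there."}]))
```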
### Create Translated Subtitles
```bash
# Create Spanish subtitles from English video
python3 videogen --video english_video.mp4 --create-subtitles --translate-subtitles --target-lang es
# Output: english_video_en.srt (original) and english_video_es.srt (translated)
```
### Burn Subtitles into Video
```bash
# Burn original subtitles into video
python3 videogen --video interview.mp4 --create-subtitles --burn-subtitles --output subtitled
# Burn translated subtitles
python3 videogen --video english_video.mp4 --create-subtitles --translate-subtitles --target-lang es --burn-subtitles --output spanish_subtitled
# Custom subtitle style
python3 videogen --video interview.mp4 --create-subtitles --burn-subtitles \
--subtitle-style "font_size=32,font_color=yellow,outline_color=black" \
--output styled_subtitles
```
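Burned-in styling like the `--subtitle-style` string above is typically passed to ffmpeg's `subtitles` filter as a `force_style` argument using ASS style field names. A hedged sketch of that mapping — the key names accepted and the exact ASS translation are assumptions, not the toolkit's documented behavior:

```python
# Assumed mapping from the CLI's key=value names to ASS style fields.
STYLE_KEYS = {
    "font_size": "FontSize",
    "font_color": "PrimaryColour",
    "outline_color": "OutlineColour",
}

def parse_style(style):
    """Parse 'font_size=32,font_color=yellow' into a dict."""
    pairs = (item.split("=", 1) for item in style.split(",") if item)
    return {k.strip(): v.strip() for k, v in pairs}

def force_style(style):
    """Build an ffmpeg force_style value from the CLI style string.

    Note: real ASS colours are &HBBGGRR& hex values, so in practice
    named colours like 'yellow' would need converting first.
    """
    parsed = parse_style(style)
    return ",".join(f"{STYLE_KEYS.get(k, k)}={v}" for k, v in parsed.items())

print(force_style("font_size=32,font_color=yellow"))
# A full filter would then look like:
#   subtitles=input.srt:force_style='FontSize=32,...'
```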
### Complete Translation Pipeline
```bash
# Full pipeline: transcribe, translate, create subtitles, and burn
python3 videogen --video original.mp4 \
--create-subtitles \
--translate-subtitles \
--target-lang fr \
--burn-subtitles \
--output french_subtitled
# Create both dubbed and subtitled versions
python3 videogen --video original.mp4 --dub-video --target-lang es --output spanish_dubbed
python3 videogen --video original.mp4 --create-subtitles --translate-subtitles --target-lang es --burn-subtitles --output spanish_subtitled
```
### Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | es | Spanish |
| fr | French | de | German |
| it | Italian | pt | Portuguese |
| ru | Russian | zh | Chinese |
| ja | Japanese | ko | Korean |
| ar | Arabic | hi | Hindi |
| nl | Dutch | pl | Polish |
| tr | Turkish | vi | Vietnamese |
| th | Thai | id | Indonesian |
| sv | Swedish | uk | Ukrainian |
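Translation between these language pairs uses MarianMT checkpoints, which are conventionally named per pair on the Hugging Face Hub (Helsinki-NLP `opus-mt` models). A sketch of the naming convention — whether the toolkit resolves model names exactly this way, and whether a direct model exists for every pair above, are assumptions:

```python
def marian_model_name(source_lang, target_lang):
    """Return the conventional Helsinki-NLP MarianMT checkpoint name."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

print(marian_model_name("en", "es"))  # Helsinki-NLP/opus-mt-en-es

# Typical usage with transformers (requires a model download, so not run here):
# from transformers import MarianMTModel, MarianTokenizer
# name = marian_model_name("en", "es")
# tok = MarianTokenizer.from_pretrained(name)
# model = MarianMTModel.from_pretrained(name)
# batch = tok(["Hello world"], return_tensors="pt")
# print(tok.batch_decode(model.generate(**batch), skip_special_tokens=True))
```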
---
## Character Consistency
Character consistency features allow you to maintain the same character appearance across multiple generations using IP-Adapter, InstantID, Character Profiles, and LoRA training.
......
@@ -19,6 +19,8 @@ VideoGen is a universal video generation toolkit that supports:
- **Video Upscaling**: AI-powered video upscaling
- **Audio Generation**: TTS and music generation
- **Lip Sync**: Synchronize lip movements with audio
- **Video Dubbing**: Translate and dub videos with voice preservation
- **Subtitle Generation**: Create and translate subtitles automatically
---
@@ -276,6 +278,69 @@ python3 videogen --concat-videos video1.mp4 video2.mp4 video3.mp4 --output joine
---
## Video Dubbing & Translation
VideoGen supports video dubbing with voice preservation and automatic subtitle generation.
### Transcribe Video Audio
```bash
# Transcribe audio from video
python3 videogen --video input.mp4 --transcribe
# Use larger Whisper model for better accuracy
python3 videogen --video input.mp4 --transcribe --whisper-model large
# Specify source language
python3 videogen --video input.mp4 --transcribe --source-lang en
```
### Create Subtitles
```bash
# Create SRT subtitles from video
python3 videogen --video input.mp4 --create-subtitles
# Create translated subtitles
python3 videogen --video input.mp4 --create-subtitles --translate-subtitles --target-lang es
# Burn subtitles into video
python3 videogen --video input.mp4 --create-subtitles --burn-subtitles --output subtitled
# Burn translated subtitles
python3 videogen --video input.mp4 --create-subtitles --translate-subtitles --target-lang fr --burn-subtitles --output french_subtitled
```
### Dub Video with Translation
```bash
# Translate and dub video (preserves voice)
python3 videogen --video input.mp4 --dub-video --target-lang es --output spanish_dubbed
# Dub without voice cloning (use standard TTS)
python3 videogen --video input.mp4 --dub-video --target-lang fr --no-voice-clone --output french_dubbed
# Specify TTS voice for dubbing
python3 videogen --video input.mp4 --dub-video --target-lang de --tts_voice edge_male_de --output german_dubbed
```
### Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | es | Spanish |
| fr | French | de | German |
| it | Italian | pt | Portuguese |
| ru | Russian | zh | Chinese |
| ja | Japanese | ko | Korean |
| ar | Arabic | hi | Hindi |
| nl | Dutch | pl | Polish |
| tr | Turkish | vi | Vietnamese |
| th | Thai | id | Indonesian |
| sv | Swedish | uk | Ukrainian |
---
## Output Files
VideoGen creates these output files:
@@ -290,6 +355,8 @@ VideoGen creates these output files:
| `<output>_synced.mp4` | Audio-synced video |
| `<output>_lipsync.mp4` | Lip-synced video |
| `<output>_upscaled.mp4` | Upscaled video |
| `<output>_<lang>.srt` | Subtitle file |
| `<output>_dubbed.mp4` | Dubbed video |
---
......
@@ -26,6 +26,11 @@ edge-tts>=6.1.0
# bark # Install with: pip install git+https://github.com/suno-ai/bark.git
# audiocraft # Install with: pip install audiocraft
# Speech-to-Text & Translation (Optional - for dubbing and subtitles)
openai-whisper>=20231117 # For transcription
# Or: pip install openai-whisper
# Translation uses transformers (already listed above) with MarianMT models
# Lip Sync Dependencies (Optional)
opencv-python>=4.8.0
face-recognition>=1.14.0
......
@@ -661,6 +661,134 @@ async def list_tools() -> list:
"properties": {}
}
),
Tool(
name="videogen_transcribe_video",
description="Transcribe audio from a video using Whisper AI. Extracts spoken text with timestamps.",
inputSchema={
"type": "object",
"properties": {
"video": {
"type": "string",
"description": "Path to the input video file"
},
"model_size": {
"type": "string",
"enum": ["tiny", "base", "small", "medium", "large"],
"description": "Whisper model size (larger = more accurate, slower)",
"default": "base"
},
"language": {
"type": "string",
"description": "Source language code (e.g., en, es, fr). Auto-detected if not specified."
}
},
"required": ["video"]
}
),
Tool(
name="videogen_create_subtitles",
description="Create SRT subtitles from video audio. Optionally translate to another language.",
inputSchema={
"type": "object",
"properties": {
"video": {
"type": "string",
"description": "Path to the input video file"
},
"target_lang": {
"type": "string",
"description": "Target language code for translation (e.g., en, es, fr, de, zh, ja)"
},
"source_lang": {
"type": "string",
"description": "Source language code. Auto-detected if not specified."
},
"model_size": {
"type": "string",
"enum": ["tiny", "base", "small", "medium", "large"],
"default": "base"
},
"burn": {
"type": "boolean",
"description": "Burn subtitles into video",
"default": False
},
"output": {
"type": "string",
"description": "Output video path (if burning) or SRT file path",
"default": "output"
}
},
"required": ["video"]
}
),
Tool(
name="videogen_dub_video",
description="Translate and dub a video with voice preservation. Replaces original audio with translated speech while maintaining the original voice characteristics.",
inputSchema={
"type": "object",
"properties": {
"video": {
"type": "string",
"description": "Path to the input video file"
},
"target_lang": {
"type": "string",
"description": "Target language code (e.g., en, es, fr, de, zh, ja)"
},
"source_lang": {
"type": "string",
"description": "Source language code. Auto-detected if not specified."
},
"voice_clone": {
"type": "boolean",
"description": "Use voice cloning to preserve original voice",
"default": True
},
"tts_voice": {
"type": "string",
"description": "TTS voice to use if not voice cloning (e.g., edge_female_us)"
},
"model_size": {
"type": "string",
"enum": ["tiny", "base", "small", "medium", "large"],
"default": "base"
},
"output": {
"type": "string",
"description": "Output video path",
"default": "output"
}
},
"required": ["video", "target_lang"]
}
),
Tool(
name="videogen_translate_text",
description="Translate text between languages using MarianMT models.",
inputSchema={
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "Text to translate"
},
"source_lang": {
"type": "string",
"description": "Source language code (e.g., en, es, fr)"
},
"target_lang": {
"type": "string",
"description": "Target language code (e.g., en, es, fr)"
}
},
"required": ["text", "source_lang", "target_lang"]
}
),
]
@@ -939,6 +1067,62 @@ async def call_tool(name: str, arguments: dict) -> list:
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_transcribe_video":
args = [
"--video", arguments["video"],
"--transcribe",
"--whisper-model", arguments.get("model_size", "base"),
]
if arguments.get("language"):
args.extend(["--source-lang", arguments["language"]])
output, code = run_videogen_command(args, timeout=1800)
return [TextContent(type="text", text=output)]
elif name == "videogen_create_subtitles":
args = [
"--video", arguments["video"],
"--create-subtitles",
"--whisper-model", arguments.get("model_size", "base"),
"--output", arguments.get("output", "output"),
]
if arguments.get("source_lang"):
args.extend(["--source-lang", arguments["source_lang"]])
if arguments.get("target_lang"):
args.extend(["--target-lang", arguments["target_lang"], "--translate-subtitles"])
if arguments.get("burn"):
args.append("--burn-subtitles")
output, code = run_videogen_command(args, timeout=1800)
return [TextContent(type="text", text=output)]
elif name == "videogen_dub_video":
args = [
"--video", arguments["video"],
"--dub-video",
"--target-lang", arguments["target_lang"],
"--whisper-model", arguments.get("model_size", "base"),
"--output", arguments.get("output", "output"),
]
if arguments.get("source_lang"):
args.extend(["--source-lang", arguments["source_lang"]])
if arguments.get("voice_clone", True):
args.append("--voice-clone")
else:
args.append("--no-voice-clone")
if arguments.get("tts_voice"):
args.extend(["--tts_voice", arguments["tts_voice"]])
output, code = run_videogen_command(args, timeout=3600)
return [TextContent(type="text", text=output)]
elif name == "videogen_translate_text":
# This is a direct translation without video - we'll use the main script's translation
args = [
"--translate-text", arguments["text"],
"--source-lang", arguments["source_lang"],
"--target-lang", arguments["target_lang"],
]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"Unknown tool: {name}")]
......