Add video dubbing, translation, and subtitle features

Features Added:
- Video dubbing with voice preservation (--dub-video)
- Automatic subtitle generation (--create-subtitles)
- Subtitle translation (--translate-subtitles)
- Burn subtitles into video (--burn-subtitles)
- Audio transcription using Whisper (--transcribe)
- Text translation using MarianMT models

New Command-Line Arguments:
- --transcribe: Transcribe audio from video
- --whisper-model: Select Whisper model size (tiny/base/small/medium/large)
- --source-lang: Source language code
- --target-lang: Target language code for translation
- --create-subtitles: Create SRT subtitles from video
- --translate-subtitles: Translate subtitles to target language
- --burn-subtitles: Burn subtitles into video
- --subtitle-style: Customize subtitle appearance
- --dub-video: Translate and dub video with voice preservation
- --voice-clone/--no-voice-clone: Enable/disable voice cloning

MCP Server Updates:
- Added videogen_transcribe_video tool
- Added videogen_create_subtitles tool
- Added videogen_dub_video tool
- Added videogen_translate_text tool

Documentation Updates:
- Updated SKILL.md with dubbing/translation section
- Updated EXAMPLES.md with comprehensive examples
- Updated requirements.txt with openai-whisper dependency

Supported Languages:
English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Vietnamese, Thai, Indonesian, Swedish, Ukrainian
parent 1c01f5b7
@@ -17,13 +17,15 @@ This document contains comprehensive examples for using the VideoGen toolkit, co
9. [2D-to-3D Conversion](#2d-to-3d-conversion)
10. [Audio Generation](#audio-generation)
11. [Lip Sync](#lip-sync)
12. [Character Consistency](#character-consistency)
13. [Distributed Multi-GPU](#distributed-multi-gpu)
14. [Model Management](#model-management)
15. [VRAM Management](#vram-management)
16. [Upscaling](#upscaling)
17. [NSFW Content](#nsfw-content)
18. [Advanced Combinations](#advanced-combinations)
12. [Video Dubbing & Translation](#video-dubbing--translation)
13. [Subtitle Generation](#subtitle-generation)
14. [Character Consistency](#character-consistency)
15. [Distributed Multi-GPU](#distributed-multi-gpu)
16. [Model Management](#model-management)
17. [VRAM Management](#vram-management)
18. [Upscaling](#upscaling)
19. [NSFW Content](#nsfw-content)
20. [Advanced Combinations](#advanced-combinations)
---
@@ -841,6 +843,123 @@ python3 videogen --image_to_video --model svd_xt_1.1 \
---
## Video Dubbing & Translation
Video dubbing translates a video's speech into another language while preserving the original speaker's voice characteristics via voice cloning.
### Transcribe Video Audio
```bash
# Basic transcription
python3 videogen --video interview.mp4 --transcribe
# Use larger Whisper model for better accuracy
python3 videogen --video interview.mp4 --transcribe --whisper-model large
# Specify source language
python3 videogen --video spanish_video.mp4 --transcribe --source-lang es
```
### Dub Video with Translation
```bash
# Translate and dub video (preserves voice with voice cloning)
python3 videogen --video english_video.mp4 --dub-video --target-lang es --output spanish_version
# Dub to French
python3 videogen --video english_video.mp4 --dub-video --target-lang fr --output french_version
# Dub to German without voice cloning (use standard TTS)
python3 videogen --video english_video.mp4 --dub-video --target-lang de --no-voice-clone --output german_version
# Dub with specific TTS voice
python3 videogen --video english_video.mp4 --dub-video --target-lang ja --tts_voice edge_female_ja --output japanese_version
# Dub with larger Whisper model for better transcription
python3 videogen --video interview.mp4 --dub-video --target-lang zh --whisper-model medium --output chinese_version
```
### Multi-Language Dubbing Pipeline
```bash
# Create multiple language versions
for lang in es fr de ja; do
python3 videogen --video original.mp4 --dub-video --target-lang $lang --output "version_$lang"
done
```
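The same multi-language loop can be driven from Python when you need error handling per language. A minimal sketch, assuming only the CLI flags shown in this section (`--dub-video`, `--target-lang`, `--output`); the helper names are illustrative, not part of the toolkit:

```python
import subprocess

def build_dub_command(video, target_lang, output):
    """Build the videogen dubbing command line for one target language."""
    return [
        "python3", "videogen",
        "--video", video,
        "--dub-video",
        "--target-lang", target_lang,
        "--output", output,
    ]

def dub_all(video, langs):
    """Dub a video into each language in turn (sequential, single GPU)."""
    for lang in langs:
        cmd = build_dub_command(video, lang, f"version_{lang}")
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a dub fails

# Build (but don't run) the Spanish command:
print(build_dub_command("original.mp4", "es", "version_es"))
```

Running `dub_all("original.mp4", ["es", "fr", "de", "ja"])` reproduces the shell loop above, with `check=True` stopping the batch on the first failure instead of silently continuing.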
---
## Subtitle Generation
Generate and translate subtitles automatically from video audio.
### Create Subtitles
```bash
# Create SRT subtitles from video
python3 videogen --video interview.mp4 --create-subtitles
# Output: interview_en.srt (or detected language code)
```
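Under the hood, Whisper produces timestamped segments (dicts with `start`, `end`, and `text` keys) that are then rendered as SRT cue blocks. A minimal sketch of that conversion, assuming Whisper-shaped segments; these helpers are illustrative, not the toolkit's actual API:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render Whisper-style segments as numbered SRT cue blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello there."}]))
```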
### Create Translated Subtitles
```bash
# Create Spanish subtitles from English video
python3 videogen --video english_video.mp4 --create-subtitles --translate-subtitles --target-lang es
# Output: english_video_en.srt (original) and english_video_es.srt (translated)
```
### Burn Subtitles into Video
```bash
# Burn original subtitles into video
python3 videogen --video interview.mp4 --create-subtitles --burn-subtitles --output subtitled
# Burn translated subtitles
python3 videogen --video english_video.mp4 --create-subtitles --translate-subtitles --target-lang es --burn-subtitles --output spanish_subtitled
# Custom subtitle style
python3 videogen --video interview.mp4 --create-subtitles --burn-subtitles \
--subtitle-style "font_size=32,font_color=yellow,outline_color=black" \
--output styled_subtitles
```
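Burned-in styling like the `--subtitle-style` string above is typically passed to ffmpeg's `subtitles` filter as a `force_style` argument using ASS style field names. A hedged sketch of that mapping — the key names accepted and the exact ASS translation are assumptions, not the toolkit's documented behavior:

```python
# Assumed mapping from the CLI's key=value names to ASS style fields.
STYLE_KEYS = {
    "font_size": "FontSize",
    "font_color": "PrimaryColour",
    "outline_color": "OutlineColour",
}

def parse_style(style):
    """Parse 'font_size=32,font_color=yellow' into a dict."""
    pairs = (item.split("=", 1) for item in style.split(",") if item)
    return {k.strip(): v.strip() for k, v in pairs}

def force_style(style):
    """Build an ffmpeg force_style value from the CLI style string.

    Note: real ASS colours are &HBBGGRR& hex values, so in practice
    named colours like 'yellow' would need converting first.
    """
    parsed = parse_style(style)
    return ",".join(f"{STYLE_KEYS.get(k, k)}={v}" for k, v in parsed.items())

print(force_style("font_size=32,font_color=yellow"))
# A full filter would then look like:
#   subtitles=input.srt:force_style='FontSize=32,...'
```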
### Complete Translation Pipeline
```bash
# Full pipeline: transcribe, translate, create subtitles, and burn
python3 videogen --video original.mp4 \
--create-subtitles \
--translate-subtitles \
--target-lang fr \
--burn-subtitles \
--output french_subtitled
# Create both dubbed and subtitled versions
python3 videogen --video original.mp4 --dub-video --target-lang es --output spanish_dubbed
python3 videogen --video original.mp4 --create-subtitles --translate-subtitles --target-lang es --burn-subtitles --output spanish_subtitled
```
### Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | es | Spanish |
| fr | French | de | German |
| it | Italian | pt | Portuguese |
| ru | Russian | zh | Chinese |
| ja | Japanese | ko | Korean |
| ar | Arabic | hi | Hindi |
| nl | Dutch | pl | Polish |
| tr | Turkish | vi | Vietnamese |
| th | Thai | id | Indonesian |
| sv | Swedish | uk | Ukrainian |
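Translation between these language pairs uses MarianMT checkpoints, which are conventionally named per pair on the Hugging Face Hub (Helsinki-NLP `opus-mt` models). A sketch of the naming convention — whether the toolkit resolves model names exactly this way, and whether a direct model exists for every pair above, are assumptions:

```python
def marian_model_name(source_lang, target_lang):
    """Return the conventional Helsinki-NLP MarianMT checkpoint name."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

print(marian_model_name("en", "es"))  # Helsinki-NLP/opus-mt-en-es

# Typical usage with transformers (requires a model download, so not run here):
# from transformers import MarianMTModel, MarianTokenizer
# name = marian_model_name("en", "es")
# tok = MarianTokenizer.from_pretrained(name)
# model = MarianMTModel.from_pretrained(name)
# batch = tok(["Hello world"], return_tensors="pt")
# print(tok.batch_decode(model.generate(**batch), skip_special_tokens=True))
```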
---
## Character Consistency
Character consistency features allow you to maintain the same character appearance across multiple generations using IP-Adapter, InstantID, Character Profiles, and LoRA training.
......
@@ -19,6 +19,8 @@ VideoGen is a universal video generation toolkit that supports:
- **Video Upscaling**: AI-powered video upscaling
- **Audio Generation**: TTS and music generation
- **Lip Sync**: Synchronize lip movements with audio
- **Video Dubbing**: Translate and dub videos with voice preservation
- **Subtitle Generation**: Create and translate subtitles automatically
---
@@ -276,6 +278,69 @@ python3 videogen --concat-videos video1.mp4 video2.mp4 video3.mp4 --output joine
---
## Video Dubbing & Translation
VideoGen supports video dubbing with voice preservation and automatic subtitle generation.
### Transcribe Video Audio
```bash
# Transcribe audio from video
python3 videogen --video input.mp4 --transcribe
# Use larger Whisper model for better accuracy
python3 videogen --video input.mp4 --transcribe --whisper-model large
# Specify source language
python3 videogen --video input.mp4 --transcribe --source-lang en
```
### Create Subtitles
```bash
# Create SRT subtitles from video
python3 videogen --video input.mp4 --create-subtitles
# Create translated subtitles
python3 videogen --video input.mp4 --create-subtitles --translate-subtitles --target-lang es
# Burn subtitles into video
python3 videogen --video input.mp4 --create-subtitles --burn-subtitles --output subtitled
# Burn translated subtitles
python3 videogen --video input.mp4 --create-subtitles --translate-subtitles --target-lang fr --burn-subtitles --output french_subtitled
```
### Dub Video with Translation
```bash
# Translate and dub video (preserves voice)
python3 videogen --video input.mp4 --dub-video --target-lang es --output spanish_dubbed
# Dub without voice cloning (use standard TTS)
python3 videogen --video input.mp4 --dub-video --target-lang fr --no-voice-clone --output french_dubbed
# Specify TTS voice for dubbing
python3 videogen --video input.mp4 --dub-video --target-lang de --tts_voice edge_male_de --output german_dubbed
```
### Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | es | Spanish |
| fr | French | de | German |
| it | Italian | pt | Portuguese |
| ru | Russian | zh | Chinese |
| ja | Japanese | ko | Korean |
| ar | Arabic | hi | Hindi |
| nl | Dutch | pl | Polish |
| tr | Turkish | vi | Vietnamese |
| th | Thai | id | Indonesian |
| sv | Swedish | uk | Ukrainian |
---
## Output Files
VideoGen creates these output files:
@@ -290,6 +355,8 @@ VideoGen creates these output files:
| `<output>_synced.mp4` | Audio-synced video |
| `<output>_lipsync.mp4` | Lip-synced video |
| `<output>_upscaled.mp4` | Upscaled video |
| `<output>_<lang>.srt` | Subtitle file |
| `<output>_dubbed.mp4` | Dubbed video |
---
......
@@ -26,6 +26,11 @@ edge-tts>=6.1.0
# bark # Install with: pip install git+https://github.com/suno-ai/bark.git
# audiocraft # Install with: pip install audiocraft
# Speech-to-Text & Translation (Optional - for dubbing and subtitles)
openai-whisper>=20231117 # For transcription
# Or: pip install openai-whisper
# Translation uses transformers (already listed above) with MarianMT models
# Lip Sync Dependencies (Optional)
opencv-python>=4.8.0
face-recognition>=1.14.0
......
@@ -661,6 +661,134 @@ async def list_tools() -> list:
"properties": {}
}
),
Tool(
name="videogen_transcribe_video",
description="Transcribe audio from a video using Whisper AI. Extracts spoken text with timestamps.",
inputSchema={
"type": "object",
"properties": {
"video": {
"type": "string",
"description": "Path to the input video file"
},
"model_size": {
"type": "string",
"enum": ["tiny", "base", "small", "medium", "large"],
"description": "Whisper model size (larger = more accurate, slower)",
"default": "base"
},
"language": {
"type": "string",
"description": "Source language code (e.g., en, es, fr). Auto-detected if not specified."
}
},
"required": ["video"]
}
),
Tool(
name="videogen_create_subtitles",
description="Create SRT subtitles from video audio. Optionally translate to another language.",
inputSchema={
"type": "object",
"properties": {
"video": {
"type": "string",
"description": "Path to the input video file"
},
"target_lang": {
"type": "string",
"description": "Target language code for translation (e.g., en, es, fr, de, zh, ja)"
},
"source_lang": {
"type": "string",
"description": "Source language code. Auto-detected if not specified."
},
"model_size": {
"type": "string",
"enum": ["tiny", "base", "small", "medium", "large"],
"default": "base"
},
"burn": {
"type": "boolean",
"description": "Burn subtitles into video",
"default": False
},
"output": {
"type": "string",
"description": "Output video path (if burning) or SRT file path",
"default": "output"
}
},
"required": ["video"]
}
),
Tool(
name="videogen_dub_video",
description="Translate and dub a video with voice preservation. Replaces original audio with translated speech while maintaining the original voice characteristics.",
inputSchema={
"type": "object",
"properties": {
"video": {
"type": "string",
"description": "Path to the input video file"
},
"target_lang": {
"type": "string",
"description": "Target language code (e.g., en, es, fr, de, zh, ja)"
},
"source_lang": {
"type": "string",
"description": "Source language code. Auto-detected if not specified."
},
"voice_clone": {
"type": "boolean",
"description": "Use voice cloning to preserve original voice",
"default": True
},
"tts_voice": {
"type": "string",
"description": "TTS voice to use if not voice cloning (e.g., edge_female_us)"
},
"model_size": {
"type": "string",
"enum": ["tiny", "base", "small", "medium", "large"],
"default": "base"
},
"output": {
"type": "string",
"description": "Output video path",
"default": "output"
}
},
"required": ["video", "target_lang"]
}
),
Tool(
name="videogen_translate_text",
description="Translate text between languages using MarianMT models.",
inputSchema={
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "Text to translate"
},
"source_lang": {
"type": "string",
"description": "Source language code (e.g., en, es, fr)"
},
"target_lang": {
"type": "string",
"description": "Target language code (e.g., en, es, fr)"
}
},
"required": ["text", "source_lang", "target_lang"]
}
),
]
@@ -939,6 +1067,62 @@ async def call_tool(name: str, arguments: dict) -> list:
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_transcribe_video":
args = [
"--video", arguments["video"],
"--transcribe",
"--whisper-model", arguments.get("model_size", "base"),
]
if arguments.get("language"):
args.extend(["--source-lang", arguments["language"]])
output, code = run_videogen_command(args, timeout=1800)
return [TextContent(type="text", text=output)]
elif name == "videogen_create_subtitles":
args = [
"--video", arguments["video"],
"--create-subtitles",
"--whisper-model", arguments.get("model_size", "base"),
"--output", arguments.get("output", "output"),
]
if arguments.get("source_lang"):
args.extend(["--source-lang", arguments["source_lang"]])
if arguments.get("target_lang"):
args.extend(["--target-lang", arguments["target_lang"], "--translate-subtitles"])
if arguments.get("burn"):
args.append("--burn-subtitles")
output, code = run_videogen_command(args, timeout=1800)
return [TextContent(type="text", text=output)]
elif name == "videogen_dub_video":
args = [
"--video", arguments["video"],
"--dub-video",
"--target-lang", arguments["target_lang"],
"--whisper-model", arguments.get("model_size", "base"),
"--output", arguments.get("output", "output"),
]
if arguments.get("source_lang"):
args.extend(["--source-lang", arguments["source_lang"]])
if arguments.get("voice_clone", True):
args.append("--voice-clone")
else:
args.append("--no-voice-clone")
if arguments.get("tts_voice"):
args.extend(["--tts_voice", arguments["tts_voice"]])
output, code = run_videogen_command(args, timeout=3600)
return [TextContent(type="text", text=output)]
elif name == "videogen_translate_text":
# This is a direct translation without video - we'll use the main script's translation
args = [
"--translate-text", arguments["text"],
"--source-lang", arguments["source_lang"],
"--target-lang", arguments["target_lang"],
]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"Unknown tool: {name}")]
......