Add character consistency features, fix model loading for non-diffusers models

- Add character profile management (create, list, show, delete)
- Add IP-Adapter and InstantID support for character consistency
- Fix model loading for models with config.json only (no model_index.json)
- Add component-only model detection (fine-tuned weights)
- Update MCP server with character consistency tools
- Update SKILL.md and README.md documentation
- Add memory management for dubbing/translation
- Add chunked processing for Whisper transcription
- Add character consistency options to web interface
parent 627eb38f
@@ -47,6 +47,12 @@ A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Vid
- **Large Models** (30-50GB VRAM): Allegro, HunyuanVideo
- **Huge Models** (50GB+ VRAM): Open-Sora, Step-Video, Lumina
### Character Consistency
- **Character Profiles**: Save and reuse character references across generations
- **IP-Adapter**: Image prompt adapter for consistent character generation
- **InstantID**: Face identity preservation for consistent faces
- **Reference Images**: Use multiple reference images for character consistency
### Smart Features
- **Auto Mode**: Automatic model selection and configuration
- **NSFW Detection**: Automatic content classification
@@ -54,6 +60,7 @@ A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Vid
- **Time Estimation**: Hardware-aware generation time prediction
- **Multi-GPU**: Distributed generation across multiple GPUs
- **Auto-Disable**: Models that fail 3 times are auto-disabled
- **Memory Management**: Automatic chunking for long videos and low VRAM
### User Interfaces
- **Command Line**: Full-featured CLI with all options
@@ -147,6 +154,40 @@ python3 videogen --image_to_video --model svd_xt_1.1 \
  --lip_sync --output speaker
```
### Character Consistency
VideoGen supports character consistency across multiple generations using IP-Adapter and InstantID.
```bash
# Create a character profile from reference images
python3 videogen --create-character my_character \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "A young woman with red hair"
# List saved character profiles
python3 videogen --list-characters
# Generate with character consistency
python3 videogen --model flux_dev \
--character my_character \
--prompt "my_character walking in a park" \
--output character_park
# Use IP-Adapter directly with reference images
python3 videogen --model sdxl_base \
--ipadapter --ipadapter-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "a person reading a book" \
--output reading
# Use InstantID for face consistency
python3 videogen --model sdxl_base \
--ipadapter --instantid \
--reference-images face_ref.jpg \
--prompt "portrait of a person smiling" \
--output portrait
```
---
## AI Agent Integration
...
@@ -341,6 +341,74 @@ python3 videogen --video input.mp4 --dub-video --target-lang de --tts_voice edge
---
## Character Consistency
VideoGen supports character consistency across multiple generations using IP-Adapter, InstantID, and Character Profiles.
### Create Character Profile
```bash
# Create a character profile from reference images
python3 videogen --create-character alice \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "young woman with blue eyes and blonde hair"
# List all saved character profiles
python3 videogen --list-characters
# Show details of a character profile
python3 videogen --show-character alice
# Delete a character profile
python3 videogen --delete-character alice
```
### Generate with Character
```bash
# Generate image with character consistency
python3 videogen --model flux_dev \
--character alice \
--prompt "alice walking in a park" \
--output alice_park.png
# Generate video with character (I2V)
python3 videogen --image_to_video --model svd_xt_1.1 \
--image_model flux_dev \
--character alice \
--prompt "alice smiling at camera" \
--prompt_animation "subtle head movement" \
--output alice_animated
```
### IP-Adapter Direct Usage
```bash
# Use IP-Adapter with reference images directly
python3 videogen --model flux_dev \
--ipadapter --ipadapter-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "the person in a business suit" \
--output business.png
# Use InstantID for face identity
python3 videogen --model flux_dev \
--ipadapter --instantid \
--reference-images face_ref.jpg \
--prompt "portrait of the person smiling" \
--output portrait.png
```
### Character Consistency Tips
1. **Use multiple reference images** (3-5) for better consistency
2. **IP-Adapter scale**: 0.7-0.9 gives a good balance (higher = closer match to the references)
3. **InstantID** is the better choice when face identity matters most
4. **Character profiles** persist across sessions, so create once and reuse
5. **Combine IP-Adapter and InstantID** for the strongest results
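Under the hood, a character profile is a small JSON file, one per character, which the web server stores under `~/.config/videogen/characters/`. A minimal Python sketch of that layout (`base_dir` is parameterized here so the example stays self-contained and never touches real config; real profiles also record a `created` timestamp, omitted for brevity):

```python
import json
from pathlib import Path

def save_profile(base_dir: Path, name: str, images: list, description: str = "") -> Path:
    """Write a profile JSON with the same fields the web UI creates."""
    base_dir.mkdir(parents=True, exist_ok=True)
    profile = {
        "name": name,
        "description": description,
        "reference_images": images[:5],  # profiles are capped at 5 references
        "tags": [],
    }
    path = base_dir / f"{name}.json"
    path.write_text(json.dumps(profile, indent=2))
    return path

def load_profile(base_dir: Path, name: str) -> dict:
    """Read a saved profile back as a dict."""
    return json.loads((base_dir / f"{name}.json").read_text())
```

Because profiles are plain JSON, they can be inspected, edited, or copied between machines with ordinary file tools.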
---
## Output Files
VideoGen creates these output files:
...
@@ -933,4 +933,139 @@ body {
.fa-spin {
animation: spin 1s linear infinite;
}
\ No newline at end of file
/* Character Consistency Styles */
/* File Upload Box */
.file-upload {
position: relative;
border: 2px dashed var(--border-color);
border-radius: var(--border-radius);
padding: 2rem;
text-align: center;
transition: all 0.3s;
cursor: pointer;
}
.file-upload:hover {
border-color: var(--primary);
background: rgba(99, 102, 241, 0.05);
}
.file-upload input[type="file"] {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
opacity: 0;
cursor: pointer;
}
.file-upload .file-label {
display: flex;
flex-direction: column;
align-items: center;
gap: 0.5rem;
pointer-events: none;
}
.file-upload .file-label i {
font-size: 2rem;
color: var(--primary);
}
.file-upload .file-label span {
font-weight: 500;
}
.file-upload .file-label small {
color: var(--text-muted);
font-size: 0.8rem;
}
/* Image Preview Grid */
.image-preview-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(100px, 1fr));
gap: 0.75rem;
margin-top: 1rem;
}
.preview-item {
position: relative;
aspect-ratio: 1;
border-radius: var(--border-radius);
overflow: hidden;
background: var(--bg-darker);
}
.preview-item img {
width: 100%;
height: 100%;
object-fit: cover;
}
.preview-item .remove-btn {
position: absolute;
top: 4px;
right: 4px;
width: 24px;
height: 24px;
border-radius: 50%;
background: rgba(0, 0, 0, 0.7);
color: white;
border: none;
cursor: pointer;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.7rem;
opacity: 0;
transition: opacity 0.2s;
}
.preview-item:hover .remove-btn {
opacity: 1;
}
.preview-item .remove-btn:hover {
background: var(--danger);
}
/* Character Section */
#character-section {
border-left: 3px solid var(--primary);
}
#character-section h3 {
color: var(--primary-light);
}
/* IP-Adapter and InstantID Options */
#ipadapter-options,
#instantid-options,
#character-profile-options {
margin-top: 1rem;
padding-top: 1rem;
border-top: 1px solid var(--border-color);
}
/* Range Slider with Value Display */
.form-group input[type="range"] {
width: calc(100% - 50px);
vertical-align: middle;
}
.form-group input[type="range"] + span {
display: inline-block;
width: 40px;
text-align: right;
font-weight: 600;
color: var(--primary);
}
/* Hidden class */
.hidden {
display: none !important;
}
@@ -327,6 +327,85 @@
</div>
</div>
<!-- Character Consistency -->
<div class="form-section" id="character-section">
<h3><i class="fas fa-user-circle"></i> Character Consistency</h3>
<div class="form-row">
<label class="checkbox-label">
<input type="checkbox" id="use_character" name="use_character" onchange="toggleCharacterOptions()">
<span>Use Character Profile</span>
</label>
<label class="checkbox-label">
<input type="checkbox" id="use_ipadapter" name="use_ipadapter" onchange="toggleIPAdapterOptions()">
<span>IP-Adapter</span>
</label>
<label class="checkbox-label">
<input type="checkbox" id="use_instantid" name="use_instantid">
<span>InstantID (Face)</span>
</label>
</div>
<!-- Character Profile Selection -->
<div id="character-profile-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="character_profile">Character Profile</label>
<select id="character_profile" name="character_profile">
<option value="">Select a character...</option>
</select>
</div>
<div class="form-group">
<button type="button" class="btn btn-secondary" onclick="showCreateCharacterModal()">
<i class="fas fa-plus"></i> New Character
</button>
</div>
</div>
</div>
<!-- IP-Adapter Options -->
<div id="ipadapter-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="ipadapter_scale">IP-Adapter Scale</label>
<input type="range" id="ipadapter_scale" name="ipadapter_scale" value="0.8" min="0.0" max="1.0" step="0.1">
<span id="ipadapter-scale-value">0.8</span>
</div>
<div class="form-group">
<label for="ipadapter_type">IP-Adapter Type</label>
<select id="ipadapter_type" name="ipadapter_type">
<option value="plus_sd15">Plus (SD 1.5)</option>
<option value="plus_sdxl">Plus (SDXL)</option>
<option value="faceid_sd15">FaceID (SD 1.5)</option>
<option value="faceid_sdxl">FaceID (SDXL)</option>
</select>
</div>
</div>
<div class="form-group">
<label>Reference Images for IP-Adapter</label>
<div class="file-upload" id="reference-upload-box">
<input type="file" id="reference_images" name="reference_images" accept="image/*" multiple onchange="handleReferenceUpload(this)">
<label for="reference_images" class="file-label">
<i class="fas fa-images"></i>
<span>Upload Reference Images</span>
<small>Select 1-5 images</small>
</label>
</div>
<div id="reference-preview" class="image-preview-grid"></div>
</div>
</div>
<!-- InstantID Options -->
<div id="instantid-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="instantid_scale">InstantID Scale</label>
<input type="range" id="instantid_scale" name="instantid_scale" value="0.8" min="0.0" max="1.0" step="0.1">
<span id="instantid-scale-value">0.8</span>
</div>
</div>
</div>
</div>
<!-- Advanced Options -->
<div class="form-section collapsible">
<h3 onclick="toggleSection(this)">
...
@@ -789,6 +789,160 @@ async def list_tools() -> list:
"required": ["text", "source_lang", "target_lang"]
}
),
# Character Consistency Tools
Tool(
name="videogen_create_character",
description="Create a character profile from reference images for consistent character generation across multiple images/videos.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character name (alphanumeric, underscores, hyphens only)"
},
"reference_images": {
"type": "array",
"items": {"type": "string"},
"description": "List of paths to reference images (1-5 images)"
},
"description": {
"type": "string",
"description": "Optional description of the character"
}
},
"required": ["name", "reference_images"]
}
),
Tool(
name="videogen_list_characters",
description="List all saved character profiles.",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="videogen_show_character",
description="Show details of a specific character profile.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character profile name"
}
},
"required": ["name"]
}
),
Tool(
name="videogen_delete_character",
description="Delete a character profile.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character profile name to delete"
}
},
"required": ["name"]
}
),
Tool(
name="videogen_generate_with_character",
description="Generate an image or video with a specific character using IP-Adapter and/or InstantID for consistency.",
inputSchema={
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "Description of what to generate with the character"
},
"character": {
"type": "string",
"description": "Character profile name to use"
},
"model": {
"type": "string",
"description": "Model name (e.g., flux_dev, sdxl_base)"
},
"output": {
"type": "string",
"default": "output"
},
"use_ipadapter": {
"type": "boolean",
"description": "Use IP-Adapter for character consistency",
"default": True
},
"use_instantid": {
"type": "boolean",
"description": "Use InstantID for face identity preservation",
"default": False
},
"ipadapter_scale": {
"type": "number",
"description": "IP-Adapter influence scale (0.0-1.0)",
"default": 0.8
},
"instantid_scale": {
"type": "number",
"description": "InstantID influence scale (0.0-1.0)",
"default": 0.8
},
"animate": {
"type": "boolean",
"description": "Generate video instead of image (I2V)",
"default": False
}
},
"required": ["prompt", "character", "model"]
}
),
Tool(
name="videogen_generate_with_reference",
description="Generate an image using reference images directly (without creating a character profile).",
inputSchema={
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "Description of what to generate"
},
"reference_images": {
"type": "array",
"items": {"type": "string"},
"description": "List of paths to reference images"
},
"model": {
"type": "string",
"description": "Model name (e.g., flux_dev, sdxl_base)"
},
"output": {
"type": "string",
"default": "output"
},
"ipadapter_scale": {
"type": "number",
"description": "IP-Adapter influence scale (0.0-1.0)",
"default": 0.8
},
"use_instantid": {
"type": "boolean",
"description": "Use InstantID for face identity",
"default": False
}
},
"required": ["prompt", "reference_images", "model"]
}
),
]
@@ -1123,6 +1277,84 @@ async def call_tool(name: str, arguments: dict) -> list:
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
# Character Consistency Tools
elif name == "videogen_create_character":
args = [
"--create-character", arguments["name"],
]
# Add reference images
for img in arguments["reference_images"][:5]: # Max 5 images
args.extend(["--character-images", img])
if arguments.get("description"):
args.extend(["--character-desc", arguments["description"]])
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_list_characters":
args = ["--list-characters"]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_show_character":
args = ["--show-character", arguments["name"]]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_delete_character":
args = ["--delete-character", arguments["name"]]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_generate_with_character":
args = [
"--model", arguments["model"],
"--character", arguments["character"],
"--prompt", arguments["prompt"],
"--output", arguments.get("output", "output"),
]
# IP-Adapter options
if arguments.get("use_ipadapter", True):
args.append("--ipadapter")
if arguments.get("ipadapter_scale"):
args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
# InstantID options
if arguments.get("use_instantid", False):
args.append("--instantid")
if arguments.get("instantid_scale"):
args.extend(["--instantid-scale", str(arguments["instantid_scale"])])
# Animate for I2V
if arguments.get("animate", False):
args.append("--image_to_video")
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_generate_with_reference":
args = [
"--model", arguments["model"],
"--prompt", arguments["prompt"],
"--output", arguments.get("output", "output"),
]
# Add reference images
for img in arguments["reference_images"]:
args.extend(["--reference-images", img])
# IP-Adapter options
args.append("--ipadapter")
if arguments.get("ipadapter_scale"):
args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
# InstantID options
if arguments.get("use_instantid", False):
args.append("--instantid")
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"Unknown tool: {name}")]
...
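The `videogen_generate_with_character` handler above translates MCP tool arguments into CLI flags. That mapping can be isolated as a pure function for illustration (a sketch mirroring the handler shown in this diff, not the toolkit's actual API):

```python
def build_character_args(arguments: dict) -> list:
    """Mirror the handler's argument-to-flag mapping, without running anything."""
    args = [
        "--model", arguments["model"],
        "--character", arguments["character"],
        "--prompt", arguments["prompt"],
        "--output", arguments.get("output", "output"),
    ]
    # IP-Adapter is on by default, matching the tool schema's default of True.
    if arguments.get("use_ipadapter", True):
        args.append("--ipadapter")
        # Note: like the handler, a falsy scale (None or 0.0) is skipped,
        # so the CLI's own default scale applies in that case.
        if arguments.get("ipadapter_scale"):
            args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
    if arguments.get("use_instantid", False):
        args.append("--instantid")
        if arguments.get("instantid_scale"):
            args.extend(["--instantid-scale", str(arguments["instantid_scale"])])
    if arguments.get("animate", False):
        args.append("--image_to_video")
    return args
```

Keeping the mapping pure like this makes it straightforward to unit-test the flag translation separately from command execution.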
{
"models": {
"wan_1.3b_i2v": {
"id": "Wan-AI/Wan2.1-I2V-1.3B-Diffusers",
...
@@ -614,6 +614,136 @@ def delete_output(filename):
return jsonify({'success': True})
return jsonify({'error': 'File not found'}), 404
# Character Profile API endpoints
CHARACTERS_DIR = Path.home() / ".config" / "videogen" / "characters"
@app.route('/api/characters', methods=['GET'])
def api_list_characters():
"""List all character profiles"""
characters = []
if CHARACTERS_DIR.exists():
for profile_file in CHARACTERS_DIR.glob("*.json"):
try:
with open(profile_file, 'r') as f:
profile = json.load(f)
characters.append({
'name': profile.get('name', profile_file.stem),
'description': profile.get('description', ''),
'image_count': len(profile.get('reference_images', [])),
'created': profile.get('created', ''),
'tags': profile.get('tags', [])
})
except Exception as e:
print(f"Error loading character profile {profile_file}: {e}")
return jsonify(characters)
@app.route('/api/characters/<name>', methods=['GET'])
def api_get_character(name):
"""Get a specific character profile"""
profile_path = CHARACTERS_DIR / f"{name}.json"
if profile_path.exists():
try:
with open(profile_path, 'r') as f:
return jsonify(json.load(f))
except Exception as e:
return jsonify({'error': str(e)}), 500
return jsonify({'error': 'Character not found'}), 404
@app.route('/api/characters', methods=['POST'])
def api_create_character():
"""Create a new character profile"""
name = request.form.get('name')
description = request.form.get('description', '')
if not name:
return jsonify({'error': 'Name is required'}), 400
# Sanitize name
name = re.sub(r'[^a-zA-Z0-9_-]', '_', name)
# Handle uploaded images
images = request.files.getlist('images')
if not images or len(images) == 0:
return jsonify({'error': 'At least one reference image is required'}), 400
# Create character directory
CHARACTERS_DIR.mkdir(parents=True, exist_ok=True)
char_image_dir = CHARACTERS_DIR / name
char_image_dir.mkdir(parents=True, exist_ok=True)
# Save images
saved_images = []
for i, img in enumerate(images[:5]): # Max 5 images
if img and img.filename:
ext = img.filename.rsplit('.', 1)[-1].lower()
if ext in ALLOWED_EXTENSIONS['image']:
filename = f"reference_{i+1}.{ext}"
filepath = char_image_dir / filename
img.save(filepath)
saved_images.append(str(filepath))
if not saved_images:
return jsonify({'error': 'No valid images uploaded'}), 400
# Create profile
profile = {
'name': name,
'description': description,
'reference_images': saved_images,
'created': datetime.now().isoformat(),
'tags': []
}
# Save profile
profile_path = CHARACTERS_DIR / f"{name}.json"
with open(profile_path, 'w') as f:
json.dump(profile, f, indent=2)
return jsonify(profile)
@app.route('/api/characters/<name>', methods=['DELETE'])
def api_delete_character(name):
"""Delete a character profile"""
profile_path = CHARACTERS_DIR / f"{name}.json"
char_image_dir = CHARACTERS_DIR / name
if not profile_path.exists():
return jsonify({'error': 'Character not found'}), 404
try:
# Delete profile file
profile_path.unlink()
# Delete images directory
if char_image_dir.exists():
shutil.rmtree(char_image_dir)
return jsonify({'success': True})
except Exception as e:
return jsonify({'error': str(e)}), 500
@app.route('/api/upload-multiple', methods=['POST'])
def upload_multiple_files():
"""Upload multiple files (for reference images)"""
files = request.files.getlist('files')
upload_type = request.form.get('type', 'general')
if not files:
return jsonify({'error': 'No files provided'}), 400
saved_paths = []
for f in files:
if f and f.filename:
ext = f.filename.rsplit('.', 1)[-1].lower()
if ext in ALLOWED_EXTENSIONS['image']:
filename = f"{uuid.uuid4().hex[:8]}_{secure_filename(f.filename)}"
filepath = UPLOAD_FOLDER / filename
f.save(filepath)
saved_paths.append(str(filepath))
return jsonify({'paths': saved_paths})
# WebSocket events
@socketio.on('connect')
def handle_connect():
...
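One detail worth noting in the `/api/characters` POST endpoint above: the profile name is sanitized before it is ever used as a file or directory name. Pulled out as a standalone helper (a sketch of the same one-line `re.sub` the endpoint uses):

```python
import re

def sanitize_character_name(name: str) -> str:
    """Replace anything outside [a-zA-Z0-9_-] with an underscore,
    so profile names are always safe as file and directory names."""
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name)
```

This keeps user-supplied names from escaping the characters directory (no slashes or dots survive), which is why the CLI and MCP tool docs restrict names to alphanumerics, underscores, and hyphens.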