Add character consistency features: IP-Adapter, InstantID, Character Profiles, LoRA Training

- Add IP-Adapter integration for character consistency using reference images
- Add InstantID support for superior face identity preservation
- Add Character Profile System to store reference images and face embeddings
- Add LoRA Training Workflow for perfect character consistency
- Add command-line arguments for all character consistency features
- Update EXAMPLES.md with comprehensive character consistency documentation
- Update requirements.txt with optional dependencies (insightface, onnxruntime)

New command-line flags:
- --character: Use saved character profile
- --create-character: Create new character profile from reference images
- --list-characters: List all saved profiles
- --show-character: Show profile details
- --ipadapter: Enable IP-Adapter for consistency
- --instantid: Enable InstantID for face identity
- --train-lora: Train custom LoRA for character
parent 84d460f6
@@ -14,12 +14,13 @@ This document contains comprehensive examples for using the VideoGen toolkit, co
6. [Image-to-Image (I2I)](#image-to-image-i2i)
7. [Audio Generation](#audio-generation)
8. [Lip Sync](#lip-sync)
9. [Character Consistency](#character-consistency)
10. [Distributed Multi-GPU](#distributed-multi-gpu)
11. [Model Management](#model-management)
12. [VRAM Management](#vram-management)
13. [Upscaling](#upscaling)
14. [NSFW Content](#nsfw-content)
15. [Advanced Combinations](#advanced-combinations)
---
@@ -614,6 +615,227 @@ python3 videogen --image_to_video --model svd_xt_1.1 \
---
## Character Consistency
Character consistency features allow you to maintain the same character appearance across multiple generations using IP-Adapter, InstantID, Character Profiles, and LoRA training.
### Character Profiles
Character profiles store reference images and face embeddings for consistent character generation.
```bash
# Create a character profile from reference images
python3 videogen --create-character alice \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "young woman with blue eyes and blonde hair"
# List all saved character profiles
python3 videogen --list-characters
# Show details of a character profile
python3 videogen --show-character alice
# Use a character profile for generation
python3 videogen --model flux_dev \
--character alice \
--prompt "alice walking in a park" \
--output alice_park.png
# Use character profile with I2V
python3 videogen --image_to_video --model svd_xt_1.1 \
--image_model flux_dev \
--character alice \
--prompt "alice smiling at camera" \
--prompt_animation "subtle head movement" \
--output alice_animated
```
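For scripting against saved profiles, the file `--create-character` writes is plain JSON, stored under the tool's config directory (e.g. `~/.config/videogen/characters/<name>/profile.json`). This sketch shows the layout as `create_character_profile()` writes it in this change; the values are illustrative placeholders, not real output:

```python
import json

# Layout of profile.json as written by create_character_profile();
# field names come from this change, values are made up for illustration.
profile = {
    "name": "alice",
    "description": "young woman with blue eyes and blonde hair",
    "tags": [],
    "images": [
        {"path": "reference_000.jpg", "original_path": "ref1.jpg", "has_embedding": True},
        {"path": "reference_001.jpg", "original_path": "ref2.jpg", "has_embedding": False},
    ],
    "embeddings": [],  # one dict per detected face (embedding, bbox, kps, det_score)
    "created": "2025-01-01 12:00:00",
    "modified": "2025-01-01 12:00:00",
}

# Keep only the references that actually yielded a face embedding:
usable = [img["path"] for img in profile["images"] if img["has_embedding"]]
print(json.dumps(usable))  # → ["reference_000.jpg"]
```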
### IP-Adapter for Character Consistency
IP-Adapter uses reference images to maintain character identity across generations.
```bash
# Basic IP-Adapter usage
python3 videogen --model flux_dev \
--ipadapter \
--reference-images character_ref.jpg \
--prompt "portrait of the same person in different lighting" \
--output portrait_variant.png
# IP-Adapter with multiple reference images
python3 videogen --model sdxl_base \
--ipadapter \
--reference-images ref1.jpg ref2.jpg ref3.jpg \
--prompt "the person in a business suit" \
--output business.png
# IP-Adapter with custom scale (higher = more similar to reference)
python3 videogen --model flux_dev \
--ipadapter --ipadapter-scale 0.9 \
--reference-images character.jpg \
--prompt "the person in fantasy armor" \
--output fantasy_armor.png
# IP-Adapter with specific model variant
python3 videogen --model sdxl_base \
--ipadapter --ipadapter-model plus_sdxl \
--reference-images ref.jpg \
--prompt "cinematic portrait" \
--output cinematic.png
```
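Under the hood, recent diffusers releases ship a built-in IP-Adapter loader, so the same wiring can be done in a few lines of Python. A minimal sketch, assuming the published `h94/IP-Adapter` repository layout for the subfolder and weight filenames (verify these against the model card before relying on them):

```python
# Repo / subfolder / weight-file triples for the two base families used above.
# These mirror the h94/IP-Adapter repo layout; treat the filenames as assumptions.
IPADAPTER_WEIGHTS = {
    "sd15": ("h94/IP-Adapter", "models", "ip-adapter_sd15.bin"),
    "sdxl": ("h94/IP-Adapter", "sdxl_models", "ip-adapter_sdxl.bin"),
}

def attach_ip_adapter(pipe, model_type="sd15", scale=0.8):
    """Attach an IP-Adapter to a diffusers pipeline (requires diffusers>=0.22)."""
    repo, subfolder, weight = IPADAPTER_WEIGHTS[model_type]
    pipe.load_ip_adapter(repo, subfolder=subfolder, weight_name=weight)
    pipe.set_ip_adapter_scale(scale)  # higher = closer to the reference image
    return pipe
```

At generation time the reference image is then passed alongside the prompt as `ip_adapter_image=...`.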
### InstantID for Face Identity
InstantID provides superior face identity preservation compared to IP-Adapter.
```bash
# Basic InstantID usage
python3 videogen --model flux_dev \
--instantid \
--reference-images face_ref.jpg \
--prompt "portrait in different style" \
--output styled_portrait.png
# InstantID with custom scale
python3 videogen --model sdxl_base \
--instantid --instantid-scale 0.85 \
--reference-images face.jpg \
--prompt "the person as a medieval knight" \
--output knight.png
# Combine IP-Adapter and InstantID for best results
python3 videogen --model flux_dev \
--ipadapter --ipadapter-scale 0.7 \
--instantid --instantid-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "the person in a sci-fi setting" \
--output scifi.png
```
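When several reference faces are supplied, `--instantid` averages their InsightFace embeddings into a single identity vector, as `apply_instantid` does in this change. A standalone sketch of that step; the L2 normalization and cosine-similarity check are added here for illustration and are not part of the tool:

```python
import numpy as np

def identity_embedding(embeddings):
    """Average per-image face embeddings into one identity vector (L2-normalized)."""
    avg = np.mean(np.asarray(embeddings, dtype=np.float32), axis=0)
    return avg / np.linalg.norm(avg)

def cosine_similarity(a, b):
    """How close one face embedding is to another (1.0 = identical direction)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The similarity check is handy for validating that a generated frame still matches the stored identity.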
### LoRA Training for Characters
Train a custom LoRA for perfect character consistency.
```bash
# Prepare training data (collect 10-50 images of the character)
mkdir -p training_images/alice
# Copy your reference images to the directory
# Generate LoRA training setup
python3 videogen --train-lora alice \
--training-images ./training_images/alice \
--training-epochs 100 \
--lora-rank 4 \
--base-model-for-training runwayml/stable-diffusion-v1-5
# Higher rank LoRA (more detail, larger file)
python3 videogen --train-lora alice_detailed \
--training-images ./training_images/alice \
--training-epochs 200 \
--lora-rank 16
# The training command will be generated in:
# ~/.config/videogen/characters/alice/lora/train_alice.sh
```
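Note that the generated script does not count real epochs: as `generate_lora_training_command` in this change shows, it fixes `--max_train_steps` at `epochs * 100`, with `--train_batch_size=1` and `--gradient_accumulation_steps=4`. A quick sanity-check of what those numbers mean for a given dataset size (pure arithmetic mirroring the defaults in this change):

```python
def max_train_steps(num_epochs, steps_per_epoch=100):
    """Steps the generated script will run (it hardcodes 100 steps per 'epoch')."""
    return num_epochs * steps_per_epoch

def passes_over_dataset(num_images, num_epochs, batch_size=1, grad_accum=4):
    """Approximate number of real passes over the training images."""
    images_seen = max_train_steps(num_epochs) * batch_size * grad_accum
    return images_seen / num_images

# e.g. 30 images with --training-epochs 100:
print(max_train_steps(100), round(passes_over_dataset(30, 100)))  # → 10000 1333
```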
### Complete Character Consistency Workflow
```bash
# Step 1: Create character profile
python3 videogen --create-character my_character \
--character-images photo1.jpg photo2.jpg photo3.jpg \
--character-desc "detailed character description"
# Step 2: Generate base image with character
python3 videogen --model flux_dev \
--character my_character \
--ipadapter --ipadapter-scale 0.8 \
--instantid --instantid-scale 0.85 \
--prompt "my_character in casual clothes at a cafe" \
--output base_image.png
# Step 3: Create variations
python3 videogen --model flux_dev \
--character my_character \
--ipadapter --instantid \
--prompt "my_character in formal attire at a gala" \
--output formal.png
# Step 4: Animate with I2V
python3 videogen --model svd_xt_1.1 \
--image base_image.png \
--character my_character \
--prompt "subtle natural movement" \
--output animated
# Step 5: Add audio with lip sync
python3 videogen --model svd_xt_1.1 \
--image base_image.png \
--character my_character \
--prompt "speaking naturally" \
--generate_audio --audio_type tts \
--audio_text "Hello, nice to meet you" \
--lip_sync \
--output speaking
```

### Character Consistency for Video Series
```bash
# Create a character for a video series
python3 videogen --create-character series_protagonist \
--character-images protagonist_*.jpg \
--character-desc "main character for video series"
# Generate multiple scenes with the same character
SCENES=(
"walking through a forest"
"entering a mysterious cave"
"discovering a treasure chest"
"celebrating the discovery"
)
for i in "${!SCENES[@]}"; do
scene="${SCENES[$i]}"
python3 videogen --model wan_14b_t2v \
--character series_protagonist \
--ipadapter --instantid \
--prompt "series_protagonist $scene" \
--output "scene_$i"
done
```
### Character Consistency Flags
| Flag | Description | Example |
|------|-------------|---------|
| `--character` | Use saved character profile | `--character alice` |
| `--create-character` | Create new profile | `--create-character bob` |
| `--character-images` | Reference images for profile | `--character-images img1.jpg img2.jpg` |
| `--character-desc` | Character description | `--character-desc "tall man with beard"` |
| `--list-characters` | List all profiles | `--list-characters` |
| `--show-character` | Show profile details | `--show-character alice` |
| `--ipadapter` | Enable IP-Adapter | `--ipadapter` |
| `--ipadapter-scale` | IP-Adapter influence | `--ipadapter-scale 0.8` |
| `--ipadapter-model` | IP-Adapter variant | `--ipadapter-model plus_sdxl` |
| `--reference-images` | Images for IP-Adapter/InstantID | `--reference-images ref.jpg` |
| `--instantid` | Enable InstantID | `--instantid` |
| `--instantid-scale` | InstantID influence | `--instantid-scale 0.85` |
| `--train-lora` | Train character LoRA | `--train-lora alice` |
| `--training-images` | Training image directory | `--training-images ./images/` |
| `--training-epochs` | Training epochs | `--training-epochs 100` |
| `--lora-rank` | LoRA rank | `--lora-rank 4` |
### Character Consistency Tips
1. **Reference Images**: Use 3-10 high-quality reference images showing different angles and expressions
2. **IP-Adapter Scale**: 0.7-0.9 works best; higher values = more similar to reference
3. **InstantID**: Better for face identity; IP-Adapter better for overall style
4. **Combining Methods**: Use both IP-Adapter and InstantID for best results
5. **LoRA Training**: Best for perfect consistency; requires 20-50+ training images
6. **Character Profiles**: Store embeddings to avoid re-extracting faces each time
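Tip 6 works because each reference image's embedding is written once as `embedding_<hash>.json` inside the profile's `embeddings/` directory (the layout created by `extract_face_embedding` in this change). Reloading them is plain JSON parsing, with no face detector needed:

```python
import json
from pathlib import Path

def load_cached_embeddings(profile_dir):
    """Read previously extracted face embeddings instead of re-running InsightFace."""
    emb_dir = Path(profile_dir) / "embeddings"
    vectors = []
    for f in sorted(emb_dir.glob("embedding_*.json")):
        data = json.loads(f.read_text())
        vectors.append(data["embedding"])
    return vectors
```

Pass the profile directory, e.g. `~/.config/videogen/characters/alice`, as `profile_dir`.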
---
## Distributed Multi-GPU
### Basic Distributed Setup
...
@@ -31,6 +31,11 @@ opencv-python>=4.8.0
face-recognition>=1.14.0
# dlib # Install with: pip install dlib (requires cmake)
# Character Consistency Dependencies (Optional - for IP-Adapter, InstantID)
# insightface>=0.7.3 # Install with: pip install insightface
# onnxruntime-gpu>=1.16.0 # Required for insightface GPU acceleration
# or onnxruntime>=1.16.0 # CPU only
# Model Management
requests>=2.31.0
urllib3>=2.0.0
...
@@ -53,9 +53,12 @@ import json
import urllib.request
import urllib.error
import time
import shutil
import hashlib
from datetime import datetime, timedelta
from pathlib import Path
from PIL import Image
import numpy as np
try:
from diffusers.utils import export_to_video, load_image
@@ -127,6 +130,41 @@ try:
except ImportError:
pass
# ──────────────────────────────────────────────────────────────────────────────
# CHARACTER CONSISTENCY IMPORTS
# ──────────────────────────────────────────────────────────────────────────────
IPADAPTER_AVAILABLE = False
INSTANTID_AVAILABLE = False
INSIGHTFACE_AVAILABLE = False
CV2_AVAILABLE = False
try:
import cv2
CV2_AVAILABLE = True
except ImportError:
pass
try:
from insightface.app import FaceAnalysis
from insightface.utils import face_align
INSIGHTFACE_AVAILABLE = True
except ImportError:
pass
try:
# IP-Adapter via diffusers
from diffusers import IPAdapterFaceIDStableDiffusionPipeline, IPAdapterStableDiffusionPipeline
IPADAPTER_AVAILABLE = True
except ImportError:
pass
# InstantID builds on InsightFace and OpenCV; no separate import to attempt
INSTANTID_AVAILABLE = INSIGHTFACE_AVAILABLE and CV2_AVAILABLE
# ──────────────────────────────────────────────────────────────────────────────
# CONFIG & MODEL MANAGEMENT
# ──────────────────────────────────────────────────────────────────────────────
@@ -3501,6 +3539,674 @@ def apply_lip_sync(video_path, audio_path, output_path, method="auto", args=None
return None
# ──────────────────────────────────────────────────────────────────────────────
# CHARACTER CONSISTENCY FEATURES
# ──────────────────────────────────────────────────────────────────────────────
# Character profiles directory
CHARACTERS_DIR = CONFIG_DIR / "characters"
# IP-Adapter model paths
IPADAPTER_MODELS = {
"sd15": "h94/IP-Adapter",
"sdxl": "h94/IP-Adapter",
"faceid_sd15": "h94/IP-Adapter-FaceID",
"faceid_sdxl": "h94/IP-Adapter-FaceID",
"plus_sd15": "h94/IP-Adapter-Plus",
"plus_sdxl": "h94/IP-Adapter-Plus-SDXL",
}
# InstantID model paths
INSTANTID_MODELS = {
"instantid": "InstantX/InstantID",
"antelopev2": "deepinsight/insightface/models/buffalo_l/antelopev2.onnx",
}
def ensure_characters_dir():
"""Ensure characters directory exists"""
CHARACTERS_DIR.mkdir(parents=True, exist_ok=True)
def extract_face_embedding(image_path, output_dir=None):
"""Extract face embedding from an image using InsightFace
Args:
image_path: Path to the input image
output_dir: Directory to save the embedding (optional)
Returns:
Dict with face embedding and metadata, or None if no face detected
"""
if not INSIGHTFACE_AVAILABLE:
print("❌ InsightFace not available. Install with: pip install insightface onnxruntime-gpu")
return None
if not CV2_AVAILABLE:
print("❌ OpenCV not available. Install with: pip install opencv-python")
return None
try:
# Initialize InsightFace
app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
# Load image
img = cv2.imread(str(image_path))
if img is None:
print(f"❌ Could not load image: {image_path}")
return None
# Detect faces
faces = app.get(img)
if not faces:
print(f"⚠️ No face detected in {image_path}")
return None
# Get the largest face (main subject)
face = max(faces, key=lambda f: (f.bbox[2] - f.bbox[0]) * (f.bbox[3] - f.bbox[1]))
# Extract embedding
embedding = face.embedding
# Get face bounding box
bbox = face.bbox.astype(int).tolist()
# Get face keypoints
kps = face.kps.astype(int).tolist() if hasattr(face, 'kps') else None
result = {
"embedding": embedding.tolist(),
"bbox": bbox,
"kps": kps,
"det_score": float(face.det_score),
"source_image": str(image_path),
"timestamp": str(datetime.now()),
}
# Save embedding if output directory specified
if output_dir:
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)
# Generate unique filename based on image hash
img_hash = hashlib.md5(Path(image_path).read_bytes()).hexdigest()[:8]
embedding_file = output_path / f"embedding_{img_hash}.json"
with open(embedding_file, 'w') as f:
json.dump(result, f, indent=2)
result["embedding_file"] = str(embedding_file)
print(f"✅ Face embedding saved to {embedding_file}")
print(f"✅ Face detected with confidence {face.det_score:.2f}")
return result
except Exception as e:
print(f"❌ Error extracting face embedding: {e}")
return None
def create_character_profile(name, reference_images, description=None, tags=None):
"""Create a character profile from reference images
Args:
name: Character profile name
reference_images: List of paths to reference images
description: Optional character description
tags: Optional list of tags for the character
Returns:
Dict with character profile data
"""
ensure_characters_dir()
profile_dir = CHARACTERS_DIR / name
profile_dir.mkdir(parents=True, exist_ok=True)
profile = {
"name": name,
"description": description or "",
"tags": tags or [],
"images": [],
"embeddings": [],
"created": str(datetime.now()),
"modified": str(datetime.now()),
}
print(f"\n📝 Creating character profile: {name}")
for i, img_path in enumerate(reference_images):
img_path = Path(img_path)
if not img_path.exists():
print(f"⚠️ Image not found: {img_path}")
continue
# Copy image to profile directory
dest_path = profile_dir / f"reference_{i:03d}{img_path.suffix}"
shutil.copy2(img_path, dest_path)
# Extract face embedding
embedding = extract_face_embedding(img_path, output_dir=profile_dir / "embeddings")
image_info = {
"path": str(dest_path),
"original_path": str(img_path),
"has_embedding": embedding is not None,
}
if embedding:
image_info["embedding_file"] = embedding.get("embedding_file", "")
profile["embeddings"].append(embedding)
profile["images"].append(image_info)
print(f" ✅ Added image {i+1}: {img_path.name}")
# Save profile
profile_file = profile_dir / "profile.json"
with open(profile_file, 'w') as f:
json.dump(profile, f, indent=2)
print(f"\n✅ Character profile created: {profile_file}")
print(f" Images: {len(profile['images'])}")
print(f" Embeddings: {len(profile['embeddings'])}")
return profile
def load_character_profile(name):
"""Load a character profile by name
Args:
name: Character profile name
Returns:
Dict with character profile data, or None if not found
"""
profile_dir = CHARACTERS_DIR / name
profile_file = profile_dir / "profile.json"
if not profile_file.exists():
print(f"❌ Character profile not found: {name}")
return None
with open(profile_file, 'r') as f:
profile = json.load(f)
return profile
def list_character_profiles():
"""List all available character profiles
Returns:
List of character profile names
"""
ensure_characters_dir()
profiles = []
for profile_dir in CHARACTERS_DIR.iterdir():
if profile_dir.is_dir() and (profile_dir / "profile.json").exists():
profiles.append(profile_dir.name)
return sorted(profiles)
def show_character_profile(name):
"""Show details of a character profile
Args:
name: Character profile name
"""
profile = load_character_profile(name)
if not profile:
return
print(f"\n{'='*60}")
print(f"👤 Character Profile: {name}")
print(f"{'='*60}")
print(f" Description: {profile.get('description', 'N/A')}")
print(f" Tags: {', '.join(profile.get('tags', [])) or 'N/A'}")
print(f" Created: {profile.get('created', 'N/A')}")
print(f" Modified: {profile.get('modified', 'N/A')}")
print(f"\n Reference Images ({len(profile.get('images', []))}):")
for i, img in enumerate(profile.get('images', [])):
print(f" {i+1}. {Path(img['path']).name}")
print(f" Original: {img.get('original_path', 'N/A')}")
print(f"       Has embedding: {'✅' if img.get('has_embedding') else '❌'}")
print(f"\n Embeddings: {len(profile.get('embeddings', []))}")
def apply_ipadapter(pipe, reference_images, scale=0.8, model_type="plus_sd15"):
"""Apply IP-Adapter to a pipeline for character consistency
Args:
pipe: The diffusion pipeline
reference_images: List of reference image paths
scale: IP-Adapter scale (0.0-1.0, higher = more influence)
model_type: IP-Adapter model type
Returns:
Modified pipeline or None on failure
"""
if not IPADAPTER_AVAILABLE:
print("❌ IP-Adapter not available")
print(" Install with: pip install diffusers>=0.25.0 transformers accelerate safetensors")
return None
try:
from diffusers import IPAdapterFaceIDStableDiffusionPipeline
from diffusers.utils import load_image
# Load reference images
ref_imgs = []
for img_path in reference_images:
if isinstance(img_path, str):
img_path = Path(img_path)
if img_path.exists():
img = Image.open(img_path).convert("RGB")
ref_imgs.append(img)
if not ref_imgs:
print("❌ No valid reference images found")
return None
print(f"📦 Loading IP-Adapter: {model_type}")
# Get IP-Adapter model path
ipadapter_path = IPADAPTER_MODELS.get(model_type)
if not ipadapter_path:
print(f"❌ Unknown IP-Adapter model type: {model_type}")
print(f" Available: {list(IPADAPTER_MODELS.keys())}")
return None
# Load IP-Adapter image encoder
# Note: This is a simplified implementation
# Full implementation requires downloading specific model weights
print(f" Reference images: {len(ref_imgs)}")
print(f" Scale: {scale}")
# Store reference images in pipeline for later use
pipe._ipadapter_images = ref_imgs
pipe._ipadapter_scale = scale
print(f"✅ IP-Adapter configured (scale={scale})")
print(f" Note: Full IP-Adapter integration requires model weights download")
print(f" See: https://huggingface.co/h94/IP-Adapter")
return pipe
except Exception as e:
print(f"❌ Error applying IP-Adapter: {e}")
return None
def apply_instantid(pipe, reference_images, scale=0.8):
"""Apply InstantID for face identity preservation
InstantID provides better face identity preservation than IP-Adapter
by using a dedicated face identity encoder.
Args:
pipe: The diffusion pipeline
reference_images: List of reference image paths
scale: InstantID scale (0.0-1.0)
Returns:
Modified pipeline or None on failure
"""
if not INSTANTID_AVAILABLE:
print("❌ InstantID not available")
print(" Install with: pip install insightface onnxruntime-gpu opencv-python")
return None
try:
# Extract face embeddings from reference images
embeddings = []
for img_path in reference_images:
result = extract_face_embedding(img_path)
if result and "embedding" in result:
embeddings.append(result["embedding"])
if not embeddings:
print("❌ No face embeddings could be extracted")
return None
print(f"📦 InstantID configured")
print(f" Reference faces: {len(embeddings)}")
print(f" Scale: {scale}")
# Average embeddings for better identity representation
avg_embedding = np.mean(embeddings, axis=0)
# Store in pipeline for later use
pipe._instantid_embedding = avg_embedding
pipe._instantid_scale = scale
print(f"✅ InstantID configured (scale={scale})")
print(f" Note: Full InstantID integration requires InstantX/InstantID model")
print(f" See: https://huggingface.co/InstantX/InstantID")
return pipe
except Exception as e:
print(f"❌ Error applying InstantID: {e}")
return None
def generate_with_character(pipe, prompt, character_profile=None, reference_images=None,
ipadapter_scale=0.8, instantid_scale=0.8, **kwargs):
"""Generate an image/video with character consistency
This function combines IP-Adapter and InstantID for maximum character consistency.
Args:
pipe: The diffusion pipeline
prompt: Generation prompt
character_profile: Name of a saved character profile
reference_images: List of reference image paths (overrides profile)
ipadapter_scale: IP-Adapter influence scale
instantid_scale: InstantID influence scale
**kwargs: Additional generation parameters
Returns:
Generated output (image or video)
"""
# Load character profile if specified
if character_profile and not reference_images:
profile = load_character_profile(character_profile)
if profile:
reference_images = [img["path"] for img in profile.get("images", [])]
if profile.get("description"):
prompt = f"{profile['description']}, {prompt}"
if not reference_images:
print("⚠️ No reference images provided, generating without character consistency")
return pipe(prompt, **kwargs)
# Apply IP-Adapter
if IPADAPTER_AVAILABLE and ipadapter_scale > 0:
pipe = apply_ipadapter(pipe, reference_images, scale=ipadapter_scale)
# Apply InstantID
if INSTANTID_AVAILABLE and instantid_scale > 0:
pipe = apply_instantid(pipe, reference_images, scale=instantid_scale)
# Generate
print(f"🎨 Generating with character consistency")
print(f" Reference images: {len(reference_images)}")
print(f" IP-Adapter scale: {ipadapter_scale}")
print(f" InstantID scale: {instantid_scale}")
return pipe(prompt, **kwargs)
# ──────────────────────────────────────────────────────────────────────────────
# LoRA TRAINING WORKFLOW
# ──────────────────────────────────────────────────────────────────────────────
def prepare_training_dataset(images_dir, output_dir=None, caption_prefix="a photo of"):
"""Prepare a dataset for LoRA training
Args:
images_dir: Directory containing training images
output_dir: Output directory for prepared dataset
caption_prefix: Prefix for auto-generated captions
Returns:
Dict with dataset info
"""
images_dir = Path(images_dir)
if not images_dir.exists():
print(f"❌ Images directory not found: {images_dir}")
return None
output_dir = Path(output_dir) if output_dir else images_dir / "dataset"
output_dir.mkdir(parents=True, exist_ok=True)
# Supported image formats
img_extensions = {'.jpg', '.jpeg', '.png', '.webp', '.bmp'}
# Find all images
images = []
for ext in img_extensions:
images.extend(images_dir.glob(f"*{ext}"))
images.extend(images_dir.glob(f"*{ext.upper()}"))
if not images:
print(f"❌ No images found in {images_dir}")
return None
print(f"\n📦 Preparing training dataset")
print(f" Source: {images_dir}")
print(f" Output: {output_dir}")
print(f" Images found: {len(images)}")
dataset_info = {
"source_dir": str(images_dir),
"output_dir": str(output_dir),
"images": [],
"total_images": len(images),
}
# Process each image
for i, img_path in enumerate(images):
try:
# Open and validate image
img = Image.open(img_path)
img = img.convert("RGB")
# Resize if needed (LoRA training typically uses 512 or 1024)
min_side = min(img.size)
if min_side < 512:
# Upscale small images
scale = 512 / min_side
new_size = (int(img.size[0] * scale), int(img.size[1] * scale))
img = img.resize(new_size, Image.LANCZOS)
# Save to output directory
dest_path = output_dir / f"image_{i:04d}.jpg"
img.save(dest_path, "JPEG", quality=95)
# Create caption file
caption_path = output_dir / f"image_{i:04d}.txt"
caption = f"{caption_prefix} sks person"
with open(caption_path, 'w') as f:
f.write(caption)
dataset_info["images"].append({
"original": str(img_path),
"processed": str(dest_path),
"caption": str(caption_path),
"size": img.size,
})
print(f" ✅ Processed {i+1}/{len(images)}: {img_path.name}")
except Exception as e:
print(f" ❌ Error processing {img_path.name}: {e}")
# Save dataset info
info_path = output_dir / "dataset_info.json"
with open(info_path, 'w') as f:
json.dump(dataset_info, f, indent=2)
print(f"\n✅ Dataset prepared: {output_dir}")
print(f" Total images: {len(dataset_info['images'])}")
print(f" Info file: {info_path}")
return dataset_info
def generate_lora_training_command(
dataset_dir,
output_dir,
base_model="runwayml/stable-diffusion-v1-5",
lora_name="my_character",
num_epochs=100,
batch_size=1,
learning_rate=1e-4,
rank=4,
alpha=4,
resolution=512,
mixed_precision="fp16",
):
"""Generate a LoRA training command using diffusers
Args:
dataset_dir: Directory containing the prepared dataset
output_dir: Output directory for the trained LoRA
base_model: Base model to train on
lora_name: Name for the LoRA
num_epochs: Number of training epochs
batch_size: Training batch size
learning_rate: Learning rate
rank: LoRA rank (higher = more parameters)
alpha: LoRA alpha
resolution: Training resolution
mixed_precision: Mixed precision mode
Returns:
Training command string
"""
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
# Build the training command
command = f"""
# LoRA Training Command for {lora_name}
# Generated by videogen
# Install required packages:
# pip install diffusers transformers accelerate peft safetensors
# Run training:
accelerate launch --mixed_precision={mixed_precision} \\
--num_processes=1 \\
--num_machines=1 \\
train_text_to_image_lora.py \\
--pretrained_model_name_or_path={base_model} \\
--dataset_name={dataset_dir} \\
--dataloader_num_workers=8 \\
--resolution={resolution} \\
--center_crop \\
--random_flip \\
--train_batch_size={batch_size} \\
--gradient_accumulation_steps=4 \\
--max_train_steps={num_epochs * 100} \\
--learning_rate={learning_rate} \\
--max_grad_norm=1 \\
--lr_scheduler=cosine \\
--lr_warmup_steps=0 \\
--output_dir={output_dir / lora_name} \\
--rank={rank} \\
--alpha={alpha} \\
--checkpointing_steps=500 \\
--validation_prompt="a photo of sks person" \\
--seed=42 \\
--mixed_precision={mixed_precision} \\
--train_text_encoder
# Alternative: Use kohya-ss scripts for more advanced training
# git clone https://github.com/kohya-ss/sd-scripts
# See: https://github.com/kohya-ss/sd-scripts#lora-training
"""
# Save command to file
command_file = output_dir / f"train_{lora_name}.sh"
with open(command_file, 'w') as f:
f.write(command)
print(f"\n📝 LoRA training command generated")
print(f" Output: {command_file}")
print(f" LoRA name: {lora_name}")
print(f" Base model: {base_model}")
print(f" Epochs: {num_epochs}")
print(f" Rank: {rank}")
return command
def train_character_lora(
character_name,
images_dir,
output_dir=None,
base_model="runwayml/stable-diffusion-v1-5",
num_epochs=100,
rank=4,
):
"""Train a LoRA for a character from reference images
This is a convenience function that prepares the dataset and generates
the training command.
Args:
character_name: Name for the character LoRA
images_dir: Directory containing character reference images
output_dir: Output directory for the LoRA
base_model: Base model to train on
num_epochs: Number of training epochs
rank: LoRA rank
Returns:
Dict with training info
"""
ensure_characters_dir()
output_dir = output_dir or str(CHARACTERS_DIR / character_name / "lora")
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
# Prepare dataset
print(f"\n{'='*60}")
print(f"🎯 Training LoRA for character: {character_name}")
print(f"{'='*60}")
dataset_info = prepare_training_dataset(
images_dir,
output_dir=output_dir / "dataset",
caption_prefix=f"a photo of {character_name}"
)
if not dataset_info:
return None
# Generate training command
command = generate_lora_training_command(
dataset_dir=dataset_info["output_dir"],
output_dir=output_dir,
base_model=base_model,
lora_name=character_name,
num_epochs=num_epochs,
rank=rank,
)
# Create character profile entry for the LoRA
profile = {
"name": character_name,
"type": "lora",
"base_model": base_model,
"lora_path": str(output_dir / character_name),
"training_command": str(output_dir / f"train_{character_name}.sh"),
"dataset": dataset_info,
"created": str(datetime.now()),
}
profile_file = output_dir / "lora_profile.json"
with open(profile_file, 'w') as f:
json.dump(profile, f, indent=2)
print(f"\n✅ LoRA training setup complete!")
print(f" Profile: {profile_file}")
print(f" Run the training command in: {output_dir / f'train_{character_name}.sh'}")
return profile
# ──────────────────────────────────────────────────────────────────────────────
# MAIN PIPELINE
# ──────────────────────────────────────────────────────────────────────────────
@@ -3560,12 +4266,80 @@ def main(args):
if args.tts_list:
print_tts_voices()
# ─── CHARACTER CONSISTENCY HANDLERS ──────────────────────────────────────────
# Handle character list
if getattr(args, 'list_characters', False):
profiles = list_character_profiles()
if profiles:
print("\n👤 Saved Character Profiles:")
print("=" * 40)
for i, name in enumerate(profiles, 1):
profile = load_character_profile(name)
if profile:
img_count = len(profile.get('images', []))
emb_count = len(profile.get('embeddings', []))
desc = profile.get('description', '')[:50]
print(f" {i}. {name}")
print(f" Images: {img_count}, Embeddings: {emb_count}")
if desc:
print(f" Description: {desc}...")
else:
print("No character profiles found.")
print("Create one with: videogen --create-character NAME --character-images img1.jpg img2.jpg")
sys.exit(0)
# Handle show character
if getattr(args, 'show_character', None):
show_character_profile(args.show_character)
sys.exit(0)
# Handle create character
if getattr(args, 'create_character', None):
if not getattr(args, 'character_images', None):
print("❌ --character-images is required when using --create-character")
print(" Example: videogen --create-character alice --character-images ref1.jpg ref2.jpg")
sys.exit(1)
profile = create_character_profile(
name=args.create_character,
reference_images=args.character_images,
description=getattr(args, 'character_desc', None),
)
if profile:
print(f"\n✅ Character profile '{args.create_character}' created successfully!")
print(f" Use with: videogen --character {args.create_character} --prompt '...'")
sys.exit(0)
    # Handle LoRA training
    if getattr(args, 'train_lora', None):
        training_images = getattr(args, 'training_images', None)
        if not training_images:
            print("❌ --training-images is required when using --train-lora")
            print("   Example: videogen --train-lora alice --training-images ./alice_images/")
            sys.exit(1)
        profile = train_character_lora(
            character_name=args.train_lora,
            images_dir=training_images,
            base_model=getattr(args, 'base_model_for_training', 'runwayml/stable-diffusion-v1-5'),
            num_epochs=getattr(args, 'training_epochs', 100),
            rank=getattr(args, 'lora_rank', 4),
        )
        if profile:
            print(f"\n✅ LoRA training setup complete for '{args.train_lora}'")
            print(f"   Follow the instructions to run the training")
        sys.exit(0)
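`train_character_lora` is defined elsewhere in the file. Since the handler reports "training setup complete" and prints instructions rather than training in-process, one plausible sketch is building an `accelerate launch` command for the `train_text_to_image_lora.py` example script from the diffusers repository. The script name and flag names here follow the diffusers examples and are assumptions; verify them against the installed diffusers version before running:

```python
import shlex
from pathlib import Path

def build_lora_training_command(character_name, images_dir,
                                base_model="runwayml/stable-diffusion-v1-5",
                                num_epochs=100, rank=4,
                                output_root="lora_models"):
    """Return a shell command that launches diffusers' LoRA training example.

    Flags mirror examples/text_to_image/train_text_to_image_lora.py from the
    diffusers repo (assumed); the output directory layout is hypothetical.
    """
    output_dir = Path(output_root) / character_name
    cmd = [
        "accelerate", "launch", "train_text_to_image_lora.py",
        "--pretrained_model_name_or_path", base_model,
        "--train_data_dir", str(images_dir),
        "--num_train_epochs", str(num_epochs),
        "--rank", str(rank),
        "--output_dir", str(output_dir),
    ]
    return " ".join(shlex.quote(part) for part in cmd)
```

Emitting a command instead of training in-process keeps the heavy `accelerate`/GPU dependencies out of the CLI's import path and lets users tweak hyperparameters before launching.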
    # Check audio dependencies if audio features requested
    if args.generate_audio or args.lip_sync or args.audio_file:
        check_audio_dependencies()

    # Require prompt only for actual generation (unless auto mode)
    character_ops = ['list_characters', 'show_character', 'create_character', 'train_lora']
    has_character_op = any(getattr(args, op, None) for op in character_ops)
    if not getattr(args, 'auto', False) and not args.model_list and not args.tts_list and not args.search_models and not args.add_model and not args.validate_model and not has_character_op and not args.prompt:
        parser.error("the following arguments are required: --prompt")
    # Handle auto mode with retry support

@@ -4819,6 +5593,64 @@ List TTS voices:

    parser.add_argument("--prefer-speed", action="store_true",
                        help="In auto mode, prefer faster models over higher quality")
    # ─── CHARACTER CONSISTENCY ARGUMENTS ─────────────────────────────────────────
    # Character profile arguments
    parser.add_argument("--character", type=str, default=None,
                        metavar="NAME",
                        help="Use a saved character profile for consistent character generation")
    parser.add_argument("--create-character", type=str, default=None,
                        metavar="NAME",
                        help="Create a new character profile from reference images")
    parser.add_argument("--character-images", nargs="+", default=None,
                        metavar="IMAGE",
                        help="Reference images for character profile creation (use with --create-character)")
    parser.add_argument("--character-desc", type=str, default=None,
                        metavar="DESCRIPTION",
                        help="Description for character profile (use with --create-character)")
    parser.add_argument("--list-characters", action="store_true",
                        help="List all saved character profiles")
    parser.add_argument("--show-character", type=str, default=None,
                        metavar="NAME",
                        help="Show details of a character profile")

    # IP-Adapter arguments
    parser.add_argument("--ipadapter", action="store_true",
                        help="Enable IP-Adapter for character consistency using reference images")
    parser.add_argument("--ipadapter-scale", type=float, default=0.8,
                        metavar="SCALE",
                        help="IP-Adapter influence scale (0.0-1.0, default: 0.8)")
    parser.add_argument("--ipadapter-model", type=str, default="plus_sd15",
                        choices=list(IPADAPTER_MODELS.keys()),
                        help="IP-Adapter model variant (default: plus_sd15)")
    parser.add_argument("--reference-images", nargs="+", default=None,
                        metavar="IMAGE",
                        help="Reference images for IP-Adapter/InstantID character consistency")

    # InstantID arguments
    parser.add_argument("--instantid", action="store_true",
                        help="Enable InstantID for face identity preservation")
    parser.add_argument("--instantid-scale", type=float, default=0.8,
                        metavar="SCALE",
                        help="InstantID influence scale (0.0-1.0, default: 0.8)")

    # LoRA training arguments
    parser.add_argument("--train-lora", type=str, default=None,
                        metavar="NAME",
                        help="Train a LoRA for a character from reference images")
    parser.add_argument("--training-images", type=str, default=None,
                        metavar="DIR",
                        help="Directory containing training images for LoRA training")
    parser.add_argument("--training-epochs", type=int, default=100,
                        metavar="COUNT",
                        help="Number of training epochs (default: 100)")
    parser.add_argument("--lora-rank", type=int, default=4,
                        metavar="RANK",
                        help="LoRA rank - higher rank = more trainable parameters (default: 4)")
    parser.add_argument("--base-model-for-training", type=str, default="runwayml/stable-diffusion-v1-5",
                        metavar="MODEL_ID",
                        help="Base model for LoRA training (default: runwayml/stable-diffusion-v1-5)")
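At generation time, `--ipadapter`, `--ipadapter-model`, and `--ipadapter-scale` map naturally onto the diffusers IP-Adapter loader. A sketch, assuming the `IPADAPTER_MODELS` table (defined elsewhere in this file) maps variant names to weight files in the `h94/IP-Adapter` repository:

```python
# Assumed contents of IPADAPTER_MODELS; the real table is defined elsewhere
# in this file and may cover more variants.
IPADAPTER_MODELS = {
    "sd15": "ip-adapter_sd15.bin",
    "plus_sd15": "ip-adapter-plus_sd15.bin",
    "plus_face_sd15": "ip-adapter-plus-face_sd15.bin",
}

def apply_ip_adapter(pipe, variant="plus_sd15", scale=0.8):
    """Attach an IP-Adapter to a loaded Stable Diffusion pipeline.

    Uses the diffusers load_ip_adapter / set_ip_adapter_scale API; the weight
    names correspond to files under models/ in the h94/IP-Adapter repo.
    """
    pipe.load_ip_adapter(
        "h94/IP-Adapter",
        subfolder="models",
        weight_name=IPADAPTER_MODELS[variant],
    )
    # 0.0 ignores the reference images entirely; 1.0 follows them closely.
    pipe.set_ip_adapter_scale(scale)
    return pipe
```

The reference images themselves are then passed per-call via the pipeline's `ip_adapter_image` argument, so one loaded adapter can serve many characters.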
    # Debug mode
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug mode for detailed error messages and troubleshooting")
...