Add character consistency features, fix model loading for non-diffusers models

- Add character profile management (create, list, show, delete)
- Add IP-Adapter and InstantID support for character consistency
- Fix model loading for models with config.json only (no model_index.json)
- Add component-only model detection (fine-tuned weights)
- Update MCP server with character consistency tools
- Update SKILL.md and README.md documentation
- Add memory management for dubbing/translation
- Add chunked processing for Whisper transcription
- Add character consistency options to web interface
parent 627eb38f
......@@ -47,6 +47,12 @@ A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Vid
- **Large Models** (30-50GB VRAM): Allegro, HunyuanVideo
- **Huge Models** (50GB+ VRAM): Open-Sora, Step-Video, Lumina
### Character Consistency
- **Character Profiles**: Save and reuse character references across generations
- **IP-Adapter**: Image prompt adapter for consistent character generation
- **InstantID**: Face identity preservation for consistent faces
- **Reference Images**: Use multiple reference images for character consistency
### Smart Features
- **Auto Mode**: Automatic model selection and configuration
- **NSFW Detection**: Automatic content classification
......@@ -54,6 +60,7 @@ A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Vid
- **Time Estimation**: Hardware-aware generation time prediction
- **Multi-GPU**: Distributed generation across multiple GPUs
- **Auto-Disable**: Models that fail 3 times are auto-disabled
- **Memory Management**: Automatic chunking for long videos and low VRAM
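The automatic chunking decision can be sketched as a resolution- and VRAM-aware heuristic. This is a simplified illustration; the function name and thresholds are assumptions, not VideoGen's exact implementation:

```python
def chunk_plan(duration_s, width, height, vram_gb):
    """Pick a chunk length from resolution and available VRAM
    (illustrative heuristic, not the actual implementation)."""
    pixels = width * height
    if pixels >= 3840 * 2160:      # 4K and above: small chunks
        base = 60
    elif pixels >= 1920 * 1080:    # 1080p
        base = 120
    else:                          # 720p and below
        base = 300
    # Scale down on GPUs below a 16GB baseline, with a 30s floor
    chunk = max(30, int(base * min(1.0, vram_gb / 16.0)))
    return duration_s > chunk * 1.5, chunk

print(chunk_plan(600, 1920, 1080, 16))  # → (True, 120): chunk a 10-min 1080p clip
print(chunk_plan(60, 1280, 720, 16))    # → (False, 300): short clip, one pass
```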
### User Interfaces
- **Command Line**: Full-featured CLI with all options
......@@ -147,6 +154,40 @@ python3 videogen --image_to_video --model svd_xt_1.1 \
--lip_sync --output speaker
```
### Character Consistency
VideoGen supports character consistency across multiple generations using IP-Adapter and InstantID.
```bash
# Create a character profile from reference images
python3 videogen --create-character my_character \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "A young woman with red hair"
# List saved character profiles
python3 videogen --list-characters
# Generate with character consistency
python3 videogen --model flux_dev \
--character my_character \
--prompt "my_character walking in a park" \
--output character_park
# Use IP-Adapter directly with reference images
python3 videogen --model sdxl_base \
--ipadapter --ipadapter-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "a person reading a book" \
--output reading
# Use InstantID for face consistency
python3 videogen --model sdxl_base \
--ipadapter --instantid \
--reference-images face_ref.jpg \
--prompt "portrait of a person smiling" \
--output portrait
```
---
## AI Agent Integration
......
......@@ -341,6 +341,74 @@ python3 videogen --video input.mp4 --dub-video --target-lang de --tts_voice edge
---
## Character Consistency
VideoGen supports character consistency across multiple generations using IP-Adapter, InstantID, and Character Profiles.
### Create Character Profile
```bash
# Create a character profile from reference images
python3 videogen --create-character alice \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "young woman with blue eyes and blonde hair"
# List all saved character profiles
python3 videogen --list-characters
# Show details of a character profile
python3 videogen --show-character alice
# Delete a character profile
python3 videogen --delete-character alice
```
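On disk, a profile like `alice` might boil down to a small JSON record. The schema below is an assumption for illustration, not VideoGen's actual storage format:

```python
import json
import os
import tempfile

# Hypothetical profile record (schema assumed for illustration)
profile = {
    "name": "alice",
    "description": "young woman with blue eyes and blonde hair",
    "images": ["ref1.jpg", "ref2.jpg", "ref3.jpg"],
}

# Round-trip it the way a profile store might
path = os.path.join(tempfile.mkdtemp(), "alice.json")
with open(path, "w") as f:
    json.dump(profile, f, indent=2)
with open(path) as f:
    loaded = json.load(f)
print(loaded["name"], len(loaded["images"]))  # → alice 3
```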
### Generate with Character
```bash
# Generate image with character consistency
python3 videogen --model flux_dev \
--character alice \
--prompt "alice walking in a park" \
--output alice_park.png
# Generate video with character (I2V)
python3 videogen --image_to_video --model svd_xt_1.1 \
--image_model flux_dev \
--character alice \
--prompt "alice smiling at camera" \
--prompt_animation "subtle head movement" \
--output alice_animated
```
### IP-Adapter Direct Usage
```bash
# Use IP-Adapter with reference images directly
python3 videogen --model flux_dev \
--ipadapter --ipadapter-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "the person in a business suit" \
--output business.png
# Use InstantID for face identity
python3 videogen --model flux_dev \
--ipadapter --instantid \
--reference-images face_ref.jpg \
--prompt "portrait of the person smiling" \
--output portrait.png
```
### Character Consistency Tips
1. **Use multiple reference images** (3-5) for better consistency
2. **IP-Adapter scale**: 0.7-0.9 gives a good balance (higher = closer to the reference)
3. **InstantID** works best for preserving face identity
4. **Character profiles** persist across sessions, so they can be reused
5. **Combine IP-Adapter + InstantID** for the best results
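The scale guidance in tip 2 can be captured in a tiny helper (hypothetical, not part of VideoGen's API):

```python
def pick_ipadapter_scale(requested=None, default=0.8):
    """Clamp a requested IP-Adapter scale into [0.0, 1.0],
    defaulting to the 0.7-0.9 sweet spot (hypothetical helper)."""
    if requested is None:
        return default
    return min(1.0, max(0.0, requested))

print(pick_ipadapter_scale())     # → 0.8
print(pick_ipadapter_scale(1.4))  # → 1.0 (clamped)
print(pick_ipadapter_scale(0.0))  # → 0.0 (explicit zero is respected)
```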
---
## Output Files
VideoGen creates these output files:
......
......@@ -934,3 +934,138 @@ body {
.fa-spin {
animation: spin 1s linear infinite;
}
/* Character Consistency Styles */
/* File Upload Box */
.file-upload {
position: relative;
border: 2px dashed var(--border-color);
border-radius: var(--border-radius);
padding: 2rem;
text-align: center;
transition: all 0.3s;
cursor: pointer;
}
.file-upload:hover {
border-color: var(--primary);
background: rgba(99, 102, 241, 0.05);
}
.file-upload input[type="file"] {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
opacity: 0;
cursor: pointer;
}
.file-upload .file-label {
display: flex;
flex-direction: column;
align-items: center;
gap: 0.5rem;
pointer-events: none;
}
.file-upload .file-label i {
font-size: 2rem;
color: var(--primary);
}
.file-upload .file-label span {
font-weight: 500;
}
.file-upload .file-label small {
color: var(--text-muted);
font-size: 0.8rem;
}
/* Image Preview Grid */
.image-preview-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(100px, 1fr));
gap: 0.75rem;
margin-top: 1rem;
}
.preview-item {
position: relative;
aspect-ratio: 1;
border-radius: var(--border-radius);
overflow: hidden;
background: var(--bg-darker);
}
.preview-item img {
width: 100%;
height: 100%;
object-fit: cover;
}
.preview-item .remove-btn {
position: absolute;
top: 4px;
right: 4px;
width: 24px;
height: 24px;
border-radius: 50%;
background: rgba(0, 0, 0, 0.7);
color: white;
border: none;
cursor: pointer;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.7rem;
opacity: 0;
transition: opacity 0.2s;
}
.preview-item:hover .remove-btn {
opacity: 1;
}
.preview-item .remove-btn:hover {
background: var(--danger);
}
/* Character Section */
#character-section {
border-left: 3px solid var(--primary);
}
#character-section h3 {
color: var(--primary-light);
}
/* IP-Adapter and InstantID Options */
#ipadapter-options,
#instantid-options,
#character-profile-options {
margin-top: 1rem;
padding-top: 1rem;
border-top: 1px solid var(--border-color);
}
/* Range Slider with Value Display */
.form-group input[type="range"] {
width: calc(100% - 50px);
vertical-align: middle;
}
.form-group input[type="range"] + span {
display: inline-block;
width: 40px;
text-align: right;
font-weight: 600;
color: var(--primary);
}
/* Hidden class */
.hidden {
display: none !important;
}
......@@ -310,6 +310,15 @@ async function handleGenerate(e) {
params.translate_subtitles = form.querySelector('#translate_subtitles')?.checked || false;
params.burn_subtitles = form.querySelector('#burn_subtitles')?.checked || false;
// Character consistency options
params.use_character = form.querySelector('#use_character')?.checked || false;
params.use_ipadapter = form.querySelector('#use_ipadapter')?.checked || false;
params.use_instantid = form.querySelector('#use_instantid')?.checked || false;
    // parseFloat(...) || 0.8 would coerce a legitimate 0 back to the default,
    // so check for NaN explicitly
    const ipScale = parseFloat(form.querySelector('#ipadapter_scale')?.value);
    params.ipadapter_scale = Number.isFinite(ipScale) ? ipScale : 0.8;
    const idScale = parseFloat(form.querySelector('#instantid_scale')?.value);
    params.instantid_scale = Number.isFinite(idScale) ? idScale : 0.8;
params.ipadapter_type = form.querySelector('#ipadapter_type')?.value || 'plus_sd15';
params.character_profile = form.querySelector('#character_profile')?.value || '';
// Convert numeric values
params.width = parseInt(params.width) || 832;
params.height = parseInt(params.height) || 480;
......@@ -794,3 +803,291 @@ document.addEventListener('keydown', (e) => {
document.getElementById('generate-form').dispatchEvent(new Event('submit'));
}
});
// Character Consistency Functions
// Toggle character profile options
function toggleCharacterOptions() {
const checkbox = document.getElementById('use_character');
const options = document.getElementById('character-profile-options');
if (checkbox.checked) {
options.classList.remove('hidden');
loadCharacterProfiles();
} else {
options.classList.add('hidden');
}
}
// Toggle IP-Adapter options
function toggleIPAdapterOptions() {
const checkbox = document.getElementById('use_ipadapter');
const options = document.getElementById('ipadapter-options');
const instantidOptions = document.getElementById('instantid-options');
const instantidCheckbox = document.getElementById('use_instantid');
if (checkbox.checked) {
options.classList.remove('hidden');
if (instantidCheckbox.checked) {
instantidOptions.classList.remove('hidden');
}
} else {
options.classList.add('hidden');
// Also hide InstantID options if IP-Adapter is disabled
instantidOptions.classList.add('hidden');
instantidCheckbox.checked = false;
}
}
// Toggle InstantID options
function toggleInstantIDOptions() {
const checkbox = document.getElementById('use_instantid');
const options = document.getElementById('instantid-options');
const ipadapterCheckbox = document.getElementById('use_ipadapter');
if (checkbox.checked) {
// Require IP-Adapter for InstantID
if (!ipadapterCheckbox.checked) {
ipadapterCheckbox.checked = true;
toggleIPAdapterOptions();
}
options.classList.remove('hidden');
} else {
options.classList.add('hidden');
}
}
// Load character profiles from API
async function loadCharacterProfiles() {
try {
const response = await fetch('/api/characters');
const characters = await response.json();
const select = document.getElementById('character_profile');
select.innerHTML = '<option value="">Select a character...</option>';
characters.forEach(char => {
const option = document.createElement('option');
option.value = char.name;
option.textContent = `${char.name} (${char.image_count} images)`;
option.dataset.description = char.description || '';
select.appendChild(option);
});
} catch (error) {
console.error('Error loading characters:', error);
showToast('Failed to load character profiles', 'error');
}
}
// Handle reference image upload
async function handleReferenceUpload(input) {
const files = input.files;
if (!files || files.length === 0) return;
const preview = document.getElementById('reference-preview');
preview.innerHTML = '';
// Show preview of selected images
for (let i = 0; i < Math.min(files.length, 5); i++) {
const file = files[i];
const reader = new FileReader();
reader.onload = function(e) {
const div = document.createElement('div');
div.className = 'preview-item';
div.innerHTML = `
<img src="${e.target.result}" alt="Reference ${i + 1}">
<button type="button" class="remove-btn" onclick="removeReferenceImage(this)">
<i class="fas fa-times"></i>
</button>
`;
preview.appendChild(div);
};
reader.readAsDataURL(file);
}
// Upload files to server
const formData = new FormData();
for (let i = 0; i < files.length; i++) {
formData.append('files', files[i]);
}
formData.append('type', 'reference');
try {
const response = await fetch('/api/upload-multiple', {
method: 'POST',
body: formData
});
const data = await response.json();
if (response.ok) {
// Store paths in hidden input
const hiddenInput = document.getElementById('input_reference');
if (hiddenInput) {
hiddenInput.value = JSON.stringify(data.paths);
}
showToast(`Uploaded ${files.length} reference images`, 'success');
} else {
showToast(`Upload failed: ${data.error}`, 'error');
}
} catch (error) {
console.error('Upload error:', error);
showToast('Upload failed', 'error');
}
}
// Remove reference image from preview
function removeReferenceImage(btn) {
const item = btn.closest('.preview-item');
item.remove();
// Update hidden input
const preview = document.getElementById('reference-preview');
const remaining = preview.querySelectorAll('.preview-item');
if (remaining.length === 0) {
const hiddenInput = document.getElementById('input_reference');
if (hiddenInput) {
hiddenInput.value = '';
}
}
}
// Show create character modal
function showCreateCharacterModal() {
const modal = document.getElementById('create-character-modal');
if (modal) {
modal.classList.add('active');
} else {
// Create modal dynamically
const modalHtml = `
<div class="modal active" id="create-character-modal">
<div class="modal-content">
<div class="modal-header">
<h3><i class="fas fa-user-plus"></i> Create Character Profile</h3>
<button class="close-btn" onclick="closeCreateCharacterModal()">
<i class="fas fa-times"></i>
</button>
</div>
<div class="modal-body">
<form id="create-character-form" onsubmit="createCharacterProfile(event)">
<div class="form-group">
<label for="char_name">Character Name</label>
<input type="text" id="char_name" name="name" required placeholder="e.g., my_character">
</div>
<div class="form-group">
<label for="char_desc">Description (optional)</label>
<textarea id="char_desc" name="description" rows="2" placeholder="Brief description of the character..."></textarea>
</div>
<div class="form-group">
<label>Reference Images</label>
<div class="file-upload">
<input type="file" id="char_images" name="images" accept="image/*" multiple required>
<label for="char_images" class="file-label">
<i class="fas fa-images"></i>
<span>Upload Reference Images</span>
<small>Select 1-5 images</small>
</label>
</div>
<div id="char-image-preview" class="image-preview-grid"></div>
</div>
<div class="form-actions">
<button type="submit" class="btn btn-primary">
<i class="fas fa-save"></i> Create Profile
</button>
<button type="button" class="btn btn-secondary" onclick="closeCreateCharacterModal()">
Cancel
</button>
</div>
</form>
</div>
</div>
</div>
`;
document.body.insertAdjacentHTML('beforeend', modalHtml);
// Add preview handler
document.getElementById('char_images').addEventListener('change', function() {
const preview = document.getElementById('char-image-preview');
preview.innerHTML = '';
for (let i = 0; i < Math.min(this.files.length, 5); i++) {
const file = this.files[i];
const reader = new FileReader();
reader.onload = function(e) {
const div = document.createElement('div');
div.className = 'preview-item';
div.innerHTML = `<img src="${e.target.result}" alt="Preview ${i + 1}">`;
preview.appendChild(div);
};
reader.readAsDataURL(file);
}
});
}
}
// Close create character modal
function closeCreateCharacterModal() {
const modal = document.getElementById('create-character-modal');
if (modal) {
modal.classList.remove('active');
modal.remove();
}
}
// Create character profile
async function createCharacterProfile(e) {
e.preventDefault();
const form = e.target;
const formData = new FormData(form);
try {
const response = await fetch('/api/characters', {
method: 'POST',
body: formData
});
const data = await response.json();
if (response.ok) {
showToast(`Character profile "${data.name}" created`, 'success');
closeCreateCharacterModal();
loadCharacterProfiles();
} else {
showToast(`Failed to create profile: ${data.error}`, 'error');
}
} catch (error) {
console.error('Error creating character:', error);
showToast('Failed to create character profile', 'error');
}
}
// Setup slider value displays
document.addEventListener('DOMContentLoaded', () => {
// IP-Adapter scale slider
const ipadapterSlider = document.getElementById('ipadapter_scale');
if (ipadapterSlider) {
ipadapterSlider.addEventListener('input', (e) => {
document.getElementById('ipadapter-scale-value').textContent = e.target.value;
});
}
// InstantID scale slider
const instantidSlider = document.getElementById('instantid_scale');
if (instantidSlider) {
instantidSlider.addEventListener('input', (e) => {
document.getElementById('instantid-scale-value').textContent = e.target.value;
});
}
// InstantID checkbox handler
const instantidCheckbox = document.getElementById('use_instantid');
if (instantidCheckbox) {
instantidCheckbox.addEventListener('change', toggleInstantIDOptions);
}
});
......@@ -327,6 +327,85 @@
</div>
</div>
<!-- Character Consistency -->
<div class="form-section" id="character-section">
<h3><i class="fas fa-user-circle"></i> Character Consistency</h3>
<div class="form-row">
<label class="checkbox-label">
<input type="checkbox" id="use_character" name="use_character" onchange="toggleCharacterOptions()">
<span>Use Character Profile</span>
</label>
<label class="checkbox-label">
<input type="checkbox" id="use_ipadapter" name="use_ipadapter" onchange="toggleIPAdapterOptions()">
<span>IP-Adapter</span>
</label>
<label class="checkbox-label">
<input type="checkbox" id="use_instantid" name="use_instantid">
<span>InstantID (Face)</span>
</label>
</div>
<!-- Character Profile Selection -->
<div id="character-profile-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="character_profile">Character Profile</label>
<select id="character_profile" name="character_profile">
<option value="">Select a character...</option>
</select>
</div>
<div class="form-group">
<button type="button" class="btn btn-secondary" onclick="showCreateCharacterModal()">
<i class="fas fa-plus"></i> New Character
</button>
</div>
</div>
</div>
<!-- IP-Adapter Options -->
<div id="ipadapter-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="ipadapter_scale">IP-Adapter Scale</label>
<input type="range" id="ipadapter_scale" name="ipadapter_scale" value="0.8" min="0.0" max="1.0" step="0.1">
<span id="ipadapter-scale-value">0.8</span>
</div>
<div class="form-group">
<label for="ipadapter_type">IP-Adapter Type</label>
<select id="ipadapter_type" name="ipadapter_type">
<option value="plus_sd15">Plus (SD 1.5)</option>
<option value="plus_sdxl">Plus (SDXL)</option>
<option value="faceid_sd15">FaceID (SD 1.5)</option>
<option value="faceid_sdxl">FaceID (SDXL)</option>
</select>
</div>
</div>
<div class="form-group">
<label>Reference Images for IP-Adapter</label>
<div class="file-upload" id="reference-upload-box">
<input type="file" id="reference_images" name="reference_images" accept="image/*" multiple onchange="handleReferenceUpload(this)">
<label for="reference_images" class="file-label">
<i class="fas fa-images"></i>
<span>Upload Reference Images</span>
<small>Select 1-5 images</small>
</label>
</div>
<div id="reference-preview" class="image-preview-grid"></div>
</div>
</div>
<!-- InstantID Options -->
<div id="instantid-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="instantid_scale">InstantID Scale</label>
<input type="range" id="instantid_scale" name="instantid_scale" value="0.8" min="0.0" max="1.0" step="0.1">
<span id="instantid-scale-value">0.8</span>
</div>
</div>
</div>
</div>
<!-- Advanced Options -->
<div class="form-section collapsible">
<h3 onclick="toggleSection(this)">
......
......@@ -137,6 +137,391 @@ try:
except ImportError:
pass
# ──────────────────────────────────────────────────────────────────────────────
# MEMORY MANAGEMENT UTILITIES
# ──────────────────────────────────────────────────────────────────────────────
import gc
def clear_memory(clear_cuda=True, aggressive=False):
"""Clear memory to prevent OOM on long operations
Args:
clear_cuda: Whether to clear CUDA cache
aggressive: If True, also run Python garbage collection multiple times
"""
if clear_cuda and torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
if aggressive:
# Reset peak memory stats
torch.cuda.reset_peak_memory_stats()
# Run garbage collection
gc.collect()
if aggressive:
gc.collect()
gc.collect()
def get_memory_usage():
"""Get current memory usage statistics
Returns:
dict with memory usage info
"""
result = {
"ram_used_gb": 0,
"ram_total_gb": 0,
"ram_percent": 0,
"vram_used_gb": 0,
"vram_total_gb": 0,
"vram_percent": 0,
}
# RAM usage
try:
mem = psutil.virtual_memory()
result["ram_used_gb"] = mem.used / (1024**3)
result["ram_total_gb"] = mem.total / (1024**3)
result["ram_percent"] = mem.percent
    except Exception:
pass
# VRAM usage
if torch.cuda.is_available():
try:
vram_allocated = torch.cuda.memory_allocated() / (1024**3)
vram_reserved = torch.cuda.memory_reserved() / (1024**3)
vram_total = torch.cuda.get_device_properties(0).total_memory / (1024**3)
result["vram_used_gb"] = vram_allocated
result["vram_reserved_gb"] = vram_reserved
result["vram_total_gb"] = vram_total
result["vram_percent"] = (vram_allocated / vram_total) * 100 if vram_total > 0 else 0
        except Exception:
pass
return result
def check_memory_available(required_vram_gb=2.0, required_ram_gb=2.0):
"""Check if enough memory is available
Args:
required_vram_gb: Required VRAM in GB
required_ram_gb: Required RAM in GB
Returns:
tuple: (vram_ok, ram_ok, memory_info)
"""
mem = get_memory_usage()
vram_available = mem["vram_total_gb"] - mem["vram_used_gb"]
ram_available = mem["ram_total_gb"] - mem["ram_used_gb"]
vram_ok = vram_available >= required_vram_gb
ram_ok = ram_available >= required_ram_gb
return vram_ok, ram_ok, mem
def should_chunk_video(video_duration, video_resolution, vram_gb):
"""Determine if video should be processed in chunks
Args:
video_duration: Duration in seconds
video_resolution: Tuple of (width, height)
vram_gb: Available VRAM in GB
Returns:
tuple: (should_chunk, chunk_duration, reason)
"""
width, height = video_resolution
pixels = width * height
# Base chunk duration on resolution and VRAM
# Higher resolution = smaller chunks
# Less VRAM = smaller chunks
if pixels >= 7680 * 4320: # 8K
base_chunk = 30
elif pixels >= 3840 * 2160: # 4K
base_chunk = 60
elif pixels >= 1920 * 1080: # 1080p
base_chunk = 120
elif pixels >= 1280 * 720: # 720p
base_chunk = 180
else:
base_chunk = 300
# Adjust for VRAM
vram_factor = min(1.0, vram_gb / 16.0) # 16GB as baseline
chunk_duration = int(base_chunk * vram_factor)
# Minimum chunk duration
chunk_duration = max(30, chunk_duration)
# Decide if chunking is needed
should_chunk = video_duration > chunk_duration * 1.5
if should_chunk:
reason = f"Video duration ({video_duration:.0f}s) > chunk size ({chunk_duration}s) for {width}x{height} @ {vram_gb:.0f}GB VRAM"
else:
reason = f"Video can be processed in one pass ({video_duration:.0f}s)"
return should_chunk, chunk_duration, reason
def extract_audio_chunk(video_path, start_time, duration, output_path):
"""Extract a chunk of audio from video
Args:
video_path: Path to video file
start_time: Start time in seconds
duration: Duration in seconds
output_path: Path to save audio chunk
Returns:
Path to extracted audio chunk or None on failure
"""
cmd = [
'ffmpeg', '-y',
'-ss', str(start_time),
'-i', video_path,
'-t', str(duration),
'-vn', # No video
'-acodec', 'pcm_s16le', # WAV format
'-ar', '16000', # 16kHz sample rate (optimal for Whisper)
'-ac', '1', # Mono
output_path
]
try:
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0 and os.path.exists(output_path):
return output_path
return None
except Exception as e:
print(f" ⚠️ Audio chunk extraction failed: {e}")
return None
def get_video_info(video_path):
"""Get video information (duration, resolution, fps)
Args:
video_path: Path to video file
Returns:
dict with video info or None on failure
"""
try:
# Get duration
duration_result = subprocess.run(
['ffprobe', '-v', 'error', '-show_entries', 'format=duration',
'-of', 'default=noprint_wrappers=1:nokey=1', video_path],
capture_output=True, text=True
)
duration = float(duration_result.stdout.strip())
# Get resolution
resolution_result = subprocess.run(
['ffprobe', '-v', 'error', '-select_streams', 'v:0',
'-show_entries', 'stream=width,height',
'-of', 'csv=s=x:p=0', video_path],
capture_output=True, text=True
)
width, height = map(int, resolution_result.stdout.strip().split('x'))
# Get fps
fps_result = subprocess.run(
['ffprobe', '-v', 'error', '-select_streams', 'v:0',
'-show_entries', 'stream=r_frame_rate',
'-of', 'default=noprint_wrappers=1:nokey=1', video_path],
capture_output=True, text=True
)
fps_parts = fps_result.stdout.strip().split('/')
fps = float(fps_parts[0]) / float(fps_parts[1]) if len(fps_parts) == 2 else float(fps_parts[0])
return {
"duration": duration,
"width": width,
"height": height,
"resolution": (width, height),
"fps": fps,
}
except Exception as e:
print(f" ⚠️ Could not get video info: {e}")
return None
class ModelManager:
"""Context manager for model lifecycle management
Ensures models are properly unloaded after use to prevent memory leaks.
Usage:
with ModelManager("Whisper", model_size="base") as model:
result = model.transcribe(audio_path)
"""
_loaded_models = {} # Track loaded models to avoid reloading
def __init__(self, model_type, device=None, **kwargs):
self.model_type = model_type
self.device = device
self.kwargs = kwargs
self.model = None
        # Include model_name as a fallback so the key matches unload_model()
        # (otherwise MarianMT models, keyed only by model_name, collide on "")
        self._model_key = f"{model_type}_{kwargs.get('model_size', kwargs.get('model_name', ''))}"
def __enter__(self):
# Check if model is already loaded
if self._model_key in self._loaded_models:
return self._loaded_models[self._model_key]
print(f" 📦 Loading {self.model_type} model...")
mem_before = get_memory_usage()
try:
if self.model_type == "Whisper":
model_size = self.kwargs.get("model_size", "base")
self.model = whisper.load_model(model_size, device=self.device)
elif self.model_type == "MarianMT":
model_name = self.kwargs.get("model_name")
self.model = {
"model": MarianMTModel.from_pretrained(model_name),
"tokenizer": MarianTokenizer.from_pretrained(model_name),
}
elif self.model_type == "MusicGen":
model_size = self.kwargs.get("model_size", "medium")
self.model = MusicGen.get_pretrained(f"facebook/musicgen-{model_size}")
if self.device:
self.model.to(self.device)
# Cache the model
self._loaded_models[self._model_key] = self.model
mem_after = get_memory_usage()
vram_used = mem_after["vram_used_gb"] - mem_before["vram_used_gb"]
print(f" Model loaded (VRAM: +{vram_used:.2f}GB)")
return self.model
except Exception as e:
print(f" ❌ Failed to load {self.model_type}: {e}")
raise
def __exit__(self, exc_type, exc_val, exc_tb):
# Don't unload cached models - they will be unloaded explicitly when needed
pass
@classmethod
def unload_model(cls, model_type, **kwargs):
"""Explicitly unload a model from memory"""
model_key = f"{model_type}_{kwargs.get('model_size', kwargs.get('model_name', ''))}"
if model_key in cls._loaded_models:
print(f" 🗑️ Unloading {model_type} model...")
model = cls._loaded_models.pop(model_key)
# Delete model reference
del model
# Clear memory
clear_memory(clear_cuda=True, aggressive=True)
print(f" Model unloaded and memory cleared")
@classmethod
def unload_all(cls):
"""Unload all cached models"""
print(f" 🗑️ Unloading all cached models ({len(cls._loaded_models)} models)...")
cls._loaded_models.clear()
clear_memory(clear_cuda=True, aggressive=True)
print(f" All models unloaded")
def process_long_video_in_chunks(video_path, process_func, chunk_duration=60,
overlap=2, progress_callback=None, **kwargs):
"""Process a long video in chunks to manage memory
Args:
video_path: Path to video file
process_func: Function to call for each chunk (chunk_path, start_time, **kwargs)
chunk_duration: Duration of each chunk in seconds
overlap: Overlap between chunks in seconds (for continuity)
progress_callback: Optional callback for progress updates
**kwargs: Additional arguments passed to process_func
Returns:
Combined results from all chunks
"""
video_info = get_video_info(video_path)
if not video_info:
print("❌ Could not get video info")
return None
total_duration = video_info["duration"]
if total_duration <= chunk_duration:
# Video is short enough, process directly
return process_func(video_path, 0, **kwargs)
print(f"\n📹 Processing long video in chunks")
print(f" Duration: {total_duration:.1f}s")
print(f" Chunk size: {chunk_duration}s")
print(f" Expected chunks: {int(total_duration / chunk_duration) + 1}")
print()
results = []
start_time = 0
chunk_num = 0
total_chunks = int(total_duration / chunk_duration) + 1
temp_dir = tempfile.mkdtemp(prefix="videogen_chunks_")
try:
while start_time < total_duration:
chunk_num += 1
actual_duration = min(chunk_duration, total_duration - start_time)
if progress_callback:
progress_callback(chunk_num, total_chunks, start_time, actual_duration)
print(f" 📦 Processing chunk {chunk_num}/{total_chunks} ({start_time:.1f}s - {start_time + actual_duration:.1f}s)")
# Extract audio chunk
chunk_audio = os.path.join(temp_dir, f"chunk_{chunk_num}.wav")
if not extract_audio_chunk(video_path, start_time, actual_duration, chunk_audio):
print(f" ⚠️ Failed to extract chunk, skipping")
start_time += chunk_duration - overlap
continue
# Process chunk
try:
chunk_result = process_func(chunk_audio, start_time, **kwargs)
if chunk_result:
results.append(chunk_result)
# Clear memory after each chunk
clear_memory(clear_cuda=True)
except Exception as e:
print(f" ⚠️ Chunk processing failed: {e}")
# Clean up chunk file
if os.path.exists(chunk_audio):
os.remove(chunk_audio)
start_time += chunk_duration - overlap
return results
finally:
# Clean up temp directory
import shutil
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir, ignore_errors=True)
# NSFW text classification
TRANSFORMERS_AVAILABLE = False
NSFW_CLASSIFIER = None
......@@ -5473,13 +5858,14 @@ TRANSLATION_LANGUAGES = {
}
def transcribe_video_audio(video_path, model_size="base", language=None):
"""Transcribe audio from video using Whisper
def transcribe_video_audio(video_path, model_size="base", language=None, auto_chunk=True):
"""Transcribe audio from video using Whisper with memory management
Args:
video_path: Path to the video file
model_size: Whisper model size (tiny, base, small, medium, large)
language: Source language code (optional, auto-detected if not provided)
auto_chunk: Automatically chunk long videos (default: True)
Returns:
List of segments with text, start, end times
......@@ -5491,17 +5877,53 @@ def transcribe_video_audio(video_path, model_size="base", language=None):
print(f"🎤 Transcribing audio from: {video_path}")
print(f" Model: {model_size}")
# Get video info for memory management
video_info = get_video_info(video_path)
vram_gb = detect_vram_gb() if torch.cuda.is_available() else 8
# Determine if chunking is needed
should_chunk = False
chunk_duration = 300 # Default 5 minutes
if auto_chunk and video_info:
duration = video_info["duration"]
resolution = video_info["resolution"]
should_chunk, chunk_duration, reason = should_chunk_video(duration, resolution, vram_gb)
print(f" {reason}")
try:
# Load Whisper model
# Check memory before loading model
vram_ok, ram_ok, mem_info = check_memory_available(required_vram_gb=2.0, required_ram_gb=2.0)
print(f" Memory: VRAM {mem_info['vram_used_gb']:.1f}/{mem_info['vram_total_gb']:.1f}GB, RAM {mem_info['ram_percent']:.0f}%")
if not vram_ok:
print(f" ⚠️ Low VRAM, will use aggressive memory management")
# Load Whisper model with memory tracking
mem_before = get_memory_usage()
model = whisper.load_model(model_size)
mem_after = get_memory_usage()
# Transcribe
vram_used = mem_after["vram_used_gb"] - mem_before["vram_used_gb"]
print(f" Model loaded (VRAM: +{vram_used:.2f}GB)")
if should_chunk and video_info:
# Process in chunks for long videos
result = _transcribe_chunked(video_path, model, video_info, chunk_duration, language)
else:
# Process entire video at once
transcribe_options = {}
if language:
transcribe_options["language"] = language
result = model.transcribe(video_path, **transcribe_options)
# Unload model to free memory
del model
clear_memory(clear_cuda=True)
print(f" Model unloaded, memory cleared")
# Process segments
segments = []
for seg in result["segments"]:
segments.append({
......@@ -5520,9 +5942,137 @@ def transcribe_video_audio(video_path, model_size="base", language=None):
}
except Exception as e:
print(f"❌ Transcription failed: {e}")
# Clear memory on failure
clear_memory(clear_cuda=True, aggressive=True)
return None
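The chunking decision above relies on a `should_chunk_video` helper defined elsewhere in the file. A minimal sketch of such a heuristic — thresholds here are illustrative assumptions, not VideoGen's actual values — might look like:

```python
def should_chunk_video(duration, resolution, vram_gb):
    """Return (should_chunk, chunk_duration_s, reason) for a video.

    Hypothetical heuristic: long videos are always chunked, and low VRAM
    both tightens the duration threshold and shrinks the chunk size.
    """
    width, height = resolution
    pixels = width * height
    limit = 600 if vram_gb >= 16 else 300  # illustrative cutoffs
    if duration <= limit and pixels <= 1920 * 1080:
        return False, duration, "Short video, processing in one pass"
    chunk = 300 if vram_gb >= 16 else 120  # smaller chunks when memory is tight
    return True, chunk, f"Chunking into {chunk}s pieces (duration {duration:.0f}s)"
```

For example, a 2-minute 720p clip on a 24GB GPU would be processed in one pass, while an hour-long 1080p video on an 8GB GPU would be split into 120-second chunks.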
def _transcribe_chunked(video_path, model, video_info, chunk_duration, language=None):
"""Internal function to transcribe long videos in chunks
Args:
video_path: Path to video file
model: Loaded Whisper model
video_info: Video information dict
chunk_duration: Duration of each chunk in seconds
language: Optional language code
Returns:
Combined transcription result
"""
total_duration = video_info["duration"]
overlap = 5 # 5 second overlap for continuity
print(f"\n 📦 Processing in chunks ({chunk_duration}s each)")
all_segments = []
all_text = []
detected_language = None
start_time = 0
chunk_num = 0
# The loop advances by (chunk_duration - overlap), so estimate chunk count accordingly
total_chunks = int(total_duration / (chunk_duration - overlap)) + 1
temp_dir = tempfile.mkdtemp(prefix="whisper_chunks_")
try:
while start_time < total_duration:
chunk_num += 1
actual_duration = min(chunk_duration, total_duration - start_time)
print(f" Chunk {chunk_num}/{total_chunks} ({start_time:.1f}s - {start_time + actual_duration:.1f}s)")
# Extract audio chunk
chunk_audio = os.path.join(temp_dir, f"chunk_{chunk_num}.wav")
if not extract_audio_chunk(video_path, start_time, actual_duration, chunk_audio):
start_time += chunk_duration - overlap
continue
# Transcribe chunk
transcribe_options = {}
if language:
transcribe_options["language"] = language
try:
chunk_result = model.transcribe(chunk_audio, **transcribe_options)
# Adjust timestamps to global time
for seg in chunk_result.get("segments", []):
adjusted_seg = {
"text": seg["text"].strip(),
"start": seg["start"] + start_time,
"end": seg["end"] + start_time,
}
all_segments.append(adjusted_seg)
all_text.append(chunk_result.get("text", ""))
if not detected_language:
detected_language = chunk_result.get("language")
except Exception as e:
print(f" ⚠️ Chunk failed: {e}")
# Clean up chunk file
if os.path.exists(chunk_audio):
os.remove(chunk_audio)
# Clear memory periodically
if chunk_num % 3 == 0:
clear_memory(clear_cuda=False) # Don't clear CUDA, model still needed
start_time += chunk_duration - overlap
# Merge overlapping segments
merged_segments = _merge_overlapping_segments(all_segments)
return {
"segments": merged_segments,
"text": " ".join(all_text),
"language": detected_language or "unknown",
}
finally:
# Clean up temp directory
import shutil
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir, ignore_errors=True)
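The chunk loop above advances by `chunk_duration - overlap`, so the window schedule can be computed up front; a small standalone sketch of that arithmetic:

```python
def chunk_windows(total_duration, chunk_duration=300, overlap=5):
    """Return (start, duration) windows covering total_duration,
    each overlapping the previous window by `overlap` seconds."""
    windows = []
    start = 0
    step = chunk_duration - overlap
    while start < total_duration:
        # Last window is clamped to whatever duration remains
        windows.append((start, min(chunk_duration, total_duration - start)))
        start += step
    return windows
```

A 700-second video with the defaults yields windows starting at 0s, 295s, and 590s, the last one only 110 seconds long.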
def _merge_overlapping_segments(segments, max_gap=1.0):
"""Merge overlapping or adjacent segments
Args:
segments: List of segments with start, end, text
max_gap: Maximum gap to merge (in seconds)
Returns:
Merged segments list
"""
if not segments:
return []
# Sort by start time
sorted_segments = sorted(segments, key=lambda x: x["start"])
merged = [dict(sorted_segments[0])]  # copy so the caller's segments are not mutated
for seg in sorted_segments[1:]:
last = merged[-1]
# Check if segments overlap or are adjacent
if seg["start"] <= last["end"] + max_gap:
# Merge: extend end time if later, combine text
last["end"] = max(last["end"], seg["end"])
# Avoid duplicating text
if seg["text"] not in last["text"]:
last["text"] = last["text"] + " " + seg["text"]
else:
merged.append(dict(seg))
return merged
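The merge rule above — extend the end time, append only non-duplicate text, and start a new segment once the gap exceeds `max_gap` — can be exercised in isolation. The sketch below mirrors that logic with a defensive copy so input dicts are not mutated:

```python
def merge_segments(segments, max_gap=1.0):
    """Fold overlapping or adjacent segments, mirroring _merge_overlapping_segments."""
    if not segments:
        return []
    ordered = sorted(segments, key=lambda s: s["start"])
    merged = [dict(ordered[0])]  # copy so callers' dicts stay untouched
    for seg in ordered[1:]:
        last = merged[-1]
        if seg["start"] <= last["end"] + max_gap:
            last["end"] = max(last["end"], seg["end"])
            if seg["text"] not in last["text"]:  # skip text already present
                last["text"] += " " + seg["text"]
        else:
            merged.append(dict(seg))
    return merged

parts = [
    {"start": 0.0, "end": 4.8, "text": "hello world"},
    {"start": 4.5, "end": 9.0, "text": "world again"},   # overlaps the previous
    {"start": 20.0, "end": 24.0, "text": "new topic"},   # gap > max_gap, kept separate
]
merged = merge_segments(parts)
```

The first two parts collapse into one segment spanning 0.0–9.0s; the third stays separate.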
def translate_text(text, source_lang, target_lang):
"""Translate text using MarianMT models
......@@ -7003,6 +7553,103 @@ def main(args):
if debug:
print(f" [DEBUG] Subdirectory load failed: {sub_e}")
continue
# Strategy 3b: Check if this is a component-only model (fine-tuned weights only)
if not pipeline_loaded_successfully and 'config.json' in files:
try:
from huggingface_hub import hf_hub_download
import json as json_module
# Download and read config.json
config_path = hf_hub_download(
model_id_to_load,
"config.json",
token=hf_token
)
with open(config_path, 'r') as cf:
model_config = json_module.load(cf)
class_name = model_config.get("_class_name", "")
model_type = model_config.get("model_type", "")
arch_type = model_config.get("architectures", [])
if debug:
print(f" [DEBUG] Config class_name: {class_name}")
print(f" [DEBUG] Config model_type: {model_type}")
print(f" [DEBUG] Config architectures: {arch_type}")
# Detect component type
is_component = False
component_class = None
# Check explicit class name
component_classes = [
"LTXVideoTransformer3DModel",
"UNet2DConditionModel",
"UNet3DConditionModel",
"AutoencoderKL",
"AutoencoderKLLTXVideo",
]
if class_name in component_classes:
is_component = True
component_class = class_name
elif model_type in ["ltx_video", "ltxvideo"]:
is_component = True
component_class = "LTXVideoTransformer3DModel"
elif any("LTX" in str(a) for a in arch_type):
is_component = True
component_class = "LTXVideoTransformer3DModel"
elif "ltx" in model_id_to_load.lower() and any(k in model_config for k in ["num_layers", "hidden_size"]):
is_component = True
component_class = "LTXVideoTransformer3DModel"
if is_component:
print(f" 📦 Detected component-only model: {component_class}")
print(f" This is a fine-tuned component, loading base model first...")
# Determine base model
base_model = None
model_id_lower = model_id_to_load.lower()
if "ltx" in model_id_lower or "ltxvideo" in model_id_lower:
base_model = "Lightricks/LTX-Video"
elif "wan" in model_id_lower:
base_model = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
elif "svd" in model_id_lower:
base_model = "stabilityai/stable-video-diffusion-img2vid-xt-1-1"
if base_model:
print(f" Loading base pipeline: {base_model}")
pipe = PipelineClass.from_pretrained(base_model, **pipe_kwargs)
print(f" ✅ Base pipeline loaded")
# Load the fine-tuned component
if component_class == "LTXVideoTransformer3DModel":
from diffusers import LTXVideoTransformer3DModel
print(f" Loading fine-tuned transformer...")
pipe.transformer = LTXVideoTransformer3DModel.from_pretrained(
model_id_to_load,
torch_dtype=pipe_kwargs.get("torch_dtype", torch.float16),
token=hf_token
)
print(f" ✅ Fine-tuned transformer loaded!")
pipeline_loaded_successfully = True
elif component_class == "AutoencoderKLLTXVideo":
from diffusers import AutoencoderKLLTXVideo
print(f" Loading fine-tuned VAE...")
pipe.vae = AutoencoderKLLTXVideo.from_pretrained(
model_id_to_load,
torch_dtype=pipe_kwargs.get("torch_dtype", torch.float16),
token=hf_token
)
print(f" ✅ Fine-tuned VAE loaded!")
pipeline_loaded_successfully = True
except Exception as comp_e:
if debug:
print(f" [DEBUG] Component detection failed: {comp_e}")
except Exception as api_e:
if debug:
print(f" [DEBUG] API check failed: {api_e}")
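The detection cascade above — explicit `_class_name`, then `model_type`, then `architectures`, then config-key heuristics — reduces to a small classifier over the parsed `config.json`. A standalone sketch:

```python
COMPONENT_CLASSES = {
    "LTXVideoTransformer3DModel",
    "UNet2DConditionModel",
    "UNet3DConditionModel",
    "AutoencoderKL",
    "AutoencoderKLLTXVideo",
}

def detect_component(config, model_id):
    """Return the component class name for a component-only repo, else None."""
    class_name = config.get("_class_name", "")
    if class_name in COMPONENT_CLASSES:
        return class_name
    if config.get("model_type", "") in ("ltx_video", "ltxvideo"):
        return "LTXVideoTransformer3DModel"
    if any("LTX" in str(a) for a in config.get("architectures", [])):
        return "LTXVideoTransformer3DModel"
    # Last resort: LTX repo name plus transformer-shaped config keys
    if "ltx" in model_id.lower() and any(
        k in config for k in ("num_layers", "hidden_size")
    ):
        return "LTXVideoTransformer3DModel"
    return None
```

A repo whose config carries only transformer hyperparameters but whose id contains "ltx" is still classified, which is what lets fine-tuned weight dumps without `_class_name` load against the base pipeline.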
......@@ -7674,8 +8321,14 @@ def main(args):
model_config = json_module.load(cf)
class_name = model_config.get("_class_name", "")
arch_type = model_config.get("architectures", [])
model_type = model_config.get("model_type", "")
if debug:
print(f" [DEBUG] Model class name: {class_name}")
print(f" [DEBUG] Architectures: {arch_type}")
print(f" [DEBUG] Model type: {model_type}")
print(f" [DEBUG] Config keys: {list(model_config.keys())[:10]}")
# Check if this is a component-only model (transformer, unet, etc.)
component_classes = [
......@@ -7686,7 +8339,27 @@ def main(args):
"AutoencoderKLLTXVideo",
]
# Also detect by model_type or architecture
is_component = class_name in component_classes
# If no _class_name, check for other indicators
if not class_name:
# Check model_type for hints
if model_type in ["ltx_video", "ltxvideo"]:
is_component = True
class_name = "LTXVideoTransformer3DModel"
# Check architectures
elif any("LTX" in str(a) for a in arch_type):
is_component = True
class_name = "LTXVideoTransformer3DModel"
# Check for typical transformer config keys
elif any(k in model_config for k in ["num_layers", "hidden_size", "num_attention_heads"]):
# This looks like a transformer config
if "ltx" in model_id_lower:
is_component = True
class_name = "LTXVideoTransformer3DModel"
if is_component:
print(f" 📦 Detected component-only model: {class_name}")
print(f" This is a fine-tuned component, not a full pipeline.")
......
......@@ -789,6 +789,160 @@ async def list_tools() -> list:
"required": ["text", "source_lang", "target_lang"]
}
),
# Character Consistency Tools
Tool(
name="videogen_create_character",
description="Create a character profile from reference images for consistent character generation across multiple images/videos.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character name (alphanumeric, underscores, hyphens only)"
},
"reference_images": {
"type": "array",
"items": {"type": "string"},
"description": "List of paths to reference images (1-5 images)"
},
"description": {
"type": "string",
"description": "Optional description of the character"
}
},
"required": ["name", "reference_images"]
}
),
Tool(
name="videogen_list_characters",
description="List all saved character profiles.",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="videogen_show_character",
description="Show details of a specific character profile.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character profile name"
}
},
"required": ["name"]
}
),
Tool(
name="videogen_delete_character",
description="Delete a character profile.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character profile name to delete"
}
},
"required": ["name"]
}
),
Tool(
name="videogen_generate_with_character",
description="Generate an image or video with a specific character using IP-Adapter and/or InstantID for consistency.",
inputSchema={
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "Description of what to generate with the character"
},
"character": {
"type": "string",
"description": "Character profile name to use"
},
"model": {
"type": "string",
"description": "Model name (e.g., flux_dev, sdxl_base)"
},
"output": {
"type": "string",
"default": "output"
},
"use_ipadapter": {
"type": "boolean",
"description": "Use IP-Adapter for character consistency",
"default": True
},
"use_instantid": {
"type": "boolean",
"description": "Use InstantID for face identity preservation",
"default": False
},
"ipadapter_scale": {
"type": "number",
"description": "IP-Adapter influence scale (0.0-1.0)",
"default": 0.8
},
"instantid_scale": {
"type": "number",
"description": "InstantID influence scale (0.0-1.0)",
"default": 0.8
},
"animate": {
"type": "boolean",
"description": "Generate video instead of image (I2V)",
"default": False
}
},
"required": ["prompt", "character", "model"]
}
),
Tool(
name="videogen_generate_with_reference",
description="Generate an image using reference images directly (without creating a character profile).",
inputSchema={
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "Description of what to generate"
},
"reference_images": {
"type": "array",
"items": {"type": "string"},
"description": "List of paths to reference images"
},
"model": {
"type": "string",
"description": "Model name (e.g., flux_dev, sdxl_base)"
},
"output": {
"type": "string",
"default": "output"
},
"ipadapter_scale": {
"type": "number",
"description": "IP-Adapter influence scale (0.0-1.0)",
"default": 0.8
},
"use_instantid": {
"type": "boolean",
"description": "Use InstantID for face identity",
"default": False
}
},
"required": ["prompt", "reference_images", "model"]
}
),
]
......@@ -1123,6 +1277,84 @@ async def call_tool(name: str, arguments: dict) -> list:
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
# Character Consistency Tools
elif name == "videogen_create_character":
args = [
"--create-character", arguments["name"],
]
# Add reference images
for img in arguments["reference_images"][:5]: # Max 5 images
args.extend(["--character-images", img])
if arguments.get("description"):
args.extend(["--character-desc", arguments["description"]])
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_list_characters":
args = ["--list-characters"]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_show_character":
args = ["--show-character", arguments["name"]]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_delete_character":
args = ["--delete-character", arguments["name"]]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_generate_with_character":
args = [
"--model", arguments["model"],
"--character", arguments["character"],
"--prompt", arguments["prompt"],
"--output", arguments.get("output", "output"),
]
# IP-Adapter options
if arguments.get("use_ipadapter", True):
args.append("--ipadapter")
if arguments.get("ipadapter_scale") is not None:
args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
# InstantID options
if arguments.get("use_instantid", False):
args.append("--instantid")
if arguments.get("instantid_scale") is not None:
args.extend(["--instantid-scale", str(arguments["instantid_scale"])])
# Animate for I2V
if arguments.get("animate", False):
args.append("--image_to_video")
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_generate_with_reference":
args = [
"--model", arguments["model"],
"--prompt", arguments["prompt"],
"--output", arguments.get("output", "output"),
]
# Add reference images
for img in arguments["reference_images"]:
args.extend(["--reference-images", img])
# IP-Adapter options
args.append("--ipadapter")
if arguments.get("ipadapter_scale") is not None:
args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
# InstantID options
if arguments.get("use_instantid", False):
args.append("--instantid")
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"Unknown tool: {name}")]
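Each handler above flattens the MCP arguments dict into a videogen CLI argument list. The `videogen_generate_with_character` mapping can be sketched as a pure function (flag names taken from the handler above; the explicit `is not None` check keeps a scale of 0.0 from being dropped):

```python
def character_args(arguments):
    """Map MCP tool arguments to a videogen CLI argument list (sketch)."""
    args = [
        "--model", arguments["model"],
        "--character", arguments["character"],
        "--prompt", arguments["prompt"],
        "--output", arguments.get("output", "output"),
    ]
    if arguments.get("use_ipadapter", True):
        args.append("--ipadapter")
    if arguments.get("ipadapter_scale") is not None:
        args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
    if arguments.get("use_instantid", False):
        args.append("--instantid")
    if arguments.get("instantid_scale") is not None:
        args.extend(["--instantid-scale", str(arguments["instantid_scale"])])
    if arguments.get("animate", False):
        args.append("--image_to_video")
    return args
```

Keeping the mapping pure makes it trivially testable without spawning the CLI.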
......
{
"models": {
"wan_1.3b_i2v": {
"id": "Wan-AI/Wan2.1-I2V-1.3B-Diffusers",
......
......@@ -614,6 +614,136 @@ def delete_output(filename):
return jsonify({'success': True})
return jsonify({'error': 'File not found'}), 404
# Character Profile API endpoints
CHARACTERS_DIR = Path.home() / ".config" / "videogen" / "characters"
@app.route('/api/characters', methods=['GET'])
def api_list_characters():
"""List all character profiles"""
characters = []
if CHARACTERS_DIR.exists():
for profile_file in CHARACTERS_DIR.glob("*.json"):
try:
with open(profile_file, 'r') as f:
profile = json.load(f)
characters.append({
'name': profile.get('name', profile_file.stem),
'description': profile.get('description', ''),
'image_count': len(profile.get('reference_images', [])),
'created': profile.get('created', ''),
'tags': profile.get('tags', [])
})
except Exception as e:
print(f"Error loading character profile {profile_file}: {e}")
return jsonify(characters)
@app.route('/api/characters/<name>', methods=['GET'])
def api_get_character(name):
"""Get a specific character profile"""
profile_path = CHARACTERS_DIR / f"{name}.json"
if profile_path.exists():
try:
with open(profile_path, 'r') as f:
return jsonify(json.load(f))
except Exception as e:
return jsonify({'error': str(e)}), 500
return jsonify({'error': 'Character not found'}), 404
@app.route('/api/characters', methods=['POST'])
def api_create_character():
"""Create a new character profile"""
name = request.form.get('name')
description = request.form.get('description', '')
if not name:
return jsonify({'error': 'Name is required'}), 400
# Sanitize name
name = re.sub(r'[^a-zA-Z0-9_-]', '_', name)
# Handle uploaded images
images = request.files.getlist('images')
if not images or len(images) == 0:
return jsonify({'error': 'At least one reference image is required'}), 400
# Create character directory
CHARACTERS_DIR.mkdir(parents=True, exist_ok=True)
char_image_dir = CHARACTERS_DIR / name
char_image_dir.mkdir(parents=True, exist_ok=True)
# Save images
saved_images = []
for i, img in enumerate(images[:5]): # Max 5 images
if img and img.filename:
ext = img.filename.rsplit('.', 1)[-1].lower()
if ext in ALLOWED_EXTENSIONS['image']:
filename = f"reference_{i+1}.{ext}"
filepath = char_image_dir / filename
img.save(filepath)
saved_images.append(str(filepath))
if not saved_images:
return jsonify({'error': 'No valid images uploaded'}), 400
# Create profile
profile = {
'name': name,
'description': description,
'reference_images': saved_images,
'created': datetime.now().isoformat(),
'tags': []
}
# Save profile
profile_path = CHARACTERS_DIR / f"{name}.json"
with open(profile_path, 'w') as f:
json.dump(profile, f, indent=2)
return jsonify(profile)
@app.route('/api/characters/<name>', methods=['DELETE'])
def api_delete_character(name):
"""Delete a character profile"""
profile_path = CHARACTERS_DIR / f"{name}.json"
char_image_dir = CHARACTERS_DIR / name
if not profile_path.exists():
return jsonify({'error': 'Character not found'}), 404
try:
# Delete profile file
profile_path.unlink()
# Delete images directory
if char_image_dir.exists():
shutil.rmtree(char_image_dir)
return jsonify({'success': True})
except Exception as e:
return jsonify({'error': str(e)}), 500
@app.route('/api/upload-multiple', methods=['POST'])
def upload_multiple_files():
"""Upload multiple files (for reference images)"""
files = request.files.getlist('files')
upload_type = request.form.get('type', 'general')
if not files:
return jsonify({'error': 'No files provided'}), 400
saved_paths = []
for f in files:
if f and f.filename:
ext = f.filename.rsplit('.', 1)[-1].lower()
if ext in ALLOWED_EXTENSIONS['image']:
filename = f"{uuid.uuid4().hex[:8]}_{secure_filename(f.filename)}"
filepath = UPLOAD_FOLDER / filename
f.save(filepath)
saved_paths.append(str(filepath))
return jsonify({'paths': saved_paths})
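The profile files written by `api_create_character` are plain JSON. A sketch of the sanitize-and-save step, using a temporary directory in place of `~/.config/videogen/characters`:

```python
import json
import re
import tempfile
from datetime import datetime
from pathlib import Path

def save_profile(characters_dir, name, description, reference_images):
    """Sanitize the name and persist a character profile as JSON."""
    safe = re.sub(r'[^a-zA-Z0-9_-]', '_', name)  # same pattern as the endpoint
    characters_dir.mkdir(parents=True, exist_ok=True)
    profile = {
        "name": safe,
        "description": description,
        "reference_images": list(reference_images),
        "created": datetime.now().isoformat(),
        "tags": [],
    }
    (characters_dir / f"{safe}.json").write_text(json.dumps(profile, indent=2))
    return profile

with tempfile.TemporaryDirectory() as tmp:
    p = save_profile(Path(tmp), "alice smith!", "test character", ["ref_1.png"])
```

Both the space and the `!` in the name are replaced with underscores, so the profile lands at `alice_smith_.json`.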
# WebSocket events
@socketio.on('connect')
def handle_connect():
......