Add character consistency features, fix model loading for non-diffusers models

- Add character profile management (create, list, show, delete)
- Add IP-Adapter and InstantID support for character consistency
- Fix model loading for models with config.json only (no model_index.json)
- Add component-only model detection (fine-tuned weights)
- Update MCP server with character consistency tools
- Update SKILL.md and README.md documentation
- Add memory management for dubbing/translation
- Add chunked processing for Whisper transcription
- Add character consistency options to web interface
parent 627eb38f
......@@ -47,6 +47,12 @@ A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Vid
- **Large Models** (30-50GB VRAM): Allegro, HunyuanVideo
- **Huge Models** (50GB+ VRAM): Open-Sora, Step-Video, Lumina
### Character Consistency
- **Character Profiles**: Save and reuse character references across generations
- **IP-Adapter**: Image prompt adapter for consistent character generation
- **InstantID**: Face identity preservation for consistent faces
- **Reference Images**: Use multiple reference images for character consistency
### Smart Features
- **Auto Mode**: Automatic model selection and configuration
- **NSFW Detection**: Automatic content classification
......@@ -54,6 +60,7 @@ A comprehensive, GPU-accelerated video generation toolkit supporting Text-to-Vid
- **Time Estimation**: Hardware-aware generation time prediction
- **Multi-GPU**: Distributed generation across multiple GPUs
- **Auto-Disable**: Models that fail 3 times are auto-disabled
- **Memory Management**: Automatic chunking for long videos and low VRAM
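The automatic chunking decision can be sketched as a resolution- and VRAM-aware heuristic. This is a simplified illustration; the function name and thresholds are assumptions, not VideoGen's exact implementation:

```python
def chunk_plan(duration_s, width, height, vram_gb):
    """Pick a chunk length from resolution and available VRAM
    (illustrative heuristic, not the actual implementation)."""
    pixels = width * height
    if pixels >= 3840 * 2160:      # 4K and above: small chunks
        base = 60
    elif pixels >= 1920 * 1080:    # 1080p
        base = 120
    else:                          # 720p and below
        base = 300
    # Scale down on GPUs below a 16GB baseline, with a 30s floor
    chunk = max(30, int(base * min(1.0, vram_gb / 16.0)))
    return duration_s > chunk * 1.5, chunk

print(chunk_plan(600, 1920, 1080, 16))  # → (True, 120): chunk a 10-min 1080p clip
print(chunk_plan(60, 1280, 720, 16))    # → (False, 300): short clip, one pass
```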
### User Interfaces
- **Command Line**: Full-featured CLI with all options
......@@ -147,6 +154,40 @@ python3 videogen --image_to_video --model svd_xt_1.1 \
--lip_sync --output speaker
```
### Character Consistency
VideoGen supports character consistency across multiple generations using IP-Adapter and InstantID.
```bash
# Create a character profile from reference images
python3 videogen --create-character my_character \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "A young woman with red hair"
# List saved character profiles
python3 videogen --list-characters
# Generate with character consistency
python3 videogen --model flux_dev \
--character my_character \
--prompt "my_character walking in a park" \
--output character_park
# Use IP-Adapter directly with reference images
python3 videogen --model sdxl_base \
--ipadapter --ipadapter-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "a person reading a book" \
--output reading
# Use InstantID for face consistency
python3 videogen --model sdxl_base \
--ipadapter --instantid \
--reference-images face_ref.jpg \
--prompt "portrait of a person smiling" \
--output portrait
```
---
## AI Agent Integration
......
......@@ -341,6 +341,74 @@ python3 videogen --video input.mp4 --dub-video --target-lang de --tts_voice edge
---
## Character Consistency
VideoGen supports character consistency across multiple generations using IP-Adapter, InstantID, and Character Profiles.
### Create Character Profile
```bash
# Create a character profile from reference images
python3 videogen --create-character alice \
--character-images ref1.jpg ref2.jpg ref3.jpg \
--character-desc "young woman with blue eyes and blonde hair"
# List all saved character profiles
python3 videogen --list-characters
# Show details of a character profile
python3 videogen --show-character alice
# Delete a character profile
python3 videogen --delete-character alice
```
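On disk, a profile like `alice` might boil down to a small JSON record. The schema below is an assumption for illustration, not VideoGen's actual storage format:

```python
import json
import os
import tempfile

# Hypothetical profile record (schema assumed for illustration)
profile = {
    "name": "alice",
    "description": "young woman with blue eyes and blonde hair",
    "images": ["ref1.jpg", "ref2.jpg", "ref3.jpg"],
}

# Round-trip it the way a profile store might
path = os.path.join(tempfile.mkdtemp(), "alice.json")
with open(path, "w") as f:
    json.dump(profile, f, indent=2)
with open(path) as f:
    loaded = json.load(f)
print(loaded["name"], len(loaded["images"]))  # → alice 3
```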
### Generate with Character
```bash
# Generate image with character consistency
python3 videogen --model flux_dev \
--character alice \
--prompt "alice walking in a park" \
--output alice_park.png
# Generate video with character (I2V)
python3 videogen --image_to_video --model svd_xt_1.1 \
--image_model flux_dev \
--character alice \
--prompt "alice smiling at camera" \
--prompt_animation "subtle head movement" \
--output alice_animated
```
### IP-Adapter Direct Usage
```bash
# Use IP-Adapter with reference images directly
python3 videogen --model flux_dev \
--ipadapter --ipadapter-scale 0.8 \
--reference-images ref1.jpg ref2.jpg \
--prompt "the person in a business suit" \
--output business.png
# Use InstantID for face identity
python3 videogen --model flux_dev \
--ipadapter --instantid \
--reference-images face_ref.jpg \
--prompt "portrait of the person smiling" \
--output portrait.png
```
### Character Consistency Tips
1. **Use multiple reference images** (3-5) for better consistency
2. **IP-Adapter scale**: 0.7-0.9 gives a good balance (higher = closer to the reference)
3. **InstantID** works best for preserving face identity
4. **Character profiles** persist across sessions, so they can be reused
5. **Combine IP-Adapter + InstantID** for the best results
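The scale guidance in tip 2 can be captured in a tiny helper (hypothetical, not part of VideoGen's API):

```python
def pick_ipadapter_scale(requested=None, default=0.8):
    """Clamp a requested IP-Adapter scale into [0.0, 1.0],
    defaulting to the 0.7-0.9 sweet spot (hypothetical helper)."""
    if requested is None:
        return default
    return min(1.0, max(0.0, requested))

print(pick_ipadapter_scale())     # → 0.8
print(pick_ipadapter_scale(1.4))  # → 1.0 (clamped)
print(pick_ipadapter_scale(0.0))  # → 0.0 (explicit zero is respected)
```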
---
## Output Files
VideoGen creates these output files:
......
......@@ -934,3 +934,138 @@ body {
.fa-spin {
animation: spin 1s linear infinite;
}
/* Character Consistency Styles */
/* File Upload Box */
.file-upload {
position: relative;
border: 2px dashed var(--border-color);
border-radius: var(--border-radius);
padding: 2rem;
text-align: center;
transition: all 0.3s;
cursor: pointer;
}
.file-upload:hover {
border-color: var(--primary);
background: rgba(99, 102, 241, 0.05);
}
.file-upload input[type="file"] {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
opacity: 0;
cursor: pointer;
}
.file-upload .file-label {
display: flex;
flex-direction: column;
align-items: center;
gap: 0.5rem;
pointer-events: none;
}
.file-upload .file-label i {
font-size: 2rem;
color: var(--primary);
}
.file-upload .file-label span {
font-weight: 500;
}
.file-upload .file-label small {
color: var(--text-muted);
font-size: 0.8rem;
}
/* Image Preview Grid */
.image-preview-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(100px, 1fr));
gap: 0.75rem;
margin-top: 1rem;
}
.preview-item {
position: relative;
aspect-ratio: 1;
border-radius: var(--border-radius);
overflow: hidden;
background: var(--bg-darker);
}
.preview-item img {
width: 100%;
height: 100%;
object-fit: cover;
}
.preview-item .remove-btn {
position: absolute;
top: 4px;
right: 4px;
width: 24px;
height: 24px;
border-radius: 50%;
background: rgba(0, 0, 0, 0.7);
color: white;
border: none;
cursor: pointer;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.7rem;
opacity: 0;
transition: opacity 0.2s;
}
.preview-item:hover .remove-btn {
opacity: 1;
}
.preview-item .remove-btn:hover {
background: var(--danger);
}
/* Character Section */
#character-section {
border-left: 3px solid var(--primary);
}
#character-section h3 {
color: var(--primary-light);
}
/* IP-Adapter and InstantID Options */
#ipadapter-options,
#instantid-options,
#character-profile-options {
margin-top: 1rem;
padding-top: 1rem;
border-top: 1px solid var(--border-color);
}
/* Range Slider with Value Display */
.form-group input[type="range"] {
width: calc(100% - 50px);
vertical-align: middle;
}
.form-group input[type="range"] + span {
display: inline-block;
width: 40px;
text-align: right;
font-weight: 600;
color: var(--primary);
}
/* Hidden class */
.hidden {
display: none !important;
}
......@@ -310,6 +310,15 @@ async function handleGenerate(e) {
params.translate_subtitles = form.querySelector('#translate_subtitles')?.checked || false;
params.burn_subtitles = form.querySelector('#burn_subtitles')?.checked || false;
// Character consistency options
params.use_character = form.querySelector('#use_character')?.checked || false;
params.use_ipadapter = form.querySelector('#use_ipadapter')?.checked || false;
params.use_instantid = form.querySelector('#use_instantid')?.checked || false;
    // parseFloat(...) || 0.8 would coerce a legitimate 0 back to the default,
    // so check for NaN explicitly
    const ipScale = parseFloat(form.querySelector('#ipadapter_scale')?.value);
    params.ipadapter_scale = Number.isFinite(ipScale) ? ipScale : 0.8;
    const idScale = parseFloat(form.querySelector('#instantid_scale')?.value);
    params.instantid_scale = Number.isFinite(idScale) ? idScale : 0.8;
params.ipadapter_type = form.querySelector('#ipadapter_type')?.value || 'plus_sd15';
params.character_profile = form.querySelector('#character_profile')?.value || '';
// Convert numeric values
params.width = parseInt(params.width) || 832;
params.height = parseInt(params.height) || 480;
......@@ -794,3 +803,291 @@ document.addEventListener('keydown', (e) => {
document.getElementById('generate-form').dispatchEvent(new Event('submit'));
}
});
// Character Consistency Functions
// Toggle character profile options
function toggleCharacterOptions() {
const checkbox = document.getElementById('use_character');
const options = document.getElementById('character-profile-options');
if (checkbox.checked) {
options.classList.remove('hidden');
loadCharacterProfiles();
} else {
options.classList.add('hidden');
}
}
// Toggle IP-Adapter options
function toggleIPAdapterOptions() {
const checkbox = document.getElementById('use_ipadapter');
const options = document.getElementById('ipadapter-options');
const instantidOptions = document.getElementById('instantid-options');
const instantidCheckbox = document.getElementById('use_instantid');
if (checkbox.checked) {
options.classList.remove('hidden');
if (instantidCheckbox.checked) {
instantidOptions.classList.remove('hidden');
}
} else {
options.classList.add('hidden');
// Also hide InstantID options if IP-Adapter is disabled
instantidOptions.classList.add('hidden');
instantidCheckbox.checked = false;
}
}
// Toggle InstantID options
function toggleInstantIDOptions() {
const checkbox = document.getElementById('use_instantid');
const options = document.getElementById('instantid-options');
const ipadapterCheckbox = document.getElementById('use_ipadapter');
if (checkbox.checked) {
// Require IP-Adapter for InstantID
if (!ipadapterCheckbox.checked) {
ipadapterCheckbox.checked = true;
toggleIPAdapterOptions();
}
options.classList.remove('hidden');
} else {
options.classList.add('hidden');
}
}
// Load character profiles from API
async function loadCharacterProfiles() {
try {
const response = await fetch('/api/characters');
const characters = await response.json();
const select = document.getElementById('character_profile');
select.innerHTML = '<option value="">Select a character...</option>';
characters.forEach(char => {
const option = document.createElement('option');
option.value = char.name;
option.textContent = `${char.name} (${char.image_count} images)`;
option.dataset.description = char.description || '';
select.appendChild(option);
});
} catch (error) {
console.error('Error loading characters:', error);
showToast('Failed to load character profiles', 'error');
}
}
// Handle reference image upload
async function handleReferenceUpload(input) {
const files = input.files;
if (!files || files.length === 0) return;
const preview = document.getElementById('reference-preview');
preview.innerHTML = '';
// Show preview of selected images
for (let i = 0; i < Math.min(files.length, 5); i++) {
const file = files[i];
const reader = new FileReader();
reader.onload = function(e) {
const div = document.createElement('div');
div.className = 'preview-item';
div.innerHTML = `
<img src="${e.target.result}" alt="Reference ${i + 1}">
<button type="button" class="remove-btn" onclick="removeReferenceImage(this)">
<i class="fas fa-times"></i>
</button>
`;
preview.appendChild(div);
};
reader.readAsDataURL(file);
}
// Upload files to server
const formData = new FormData();
for (let i = 0; i < files.length; i++) {
formData.append('files', files[i]);
}
formData.append('type', 'reference');
try {
const response = await fetch('/api/upload-multiple', {
method: 'POST',
body: formData
});
const data = await response.json();
if (response.ok) {
// Store paths in hidden input
const hiddenInput = document.getElementById('input_reference');
if (hiddenInput) {
hiddenInput.value = JSON.stringify(data.paths);
}
showToast(`Uploaded ${files.length} reference images`, 'success');
} else {
showToast(`Upload failed: ${data.error}`, 'error');
}
} catch (error) {
console.error('Upload error:', error);
showToast('Upload failed', 'error');
}
}
// Remove reference image from preview
function removeReferenceImage(btn) {
const item = btn.closest('.preview-item');
item.remove();
// Update hidden input
const preview = document.getElementById('reference-preview');
const remaining = preview.querySelectorAll('.preview-item');
if (remaining.length === 0) {
const hiddenInput = document.getElementById('input_reference');
if (hiddenInput) {
hiddenInput.value = '';
}
}
}
// Show create character modal
function showCreateCharacterModal() {
const modal = document.getElementById('create-character-modal');
if (modal) {
modal.classList.add('active');
} else {
// Create modal dynamically
const modalHtml = `
<div class="modal active" id="create-character-modal">
<div class="modal-content">
<div class="modal-header">
<h3><i class="fas fa-user-plus"></i> Create Character Profile</h3>
<button class="close-btn" onclick="closeCreateCharacterModal()">
<i class="fas fa-times"></i>
</button>
</div>
<div class="modal-body">
<form id="create-character-form" onsubmit="createCharacterProfile(event)">
<div class="form-group">
<label for="char_name">Character Name</label>
<input type="text" id="char_name" name="name" required placeholder="e.g., my_character">
</div>
<div class="form-group">
<label for="char_desc">Description (optional)</label>
<textarea id="char_desc" name="description" rows="2" placeholder="Brief description of the character..."></textarea>
</div>
<div class="form-group">
<label>Reference Images</label>
<div class="file-upload">
<input type="file" id="char_images" name="images" accept="image/*" multiple required>
<label for="char_images" class="file-label">
<i class="fas fa-images"></i>
<span>Upload Reference Images</span>
<small>Select 1-5 images</small>
</label>
</div>
<div id="char-image-preview" class="image-preview-grid"></div>
</div>
<div class="form-actions">
<button type="submit" class="btn btn-primary">
<i class="fas fa-save"></i> Create Profile
</button>
<button type="button" class="btn btn-secondary" onclick="closeCreateCharacterModal()">
Cancel
</button>
</div>
</form>
</div>
</div>
</div>
`;
document.body.insertAdjacentHTML('beforeend', modalHtml);
// Add preview handler
document.getElementById('char_images').addEventListener('change', function() {
const preview = document.getElementById('char-image-preview');
preview.innerHTML = '';
for (let i = 0; i < Math.min(this.files.length, 5); i++) {
const file = this.files[i];
const reader = new FileReader();
reader.onload = function(e) {
const div = document.createElement('div');
div.className = 'preview-item';
div.innerHTML = `<img src="${e.target.result}" alt="Preview ${i + 1}">`;
preview.appendChild(div);
};
reader.readAsDataURL(file);
}
});
}
}
// Close create character modal
function closeCreateCharacterModal() {
const modal = document.getElementById('create-character-modal');
if (modal) {
modal.classList.remove('active');
modal.remove();
}
}
// Create character profile
async function createCharacterProfile(e) {
e.preventDefault();
const form = e.target;
const formData = new FormData(form);
try {
const response = await fetch('/api/characters', {
method: 'POST',
body: formData
});
const data = await response.json();
if (response.ok) {
showToast(`Character profile "${data.name}" created`, 'success');
closeCreateCharacterModal();
loadCharacterProfiles();
} else {
showToast(`Failed to create profile: ${data.error}`, 'error');
}
} catch (error) {
console.error('Error creating character:', error);
showToast('Failed to create character profile', 'error');
}
}
// Setup slider value displays
document.addEventListener('DOMContentLoaded', () => {
// IP-Adapter scale slider
const ipadapterSlider = document.getElementById('ipadapter_scale');
if (ipadapterSlider) {
ipadapterSlider.addEventListener('input', (e) => {
document.getElementById('ipadapter-scale-value').textContent = e.target.value;
});
}
// InstantID scale slider
const instantidSlider = document.getElementById('instantid_scale');
if (instantidSlider) {
instantidSlider.addEventListener('input', (e) => {
document.getElementById('instantid-scale-value').textContent = e.target.value;
});
}
// InstantID checkbox handler
const instantidCheckbox = document.getElementById('use_instantid');
if (instantidCheckbox) {
instantidCheckbox.addEventListener('change', toggleInstantIDOptions);
}
});
......@@ -327,6 +327,85 @@
</div>
</div>
<!-- Character Consistency -->
<div class="form-section" id="character-section">
<h3><i class="fas fa-user-circle"></i> Character Consistency</h3>
<div class="form-row">
<label class="checkbox-label">
<input type="checkbox" id="use_character" name="use_character" onchange="toggleCharacterOptions()">
<span>Use Character Profile</span>
</label>
<label class="checkbox-label">
<input type="checkbox" id="use_ipadapter" name="use_ipadapter" onchange="toggleIPAdapterOptions()">
<span>IP-Adapter</span>
</label>
<label class="checkbox-label">
<input type="checkbox" id="use_instantid" name="use_instantid">
<span>InstantID (Face)</span>
</label>
</div>
<!-- Character Profile Selection -->
<div id="character-profile-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="character_profile">Character Profile</label>
<select id="character_profile" name="character_profile">
<option value="">Select a character...</option>
</select>
</div>
<div class="form-group">
<button type="button" class="btn btn-secondary" onclick="showCreateCharacterModal()">
<i class="fas fa-plus"></i> New Character
</button>
</div>
</div>
</div>
<!-- IP-Adapter Options -->
<div id="ipadapter-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="ipadapter_scale">IP-Adapter Scale</label>
<input type="range" id="ipadapter_scale" name="ipadapter_scale" value="0.8" min="0.0" max="1.0" step="0.1">
<span id="ipadapter-scale-value">0.8</span>
</div>
<div class="form-group">
<label for="ipadapter_type">IP-Adapter Type</label>
<select id="ipadapter_type" name="ipadapter_type">
<option value="plus_sd15">Plus (SD 1.5)</option>
<option value="plus_sdxl">Plus (SDXL)</option>
<option value="faceid_sd15">FaceID (SD 1.5)</option>
<option value="faceid_sdxl">FaceID (SDXL)</option>
</select>
</div>
</div>
<div class="form-group">
<label>Reference Images for IP-Adapter</label>
<div class="file-upload" id="reference-upload-box">
<input type="file" id="reference_images" name="reference_images" accept="image/*" multiple onchange="handleReferenceUpload(this)">
<label for="reference_images" class="file-label">
<i class="fas fa-images"></i>
<span>Upload Reference Images</span>
<small>Select 1-5 images</small>
</label>
</div>
<div id="reference-preview" class="image-preview-grid"></div>
</div>
</div>
<!-- InstantID Options -->
<div id="instantid-options" class="hidden">
<div class="form-row">
<div class="form-group">
<label for="instantid_scale">InstantID Scale</label>
<input type="range" id="instantid_scale" name="instantid_scale" value="0.8" min="0.0" max="1.0" step="0.1">
<span id="instantid-scale-value">0.8</span>
</div>
</div>
</div>
</div>
<!-- Advanced Options -->
<div class="form-section collapsible">
<h3 onclick="toggleSection(this)">
......
......@@ -137,6 +137,391 @@ try:
except ImportError:
pass
# ──────────────────────────────────────────────────────────────────────────────
# MEMORY MANAGEMENT UTILITIES
# ──────────────────────────────────────────────────────────────────────────────
import gc
def clear_memory(clear_cuda=True, aggressive=False):
"""Clear memory to prevent OOM on long operations
Args:
clear_cuda: Whether to clear CUDA cache
aggressive: If True, also run Python garbage collection multiple times
"""
if clear_cuda and torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
if aggressive:
# Reset peak memory stats
torch.cuda.reset_peak_memory_stats()
# Run garbage collection
gc.collect()
if aggressive:
gc.collect()
gc.collect()
def get_memory_usage():
"""Get current memory usage statistics
Returns:
dict with memory usage info
"""
result = {
"ram_used_gb": 0,
"ram_total_gb": 0,
"ram_percent": 0,
"vram_used_gb": 0,
"vram_total_gb": 0,
"vram_percent": 0,
}
# RAM usage
try:
mem = psutil.virtual_memory()
result["ram_used_gb"] = mem.used / (1024**3)
result["ram_total_gb"] = mem.total / (1024**3)
result["ram_percent"] = mem.percent
    except Exception:
pass
# VRAM usage
if torch.cuda.is_available():
try:
vram_allocated = torch.cuda.memory_allocated() / (1024**3)
vram_reserved = torch.cuda.memory_reserved() / (1024**3)
vram_total = torch.cuda.get_device_properties(0).total_memory / (1024**3)
result["vram_used_gb"] = vram_allocated
result["vram_reserved_gb"] = vram_reserved
result["vram_total_gb"] = vram_total
result["vram_percent"] = (vram_allocated / vram_total) * 100 if vram_total > 0 else 0
        except Exception:
pass
return result
def check_memory_available(required_vram_gb=2.0, required_ram_gb=2.0):
"""Check if enough memory is available
Args:
required_vram_gb: Required VRAM in GB
required_ram_gb: Required RAM in GB
Returns:
tuple: (vram_ok, ram_ok, memory_info)
"""
mem = get_memory_usage()
vram_available = mem["vram_total_gb"] - mem["vram_used_gb"]
ram_available = mem["ram_total_gb"] - mem["ram_used_gb"]
vram_ok = vram_available >= required_vram_gb
ram_ok = ram_available >= required_ram_gb
return vram_ok, ram_ok, mem
def should_chunk_video(video_duration, video_resolution, vram_gb):
"""Determine if video should be processed in chunks
Args:
video_duration: Duration in seconds
video_resolution: Tuple of (width, height)
vram_gb: Available VRAM in GB
Returns:
tuple: (should_chunk, chunk_duration, reason)
"""
width, height = video_resolution
pixels = width * height
# Base chunk duration on resolution and VRAM
# Higher resolution = smaller chunks
# Less VRAM = smaller chunks
if pixels >= 7680 * 4320: # 8K
base_chunk = 30
elif pixels >= 3840 * 2160: # 4K
base_chunk = 60
elif pixels >= 1920 * 1080: # 1080p
base_chunk = 120
elif pixels >= 1280 * 720: # 720p
base_chunk = 180
else:
base_chunk = 300
# Adjust for VRAM
vram_factor = min(1.0, vram_gb / 16.0) # 16GB as baseline
chunk_duration = int(base_chunk * vram_factor)
# Minimum chunk duration
chunk_duration = max(30, chunk_duration)
# Decide if chunking is needed
should_chunk = video_duration > chunk_duration * 1.5
if should_chunk:
reason = f"Video duration ({video_duration:.0f}s) > chunk size ({chunk_duration}s) for {width}x{height} @ {vram_gb:.0f}GB VRAM"
else:
reason = f"Video can be processed in one pass ({video_duration:.0f}s)"
return should_chunk, chunk_duration, reason
def extract_audio_chunk(video_path, start_time, duration, output_path):
"""Extract a chunk of audio from video
Args:
video_path: Path to video file
start_time: Start time in seconds
duration: Duration in seconds
output_path: Path to save audio chunk
Returns:
Path to extracted audio chunk or None on failure
"""
cmd = [
'ffmpeg', '-y',
'-ss', str(start_time),
'-i', video_path,
'-t', str(duration),
'-vn', # No video
'-acodec', 'pcm_s16le', # WAV format
'-ar', '16000', # 16kHz sample rate (optimal for Whisper)
'-ac', '1', # Mono
output_path
]
try:
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0 and os.path.exists(output_path):
return output_path
return None
except Exception as e:
print(f" ⚠️ Audio chunk extraction failed: {e}")
return None
def get_video_info(video_path):
"""Get video information (duration, resolution, fps)
Args:
video_path: Path to video file
Returns:
dict with video info or None on failure
"""
try:
# Get duration
duration_result = subprocess.run(
['ffprobe', '-v', 'error', '-show_entries', 'format=duration',
'-of', 'default=noprint_wrappers=1:nokey=1', video_path],
capture_output=True, text=True
)
duration = float(duration_result.stdout.strip())
# Get resolution
resolution_result = subprocess.run(
['ffprobe', '-v', 'error', '-select_streams', 'v:0',
'-show_entries', 'stream=width,height',
'-of', 'csv=s=x:p=0', video_path],
capture_output=True, text=True
)
width, height = map(int, resolution_result.stdout.strip().split('x'))
# Get fps
fps_result = subprocess.run(
['ffprobe', '-v', 'error', '-select_streams', 'v:0',
'-show_entries', 'stream=r_frame_rate',
'-of', 'default=noprint_wrappers=1:nokey=1', video_path],
capture_output=True, text=True
)
fps_parts = fps_result.stdout.strip().split('/')
fps = float(fps_parts[0]) / float(fps_parts[1]) if len(fps_parts) == 2 else float(fps_parts[0])
return {
"duration": duration,
"width": width,
"height": height,
"resolution": (width, height),
"fps": fps,
}
except Exception as e:
print(f" ⚠️ Could not get video info: {e}")
return None
class ModelManager:
"""Context manager for model lifecycle management
Ensures models are properly unloaded after use to prevent memory leaks.
Usage:
with ModelManager("Whisper", model_size="base") as model:
result = model.transcribe(audio_path)
"""
_loaded_models = {} # Track loaded models to avoid reloading
def __init__(self, model_type, device=None, **kwargs):
self.model_type = model_type
self.device = device
self.kwargs = kwargs
self.model = None
        # Include model_name as a fallback so the key matches unload_model()
        # (otherwise MarianMT models, keyed only by model_name, collide on "")
        self._model_key = f"{model_type}_{kwargs.get('model_size', kwargs.get('model_name', ''))}"
def __enter__(self):
# Check if model is already loaded
if self._model_key in self._loaded_models:
return self._loaded_models[self._model_key]
print(f" 📦 Loading {self.model_type} model...")
mem_before = get_memory_usage()
try:
if self.model_type == "Whisper":
model_size = self.kwargs.get("model_size", "base")
self.model = whisper.load_model(model_size, device=self.device)
elif self.model_type == "MarianMT":
model_name = self.kwargs.get("model_name")
self.model = {
"model": MarianMTModel.from_pretrained(model_name),
"tokenizer": MarianTokenizer.from_pretrained(model_name),
}
elif self.model_type == "MusicGen":
model_size = self.kwargs.get("model_size", "medium")
self.model = MusicGen.get_pretrained(f"facebook/musicgen-{model_size}")
if self.device:
self.model.to(self.device)
# Cache the model
self._loaded_models[self._model_key] = self.model
mem_after = get_memory_usage()
vram_used = mem_after["vram_used_gb"] - mem_before["vram_used_gb"]
print(f" Model loaded (VRAM: +{vram_used:.2f}GB)")
return self.model
except Exception as e:
print(f" ❌ Failed to load {self.model_type}: {e}")
raise
def __exit__(self, exc_type, exc_val, exc_tb):
# Don't unload cached models - they will be unloaded explicitly when needed
pass
@classmethod
def unload_model(cls, model_type, **kwargs):
"""Explicitly unload a model from memory"""
model_key = f"{model_type}_{kwargs.get('model_size', kwargs.get('model_name', ''))}"
if model_key in cls._loaded_models:
print(f" 🗑️ Unloading {model_type} model...")
model = cls._loaded_models.pop(model_key)
# Delete model reference
del model
# Clear memory
clear_memory(clear_cuda=True, aggressive=True)
print(f" Model unloaded and memory cleared")
@classmethod
def unload_all(cls):
"""Unload all cached models"""
print(f" 🗑️ Unloading all cached models ({len(cls._loaded_models)} models)...")
cls._loaded_models.clear()
clear_memory(clear_cuda=True, aggressive=True)
print(f" All models unloaded")
def process_long_video_in_chunks(video_path, process_func, chunk_duration=60,
overlap=2, progress_callback=None, **kwargs):
"""Process a long video in chunks to manage memory
Args:
video_path: Path to video file
process_func: Function to call for each chunk (chunk_path, start_time, **kwargs)
chunk_duration: Duration of each chunk in seconds
overlap: Overlap between chunks in seconds (for continuity)
progress_callback: Optional callback for progress updates
**kwargs: Additional arguments passed to process_func
Returns:
Combined results from all chunks
"""
video_info = get_video_info(video_path)
if not video_info:
print("❌ Could not get video info")
return None
total_duration = video_info["duration"]
if total_duration <= chunk_duration:
# Video is short enough, process directly
return process_func(video_path, 0, **kwargs)
print(f"\n📹 Processing long video in chunks")
print(f" Duration: {total_duration:.1f}s")
print(f" Chunk size: {chunk_duration}s")
print(f" Expected chunks: {int(total_duration / chunk_duration) + 1}")
print()
results = []
start_time = 0
chunk_num = 0
total_chunks = int(total_duration / chunk_duration) + 1
temp_dir = tempfile.mkdtemp(prefix="videogen_chunks_")
try:
while start_time < total_duration:
chunk_num += 1
actual_duration = min(chunk_duration, total_duration - start_time)
if progress_callback:
progress_callback(chunk_num, total_chunks, start_time, actual_duration)
print(f" 📦 Processing chunk {chunk_num}/{total_chunks} ({start_time:.1f}s - {start_time + actual_duration:.1f}s)")
# Extract audio chunk
chunk_audio = os.path.join(temp_dir, f"chunk_{chunk_num}.wav")
if not extract_audio_chunk(video_path, start_time, actual_duration, chunk_audio):
print(f" ⚠️ Failed to extract chunk, skipping")
start_time += chunk_duration - overlap
continue
# Process chunk
try:
chunk_result = process_func(chunk_audio, start_time, **kwargs)
if chunk_result:
results.append(chunk_result)
# Clear memory after each chunk
clear_memory(clear_cuda=True)
except Exception as e:
print(f" ⚠️ Chunk processing failed: {e}")
# Clean up chunk file
if os.path.exists(chunk_audio):
os.remove(chunk_audio)
start_time += chunk_duration - overlap
return results
finally:
# Clean up temp directory
import shutil
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir, ignore_errors=True)
# NSFW text classification
TRANSFORMERS_AVAILABLE = False
NSFW_CLASSIFIER = None
......@@ -5473,13 +5858,14 @@ TRANSLATION_LANGUAGES = {
}
def transcribe_video_audio(video_path, model_size="base", language=None):
"""Transcribe audio from video using Whisper
def transcribe_video_audio(video_path, model_size="base", language=None, auto_chunk=True):
"""Transcribe audio from video using Whisper with memory management
Args:
video_path: Path to the video file
model_size: Whisper model size (tiny, base, small, medium, large)
language: Source language code (optional, auto-detected if not provided)
auto_chunk: Automatically chunk long videos (default: True)
Returns:
List of segments with text, start, end times
......@@ -5491,17 +5877,53 @@ def transcribe_video_audio(video_path, model_size="base", language=None):
print(f"🎤 Transcribing audio from: {video_path}")
print(f" Model: {model_size}")
# Get video info for memory management
video_info = get_video_info(video_path)
vram_gb = detect_vram_gb() if torch.cuda.is_available() else 8
# Determine if chunking is needed
should_chunk = False
chunk_duration = 300 # Default 5 minutes
if auto_chunk and video_info:
duration = video_info["duration"]
resolution = video_info["resolution"]
should_chunk, chunk_duration, reason = should_chunk_video(duration, resolution, vram_gb)
print(f" {reason}")
try:
# Load Whisper model
# Check memory before loading model
vram_ok, ram_ok, mem_info = check_memory_available(required_vram_gb=2.0, required_ram_gb=2.0)
print(f" Memory: VRAM {mem_info['vram_used_gb']:.1f}/{mem_info['vram_total_gb']:.1f}GB, RAM {mem_info['ram_percent']:.0f}%")
if not vram_ok:
print(f" ⚠️ Low VRAM, will use aggressive memory management")
# Load Whisper model with memory tracking
mem_before = get_memory_usage()
model = whisper.load_model(model_size)
mem_after = get_memory_usage()
# Transcribe
vram_used = mem_after["vram_used_gb"] - mem_before["vram_used_gb"]
print(f" Model loaded (VRAM: +{vram_used:.2f}GB)")
if should_chunk and video_info:
# Process in chunks for long videos
result = _transcribe_chunked(video_path, model, video_info, chunk_duration, language)
else:
# Process entire video at once
transcribe_options = {}
if language:
transcribe_options["language"] = language
result = model.transcribe(video_path, **transcribe_options)
# Unload model to free memory
del model
clear_memory(clear_cuda=True)
print(f" Model unloaded, memory cleared")
# Process segments
segments = []
for seg in result["segments"]:
segments.append({
......@@ -5520,9 +5942,137 @@ def transcribe_video_audio(video_path, model_size="base", language=None):
}
except Exception as e:
print(f"❌ Transcription failed: {e}")
# Clear memory on failure
clear_memory(clear_cuda=True, aggressive=True)
return None
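The chunking decision above relies on a `should_chunk_video` helper defined elsewhere in the file. A minimal sketch of such a heuristic — thresholds here are illustrative assumptions, not VideoGen's actual values — might look like:

```python
def should_chunk_video(duration, resolution, vram_gb):
    """Return (should_chunk, chunk_duration_s, reason) for a video.

    Hypothetical heuristic: long videos are always chunked, and low VRAM
    both tightens the duration threshold and shrinks the chunk size.
    """
    width, height = resolution
    pixels = width * height
    limit = 600 if vram_gb >= 16 else 300  # illustrative cutoffs
    if duration <= limit and pixels <= 1920 * 1080:
        return False, duration, "Short video, processing in one pass"
    chunk = 300 if vram_gb >= 16 else 120  # smaller chunks when memory is tight
    return True, chunk, f"Chunking into {chunk}s pieces (duration {duration:.0f}s)"
```

For example, a 2-minute 720p clip on a 24GB GPU would be processed in one pass, while an hour-long 1080p video on an 8GB GPU would be split into 120-second chunks.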
def _transcribe_chunked(video_path, model, video_info, chunk_duration, language=None):
"""Internal function to transcribe long videos in chunks
Args:
video_path: Path to video file
model: Loaded Whisper model
video_info: Video information dict
chunk_duration: Duration of each chunk in seconds
language: Optional language code
Returns:
Combined transcription result
"""
total_duration = video_info["duration"]
overlap = 5 # 5 second overlap for continuity
print(f"\n 📦 Processing in chunks ({chunk_duration}s each)")
all_segments = []
all_text = []
detected_language = None
start_time = 0
chunk_num = 0
# The loop advances by (chunk_duration - overlap), so estimate chunk count accordingly
total_chunks = int(total_duration / (chunk_duration - overlap)) + 1
temp_dir = tempfile.mkdtemp(prefix="whisper_chunks_")
try:
while start_time < total_duration:
chunk_num += 1
actual_duration = min(chunk_duration, total_duration - start_time)
print(f" Chunk {chunk_num}/{total_chunks} ({start_time:.1f}s - {start_time + actual_duration:.1f}s)")
# Extract audio chunk
chunk_audio = os.path.join(temp_dir, f"chunk_{chunk_num}.wav")
if not extract_audio_chunk(video_path, start_time, actual_duration, chunk_audio):
start_time += chunk_duration - overlap
continue
# Transcribe chunk
transcribe_options = {}
if language:
transcribe_options["language"] = language
try:
chunk_result = model.transcribe(chunk_audio, **transcribe_options)
# Adjust timestamps to global time
for seg in chunk_result.get("segments", []):
adjusted_seg = {
"text": seg["text"].strip(),
"start": seg["start"] + start_time,
"end": seg["end"] + start_time,
}
all_segments.append(adjusted_seg)
all_text.append(chunk_result.get("text", ""))
if not detected_language:
detected_language = chunk_result.get("language")
except Exception as e:
print(f" ⚠️ Chunk failed: {e}")
# Clean up chunk file
if os.path.exists(chunk_audio):
os.remove(chunk_audio)
# Clear memory periodically
if chunk_num % 3 == 0:
clear_memory(clear_cuda=False) # Don't clear CUDA, model still needed
start_time += chunk_duration - overlap
# Merge overlapping segments
merged_segments = _merge_overlapping_segments(all_segments)
return {
"segments": merged_segments,
"text": " ".join(all_text),
"language": detected_language or "unknown",
}
finally:
# Clean up temp directory
import shutil
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir, ignore_errors=True)
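The chunk loop above advances by `chunk_duration - overlap`, so the window schedule can be computed up front; a small standalone sketch of that arithmetic:

```python
def chunk_windows(total_duration, chunk_duration=300, overlap=5):
    """Return (start, duration) windows covering total_duration,
    each overlapping the previous window by `overlap` seconds."""
    windows = []
    start = 0
    step = chunk_duration - overlap
    while start < total_duration:
        # Last window is clamped to whatever duration remains
        windows.append((start, min(chunk_duration, total_duration - start)))
        start += step
    return windows
```

A 700-second video with the defaults yields windows starting at 0s, 295s, and 590s, the last one only 110 seconds long.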
def _merge_overlapping_segments(segments, max_gap=1.0):
"""Merge overlapping or adjacent segments
Args:
segments: List of segments with start, end, text
max_gap: Maximum gap to merge (in seconds)
Returns:
Merged segments list
"""
if not segments:
return []
# Sort by start time
sorted_segments = sorted(segments, key=lambda x: x["start"])
merged = [dict(sorted_segments[0])]  # copy so the caller's segments are not mutated
for seg in sorted_segments[1:]:
last = merged[-1]
# Check if segments overlap or are adjacent
if seg["start"] <= last["end"] + max_gap:
# Merge: extend end time if later, combine text
last["end"] = max(last["end"], seg["end"])
# Avoid duplicating text
if seg["text"] not in last["text"]:
last["text"] = last["text"] + " " + seg["text"]
else:
merged.append(dict(seg))
return merged
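The merge rule above — extend the end time, append only non-duplicate text, and start a new segment once the gap exceeds `max_gap` — can be exercised in isolation. The sketch below mirrors that logic with a defensive copy so input dicts are not mutated:

```python
def merge_segments(segments, max_gap=1.0):
    """Fold overlapping or adjacent segments, mirroring _merge_overlapping_segments."""
    if not segments:
        return []
    ordered = sorted(segments, key=lambda s: s["start"])
    merged = [dict(ordered[0])]  # copy so callers' dicts stay untouched
    for seg in ordered[1:]:
        last = merged[-1]
        if seg["start"] <= last["end"] + max_gap:
            last["end"] = max(last["end"], seg["end"])
            if seg["text"] not in last["text"]:  # skip text already present
                last["text"] += " " + seg["text"]
        else:
            merged.append(dict(seg))
    return merged

parts = [
    {"start": 0.0, "end": 4.8, "text": "hello world"},
    {"start": 4.5, "end": 9.0, "text": "world again"},   # overlaps the previous
    {"start": 20.0, "end": 24.0, "text": "new topic"},   # gap > max_gap, kept separate
]
merged = merge_segments(parts)
```

The first two parts collapse into one segment spanning 0.0–9.0s; the third stays separate.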
def translate_text(text, source_lang, target_lang):
"""Translate text using MarianMT models
......@@ -7003,6 +7553,103 @@ def main(args):
if debug:
print(f" [DEBUG] Subdirectory load failed: {sub_e}")
continue
# Strategy 3b: Check if this is a component-only model (fine-tuned weights only)
if not pipeline_loaded_successfully and 'config.json' in files:
try:
from huggingface_hub import hf_hub_download
import json as json_module
# Download and read config.json
config_path = hf_hub_download(
model_id_to_load,
"config.json",
token=hf_token
)
with open(config_path, 'r') as cf:
model_config = json_module.load(cf)
class_name = model_config.get("_class_name", "")
model_type = model_config.get("model_type", "")
arch_type = model_config.get("architectures", [])
if debug:
print(f" [DEBUG] Config class_name: {class_name}")
print(f" [DEBUG] Config model_type: {model_type}")
print(f" [DEBUG] Config architectures: {arch_type}")
# Detect component type
is_component = False
component_class = None
# Check explicit class name
component_classes = [
"LTXVideoTransformer3DModel",
"UNet2DConditionModel",
"UNet3DConditionModel",
"AutoencoderKL",
"AutoencoderKLLTXVideo",
]
if class_name in component_classes:
is_component = True
component_class = class_name
elif model_type in ["ltx_video", "ltxvideo"]:
is_component = True
component_class = "LTXVideoTransformer3DModel"
elif any("LTX" in str(a) for a in arch_type):
is_component = True
component_class = "LTXVideoTransformer3DModel"
elif "ltx" in model_id_to_load.lower() and any(k in model_config for k in ["num_layers", "hidden_size"]):
is_component = True
component_class = "LTXVideoTransformer3DModel"
if is_component:
print(f" 📦 Detected component-only model: {component_class}")
print(f" This is a fine-tuned component, loading base model first...")
# Determine base model
base_model = None
model_id_lower = model_id_to_load.lower()
if "ltx" in model_id_lower or "ltxvideo" in model_id_lower:
base_model = "Lightricks/LTX-Video"
elif "wan" in model_id_lower:
base_model = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
elif "svd" in model_id_lower:
base_model = "stabilityai/stable-video-diffusion-img2vid-xt-1-1"
if base_model:
print(f" Loading base pipeline: {base_model}")
pipe = PipelineClass.from_pretrained(base_model, **pipe_kwargs)
print(f" ✅ Base pipeline loaded")
# Load the fine-tuned component
if component_class == "LTXVideoTransformer3DModel":
from diffusers import LTXVideoTransformer3DModel
print(f" Loading fine-tuned transformer...")
pipe.transformer = LTXVideoTransformer3DModel.from_pretrained(
model_id_to_load,
torch_dtype=pipe_kwargs.get("torch_dtype", torch.float16),
token=hf_token
)
print(f" ✅ Fine-tuned transformer loaded!")
pipeline_loaded_successfully = True
elif component_class == "AutoencoderKLLTXVideo":
from diffusers import AutoencoderKLLTXVideo
print(f" Loading fine-tuned VAE...")
pipe.vae = AutoencoderKLLTXVideo.from_pretrained(
model_id_to_load,
torch_dtype=pipe_kwargs.get("torch_dtype", torch.float16),
token=hf_token
)
print(f" ✅ Fine-tuned VAE loaded!")
pipeline_loaded_successfully = True
except Exception as comp_e:
if debug:
print(f" [DEBUG] Component detection failed: {comp_e}")
except Exception as api_e:
if debug:
print(f" [DEBUG] API check failed: {api_e}")
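The detection cascade above — explicit `_class_name`, then `model_type`, then `architectures`, then config-key heuristics — reduces to a small classifier over the parsed `config.json`. A standalone sketch:

```python
COMPONENT_CLASSES = {
    "LTXVideoTransformer3DModel",
    "UNet2DConditionModel",
    "UNet3DConditionModel",
    "AutoencoderKL",
    "AutoencoderKLLTXVideo",
}

def detect_component(config, model_id):
    """Return the component class name for a component-only repo, else None."""
    class_name = config.get("_class_name", "")
    if class_name in COMPONENT_CLASSES:
        return class_name
    if config.get("model_type", "") in ("ltx_video", "ltxvideo"):
        return "LTXVideoTransformer3DModel"
    if any("LTX" in str(a) for a in config.get("architectures", [])):
        return "LTXVideoTransformer3DModel"
    # Last resort: LTX repo name plus transformer-shaped config keys
    if "ltx" in model_id.lower() and any(
        k in config for k in ("num_layers", "hidden_size")
    ):
        return "LTXVideoTransformer3DModel"
    return None
```

A repo whose config carries only transformer hyperparameters but whose id contains "ltx" is still classified, which is what lets fine-tuned weight dumps without `_class_name` load against the base pipeline.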
......@@ -7674,8 +8321,14 @@ def main(args):
model_config = json_module.load(cf)
class_name = model_config.get("_class_name", "")
arch_type = model_config.get("architectures", [])
model_type = model_config.get("model_type", "")
if debug:
print(f" [DEBUG] Model class name: {class_name}")
print(f" [DEBUG] Architectures: {arch_type}")
print(f" [DEBUG] Model type: {model_type}")
print(f" [DEBUG] Config keys: {list(model_config.keys())[:10]}")
# Check if this is a component-only model (transformer, unet, etc.)
component_classes = [
......@@ -7686,7 +8339,27 @@ def main(args):
"AutoencoderKLLTXVideo",
]
# Also detect by model_type or architecture
is_component = class_name in component_classes
# If no _class_name, check for other indicators
if not class_name:
# Check model_type for hints
if model_type in ["ltx_video", "ltxvideo"]:
is_component = True
class_name = "LTXVideoTransformer3DModel"
# Check architectures
elif any("LTX" in str(a) for a in arch_type):
is_component = True
class_name = "LTXVideoTransformer3DModel"
# Check for typical transformer config keys
elif any(k in model_config for k in ["num_layers", "hidden_size", "num_attention_heads"]):
# This looks like a transformer config
if "ltx" in model_id_lower:
is_component = True
class_name = "LTXVideoTransformer3DModel"
if is_component:
print(f" 📦 Detected component-only model: {class_name}")
print(f" This is a fine-tuned component, not a full pipeline.")
......
......@@ -789,6 +789,160 @@ async def list_tools() -> list:
"required": ["text", "source_lang", "target_lang"]
}
),
# Character Consistency Tools
Tool(
name="videogen_create_character",
description="Create a character profile from reference images for consistent character generation across multiple images/videos.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character name (alphanumeric, underscores, hyphens only)"
},
"reference_images": {
"type": "array",
"items": {"type": "string"},
"description": "List of paths to reference images (1-5 images)"
},
"description": {
"type": "string",
"description": "Optional description of the character"
}
},
"required": ["name", "reference_images"]
}
),
Tool(
name="videogen_list_characters",
description="List all saved character profiles.",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="videogen_show_character",
description="Show details of a specific character profile.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character profile name"
}
},
"required": ["name"]
}
),
Tool(
name="videogen_delete_character",
description="Delete a character profile.",
inputSchema={
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Character profile name to delete"
}
},
"required": ["name"]
}
),
Tool(
name="videogen_generate_with_character",
description="Generate an image or video with a specific character using IP-Adapter and/or InstantID for consistency.",
inputSchema={
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "Description of what to generate with the character"
},
"character": {
"type": "string",
"description": "Character profile name to use"
},
"model": {
"type": "string",
"description": "Model name (e.g., flux_dev, sdxl_base)"
},
"output": {
"type": "string",
"default": "output"
},
"use_ipadapter": {
"type": "boolean",
"description": "Use IP-Adapter for character consistency",
"default": True
},
"use_instantid": {
"type": "boolean",
"description": "Use InstantID for face identity preservation",
"default": False
},
"ipadapter_scale": {
"type": "number",
"description": "IP-Adapter influence scale (0.0-1.0)",
"default": 0.8
},
"instantid_scale": {
"type": "number",
"description": "InstantID influence scale (0.0-1.0)",
"default": 0.8
},
"animate": {
"type": "boolean",
"description": "Generate video instead of image (I2V)",
"default": False
}
},
"required": ["prompt", "character", "model"]
}
),
Tool(
name="videogen_generate_with_reference",
description="Generate an image using reference images directly (without creating a character profile).",
inputSchema={
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "Description of what to generate"
},
"reference_images": {
"type": "array",
"items": {"type": "string"},
"description": "List of paths to reference images"
},
"model": {
"type": "string",
"description": "Model name (e.g., flux_dev, sdxl_base)"
},
"output": {
"type": "string",
"default": "output"
},
"ipadapter_scale": {
"type": "number",
"description": "IP-Adapter influence scale (0.0-1.0)",
"default": 0.8
},
"use_instantid": {
"type": "boolean",
"description": "Use InstantID for face identity",
"default": False
}
},
"required": ["prompt", "reference_images", "model"]
}
),
]
......@@ -1123,6 +1277,84 @@ async def call_tool(name: str, arguments: dict) -> list:
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
# Character Consistency Tools
elif name == "videogen_create_character":
args = [
"--create-character", arguments["name"],
]
# Add reference images
for img in arguments["reference_images"][:5]: # Max 5 images
args.extend(["--character-images", img])
if arguments.get("description"):
args.extend(["--character-desc", arguments["description"]])
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_list_characters":
args = ["--list-characters"]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_show_character":
args = ["--show-character", arguments["name"]]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_delete_character":
args = ["--delete-character", arguments["name"]]
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_generate_with_character":
args = [
"--model", arguments["model"],
"--character", arguments["character"],
"--prompt", arguments["prompt"],
"--output", arguments.get("output", "output"),
]
# IP-Adapter options
if arguments.get("use_ipadapter", True):
args.append("--ipadapter")
if arguments.get("ipadapter_scale") is not None:
args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
# InstantID options
if arguments.get("use_instantid", False):
args.append("--instantid")
if arguments.get("instantid_scale") is not None:
args.extend(["--instantid-scale", str(arguments["instantid_scale"])])
# Animate for I2V
if arguments.get("animate", False):
args.append("--image_to_video")
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
elif name == "videogen_generate_with_reference":
args = [
"--model", arguments["model"],
"--prompt", arguments["prompt"],
"--output", arguments.get("output", "output"),
]
# Add reference images
for img in arguments["reference_images"]:
args.extend(["--reference-images", img])
# IP-Adapter options
args.append("--ipadapter")
if arguments.get("ipadapter_scale") is not None:
args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
# InstantID options
if arguments.get("use_instantid", False):
args.append("--instantid")
output, code = run_videogen_command(args)
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"Unknown tool: {name}")]
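Each handler above flattens the MCP arguments dict into a videogen CLI argument list. The `videogen_generate_with_character` mapping can be sketched as a pure function (flag names taken from the handler above; the explicit `is not None` check keeps a scale of 0.0 from being dropped):

```python
def character_args(arguments):
    """Map MCP tool arguments to a videogen CLI argument list (sketch)."""
    args = [
        "--model", arguments["model"],
        "--character", arguments["character"],
        "--prompt", arguments["prompt"],
        "--output", arguments.get("output", "output"),
    ]
    if arguments.get("use_ipadapter", True):
        args.append("--ipadapter")
    if arguments.get("ipadapter_scale") is not None:
        args.extend(["--ipadapter-scale", str(arguments["ipadapter_scale"])])
    if arguments.get("use_instantid", False):
        args.append("--instantid")
    if arguments.get("instantid_scale") is not None:
        args.extend(["--instantid-scale", str(arguments["instantid_scale"])])
    if arguments.get("animate", False):
        args.append("--image_to_video")
    return args
```

Keeping the mapping pure makes it trivially testable without spawning the CLI.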
......
{
"models": {
"wan_1.3b_i2v": {
"id": "Wan-AI/Wan2.1-I2V-1.3B-Diffusers",
......
......@@ -614,6 +614,136 @@ def delete_output(filename):
return jsonify({'success': True})
return jsonify({'error': 'File not found'}), 404
# Character Profile API endpoints
CHARACTERS_DIR = Path.home() / ".config" / "videogen" / "characters"
@app.route('/api/characters', methods=['GET'])
def api_list_characters():
"""List all character profiles"""
characters = []
if CHARACTERS_DIR.exists():
for profile_file in CHARACTERS_DIR.glob("*.json"):
try:
with open(profile_file, 'r') as f:
profile = json.load(f)
characters.append({
'name': profile.get('name', profile_file.stem),
'description': profile.get('description', ''),
'image_count': len(profile.get('reference_images', [])),
'created': profile.get('created', ''),
'tags': profile.get('tags', [])
})
except Exception as e:
print(f"Error loading character profile {profile_file}: {e}")
return jsonify(characters)
@app.route('/api/characters/<name>', methods=['GET'])
def api_get_character(name):
"""Get a specific character profile"""
profile_path = CHARACTERS_DIR / f"{name}.json"
if profile_path.exists():
try:
with open(profile_path, 'r') as f:
return jsonify(json.load(f))
except Exception as e:
return jsonify({'error': str(e)}), 500
return jsonify({'error': 'Character not found'}), 404
@app.route('/api/characters', methods=['POST'])
def api_create_character():
"""Create a new character profile"""
name = request.form.get('name')
description = request.form.get('description', '')
if not name:
return jsonify({'error': 'Name is required'}), 400
# Sanitize name
name = re.sub(r'[^a-zA-Z0-9_-]', '_', name)
# Handle uploaded images
images = request.files.getlist('images')
if not images or len(images) == 0:
return jsonify({'error': 'At least one reference image is required'}), 400
# Create character directory
CHARACTERS_DIR.mkdir(parents=True, exist_ok=True)
char_image_dir = CHARACTERS_DIR / name
char_image_dir.mkdir(parents=True, exist_ok=True)
# Save images
saved_images = []
for i, img in enumerate(images[:5]): # Max 5 images
if img and img.filename:
ext = img.filename.rsplit('.', 1)[-1].lower()
if ext in ALLOWED_EXTENSIONS['image']:
filename = f"reference_{i+1}.{ext}"
filepath = char_image_dir / filename
img.save(filepath)
saved_images.append(str(filepath))
if not saved_images:
return jsonify({'error': 'No valid images uploaded'}), 400
# Create profile
profile = {
'name': name,
'description': description,
'reference_images': saved_images,
'created': datetime.now().isoformat(),
'tags': []
}
# Save profile
profile_path = CHARACTERS_DIR / f"{name}.json"
with open(profile_path, 'w') as f:
json.dump(profile, f, indent=2)
return jsonify(profile)
@app.route('/api/characters/<name>', methods=['DELETE'])
def api_delete_character(name):
"""Delete a character profile"""
profile_path = CHARACTERS_DIR / f"{name}.json"
char_image_dir = CHARACTERS_DIR / name
if not profile_path.exists():
return jsonify({'error': 'Character not found'}), 404
try:
# Delete profile file
profile_path.unlink()
# Delete images directory
if char_image_dir.exists():
shutil.rmtree(char_image_dir)
return jsonify({'success': True})
except Exception as e:
return jsonify({'error': str(e)}), 500
@app.route('/api/upload-multiple', methods=['POST'])
def upload_multiple_files():
"""Upload multiple files (for reference images)"""
files = request.files.getlist('files')
upload_type = request.form.get('type', 'general')
if not files:
return jsonify({'error': 'No files provided'}), 400
saved_paths = []
for f in files:
if f and f.filename:
ext = f.filename.rsplit('.', 1)[-1].lower()
if ext in ALLOWED_EXTENSIONS['image']:
filename = f"{uuid.uuid4().hex[:8]}_{secure_filename(f.filename)}"
filepath = UPLOAD_FOLDER / filename
f.save(filepath)
saved_paths.append(str(filepath))
return jsonify({'paths': saved_paths})
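The profile files written by `api_create_character` are plain JSON. A sketch of the sanitize-and-save step, using a temporary directory in place of `~/.config/videogen/characters`:

```python
import json
import re
import tempfile
from datetime import datetime
from pathlib import Path

def save_profile(characters_dir, name, description, reference_images):
    """Sanitize the name and persist a character profile as JSON."""
    safe = re.sub(r'[^a-zA-Z0-9_-]', '_', name)  # same pattern as the endpoint
    characters_dir.mkdir(parents=True, exist_ok=True)
    profile = {
        "name": safe,
        "description": description,
        "reference_images": list(reference_images),
        "created": datetime.now().isoformat(),
        "tags": [],
    }
    (characters_dir / f"{safe}.json").write_text(json.dumps(profile, indent=2))
    return profile

with tempfile.TemporaryDirectory() as tmp:
    p = save_profile(Path(tmp), "alice smith!", "test character", ["ref_1.png"])
```

Both the space and the `!` in the name are replaced with underscores, so the profile lands at `alice_smith_.json`.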
# WebSocket events
@socketio.on('connect')
def handle_connect():
......