Commits · a4fda7b0bd402ecd44f39a873e4205020ca09d7a · nexlab / videogen

25 Feb, 2026 18 commits

Add system dependencies for Debian/Ubuntu to requirements.txt · a4fda7b0

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Added comprehensive system dependencies section for Debian/Ubuntu:

Required system packages:
- build-essential, cmake, pkg-config (build tools)
- ffmpeg (video processing)
- libavformat-dev, libavcodec-dev, libavdevice-dev, libavutil-dev
- libavfilter-dev, libswscale-dev, libswresample-dev (FFmpeg dev libs)
- libsdl2-dev, libssl-dev, libcurl4-openssl-dev
- python3-dev

These are required for:
- PyAV (av package) - needed by audiocraft/MusicGen
- face-recognition (dlib)
- Building Python extensions

Updated quick install instructions to include system dependencies step.


Update requirements.txt for Python 3.12+ compatibility · b0d43691

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Updated all package versions to be compatible with Python 3.12 and 3.13:

Core Dependencies:
- torch>=2.2.0 (was 2.0.0)
- torchvision>=0.17.0 (was 0.15.0)
- torchaudio>=2.2.0 (was 2.0.0)
- diffusers>=0.32.0 (was 0.30.0)
- transformers>=4.40.0 (was 4.35.0)
- accelerate>=0.27.0 (was 0.24.0)
- xformers>=0.0.25 (was 0.0.22)
- spandrel>=0.2.0 (was 0.1.0)
- ftfy>=6.2.0 (was 6.1.0)
- Pillow>=10.2.0 (was 10.0.0)
- safetensors>=0.4.2 (was 0.4.0)
- huggingface-hub>=0.23.0 (was 0.19.0)
- peft>=0.10.0 (was 0.7.0)
- numpy>=1.26.0 (added for Python 3.12+ compatibility)

Audio Dependencies:
- scipy>=1.12.0 (was 1.11.0)
- librosa>=0.10.2 (was 0.10.0)
- edge-tts>=6.1.10 (was 6.1.0)

Web Interface:
- flask>=3.0.2 (was 3.0.0)
- flask-socketio>=5.3.6 (was 5.3.0)
- eventlet>=0.36.0 (was 0.33.0)
- python-socketio>=5.11.0 (was 5.10.0)
- werkzeug>=3.0.1 (was 3.0.0)

Added detailed installation notes for Python 3.12+ including:
- PyTorch nightly installation for CUDA
- xformers --pre flag for Python 3.13
- Git installation for diffusers/transformers
- Quick install commands


Fix loading transformer-only fine-tuned models (like Muinez/ltxvideo-2b-nsfw) · 03b62189

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Some models on HuggingFace are not full pipelines but just fine-tuned components
(e.g., just the transformer weights). These have a config.json at root level with
_class_name pointing to a component class like 'LTXVideoTransformer3DModel'.

This fix adds:

1. Detection of component-only models:
   - Check for config.json at root level
   - Read _class_name to determine component type
   - Detect if it's a transformer, VAE, or other component

2. Proper loading strategy:
   - Load the base pipeline first (e.g., Lightricks/LTX-Video)
   - Then load the fine-tuned component from the model repo
   - Replace the base component with the fine-tuned one

3. Supported component classes:
   - LTXVideoTransformer3DModel → Lightricks/LTX-Video
   - AutoencoderKLLTXVideo → Lightricks/LTX-Video
   - UNet2DConditionModel, UNet3DConditionModel, AutoencoderKL

This allows loading models like Muinez/ltxvideo-2b-nsfw which are
fine-tuned transformer weights without a full pipeline structure.
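A minimal sketch of the detection step, assuming the repo's root-level config.json has already been downloaded to a local path. The two LTX entries mirror the list above; `detect_component_model`, `COMPONENT_BASE_MODELS`, and the slot names are hypothetical and not taken from the project's code.

```python
import json
from pathlib import Path

# Component class -> (pipeline slot, base pipeline repo).
# The class names and base repo come from the commit message; the slot
# names ("transformer", "vae") are an assumption.
COMPONENT_BASE_MODELS = {
    "LTXVideoTransformer3DModel": ("transformer", "Lightricks/LTX-Video"),
    "AutoencoderKLLTXVideo": ("vae", "Lightricks/LTX-Video"),
}


def detect_component_model(config_path):
    """Return (class_name, slot, base_repo) for a component-only repo, else None."""
    path = Path(config_path)
    if not path.is_file():
        return None
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    if not isinstance(config, dict):
        return None
    class_name = config.get("_class_name")
    if class_name not in COMPONENT_BASE_MODELS:
        return None
    slot, base_repo = COMPONENT_BASE_MODELS[class_name]
    return class_name, slot, base_repo
```

With a result such as `("LTXVideoTransformer3DModel", "transformer", "Lightricks/LTX-Video")`, the caller would load the base pipeline first and then swap in the fine-tuned component for that slot.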


Fix loading models without model_index.json (I2V models) · c5cdb9fd

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

When a model has component folders (transformer, vae, etc.) but no model_index.json
at the root level, the loading would fail. This fix adds:

1. Base model fallback strategy:
   - Detect model type from model ID (ltx, wan, svd, cogvideo, mochi)
   - Load the known base model first
   - Then attempt to load fine-tuned components from the target model

2. Component detection and loading:
   - List files in the repo to find component folders
   - Load transformer, VAE components from the fine-tuned model
   - Fall back to base model if component loading fails

3. Better error messages:
   - Clear indication of what went wrong
   - Suggestions for alternative models

This fixes loading of models like Muinez/ltxvideo-2b-nsfw which have
all component folders but are missing the model_index.json file.
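The two detection steps could look roughly like this. It is only a sketch: the family keywords are the ones named above, but the substring matching, the function names, and the flat file-list input are assumptions.

```python
# Family keywords from the commit message; matching order is an assumption.
MODEL_FAMILIES = ("ltx", "wan", "svd", "cogvideo", "mochi")


def detect_model_family(model_id):
    """Return the first known family keyword found in a model ID, or None."""
    name = model_id.lower()
    for family in MODEL_FAMILIES:
        if family in name:
            return family
    return None


def find_component_folders(repo_files, components=("transformer", "vae")):
    """Return the component folders present in a flat list of repo file paths."""
    top_level = {path.split("/", 1)[0] for path in repo_files if "/" in path}
    return [name for name in components if name in top_level]
```

For example, `detect_model_family("Muinez/ltxvideo-2b-nsfw")` yields `"ltx"`, which the loader would then map to its known base model.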


Add system load detection and more conservative time estimates · ebf80ab6

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

System Load Detection:
- Added get_system_load() method to detect CPU, memory, and GPU utilization
- CPU load >80% adds 50% slowdown, >50% adds 20% slowdown
- Memory >90% adds 80% slowdown, >75% adds 40% slowdown
- GPU utilization >80% adds 60% slowdown, >50% adds 30% slowdown
- Warning displayed when system is under heavy load
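As a rough illustration of the thresholds above, the load factor can be computed from three utilization percentages. The commit does not say how the three penalties combine, so summing them is an assumption here, as are the function names.

```python
def _penalty(percent, high, high_penalty, mid, mid_penalty):
    """Return the slowdown fraction for one resource."""
    if percent > high:
        return high_penalty
    if percent > mid:
        return mid_penalty
    return 0.0


def load_factor(cpu_percent, memory_percent, gpu_percent):
    """Combine per-resource slowdowns into one multiplier (assumed additive)."""
    penalty = (
        _penalty(cpu_percent, 80, 0.5, 50, 0.2)
        + _penalty(memory_percent, 90, 0.8, 75, 0.4)
        + _penalty(gpu_percent, 80, 0.6, 50, 0.3)
    )
    return 1.0 + penalty
```

An idle machine gives a factor of 1.0, and a CPU above 80% on an otherwise idle machine gives 1.5.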

More Conservative Base Estimates:
- WanPipeline: 3.0s → 5.0s/frame
- MochiPipeline: 5.0s → 8.0s/frame
- SVD: 1.5s → 2.5s/frame
- CogVideoX: 4.0s → 6.0s/frame
- LTXVideo: 4.0s → 6.0s/frame
- Flux: 8.0s → 12.0s/frame
- Allegro: 8.0s → 12.0s/frame
- Hunyuan: 10.0s → 15.0s/frame
- OpenSora: 6.0s → 10.0s/frame

More Conservative GPU Tier Multipliers:
- extreme: 1.0 → 1.2x
- high: 1.5 → 2.0x
- medium: 2.5 → 3.5x
- low: 4.0 → 5.0x
- very_low: 8.0 → 10.0x

More Conservative Model Loading Times:
- Huge (>50GB): 10min → 15min
- Large (30-50GB): 5min → 8min
- Medium (16-30GB): 3min → 5min
- Small (<16GB): 1.5min → 3min
- Download estimate: 15s/GB → 30s/GB

Additional Safety Margins:
- Overhead increased from 30% to 50%
- I2V processing overhead increased from 20% to 30%
- Added 20% safety margin for unpredictable factors
- Load factor applied to model loading time as well
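The figures above can be put together in a small estimator. The constants are the new values from this commit; multiplying them together, the bucket boundaries at exactly 16, 30, and 50 GB, and all function names are assumptions made for this sketch.

```python
# New per-frame base times (seconds) and tier multipliers from this commit.
BASE_SECONDS_PER_FRAME = {
    "WanPipeline": 5.0, "MochiPipeline": 8.0, "SVD": 2.5,
    "CogVideoX": 6.0, "LTXVideo": 6.0, "Flux": 12.0,
    "Allegro": 12.0, "Hunyuan": 15.0, "OpenSora": 10.0,
}
GPU_TIER_MULTIPLIER = {
    "extreme": 1.2, "high": 2.0, "medium": 3.5, "low": 5.0, "very_low": 10.0,
}
OVERHEAD = 1.5        # 50% overhead
SAFETY_MARGIN = 1.2   # 20% safety margin
I2V_OVERHEAD = 1.3    # 30% extra for I2V processing


def model_loading_seconds(size_gb, load_factor=1.0):
    """Loading time by size bucket; boundary handling is an assumption."""
    if size_gb > 50:
        minutes = 15
    elif size_gb >= 30:
        minutes = 8
    elif size_gb >= 16:
        minutes = 5
    else:
        minutes = 3
    return minutes * 60 * load_factor


def estimate_generation_seconds(model, frames, gpu_tier, load_factor=1.0, i2v=False):
    """Multiply the factors together (the combination rule is assumed)."""
    seconds = frames * BASE_SECONDS_PER_FRAME[model] * GPU_TIER_MULTIPLIER[gpu_tier]
    seconds *= OVERHEAD * SAFETY_MARGIN * load_factor
    if i2v:
        seconds *= I2V_OVERHEAD
    return seconds
```

Under these assumptions, 10 SVD frames on an "extreme" GPU with no system load come out at about 54 seconds of generation time, before model loading is added.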


Fix time estimation to be more realistic · 8c48cea3

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

- Increased base time per frame for all models (2-4x higher, for more realistic estimates)
- Added LTXVideoPipeline specific estimate (4.0s/frame)
- Increased model loading times (90s-10min based on model size)
- Added realistic image model loading times for I2V mode
- Added image generation time based on model type (Flux, SDXL, SD3)
- Added 30% overhead for I/O and memory operations
- Added 20% extra time for I2V processing
- Increased resolution scaling factor to 1.3 (quadratic relationship)
- Increased download time estimate to 15s/GB with 2min cap

The previous estimates were too optimistic and didn't account for:
- Full diffusion process (multiple denoising steps)
- Model loading from disk/download
- Memory management overhead
- I2V-specific processing time
- Image model loading for I2V mode


Add web interface for VideoGen · 5291deb2

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Features:
- Modern web UI with all generation modes (T2V, I2V, T2I, I2I, V2V, Dub, Subtitles, Upscale)
- Real-time progress updates via WebSocket
- File upload for input images/videos/audio
- File download for generated content
- Background job processing with progress tracking
- Job management (cancel, retry, delete)
- Gallery for browsing generated files
- REST API for programmatic access
- Responsive design for desktop and mobile

Backend (webapp.py):
- Flask + Flask-SocketIO for real-time updates
- Background job processing with threading
- File upload/download handling
- Job state persistence
- REST API endpoints

Frontend:
- Modern dark theme UI
- Mode selection with visual cards
- Form with all options and settings
- Real-time progress modal with log streaming
- Toast notifications
- Keyboard shortcuts (Ctrl+Enter to submit, Escape to close)

Documentation:
- Updated README.md with web interface section
- Updated EXAMPLES.md with web interface usage
- Updated requirements.txt with web dependencies


Add 404 fallback to deferred I2V model loading · 344cd12a

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

- Apply same 404 fallback strategy to deferred I2V model loading
- Try DiffusionPipeline as fallback when model_index.json not found
- Ensures all model loading paths have consistent error handling


Fix model loading 404 errors and improve time estimation · c2c62b60

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Model Loading Fixes:
- Add fallback loading when model_index.json returns 404
- Try alternative paths (diffusers/, diffusion_model/, pipeline/)
- Try generic DiffusionPipeline as fallback
- Check HuggingFace API for actual file structure
- Load from subdirectories if model_index.json found there
- Apply same fallback to I2V image model loading
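The alternative-path fallback might be structured as below. This is a sketch under assumptions: `loader` is a hypothetical callable standing in for whatever actually loads the pipeline, and it is assumed to raise FileNotFoundError when model_index.json is missing (the real code reacts to a 404 from HuggingFace instead).

```python
# Subfolders to try after the repo root, as listed in the commit message.
ALTERNATIVE_SUBFOLDERS = ("diffusers", "diffusion_model", "pipeline")


def load_with_fallback(model_id, loader):
    """Try the repo root first, then each alternative subfolder.

    `loader(model_id, subfolder)` is a hypothetical callable that returns a
    pipeline or raises FileNotFoundError when model_index.json is missing.
    Returns (pipeline, subfolder_used).
    """
    tried = []
    for subfolder in (None, *ALTERNATIVE_SUBFOLDERS):
        try:
            return loader(model_id, subfolder), subfolder
        except FileNotFoundError:
            tried.append(subfolder or "<root>")
    raise FileNotFoundError(
        f"model_index.json not found for {model_id}; tried: {', '.join(tried)}"
    )
```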

Time Estimation Improvements:
- Add hardware detection (GPU model, VRAM, RAM, CPU cores)
- Detect GPU tier (extreme/high/medium/low/very_low)
- Calculate realistic time estimates based on GPU performance
- Account for VRAM constraints and offloading penalty
- Consider distributed/multi-GPU setups
- More accurate model loading times (minutes, not seconds)
- Account for resolution impact (quadratic relationship)
- Add 20% overhead for memory management
- Print hardware info for transparency

GPU Tier Performance Multipliers:
- Extreme (RTX 4090, A100, H100): 1.0x
- High (RTX 4080, RTX 3090, V100): 1.5x
- Medium (RTX 4070, RTX 3080, T4): 2.5x
- Low (RTX 3060, RTX 2070): 4.0x
- Very Low (GTX 1060, etc.): 8.0x
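A minimal sketch of tier detection from a GPU name string, using the table above. The substring matching, the function name, and treating every unrecognized GPU as "very_low" are assumptions.

```python
# (tier, multiplier, name markers) in priority order, from the list above.
GPU_TIERS = [
    ("extreme", 1.0, ("RTX 4090", "A100", "H100")),
    ("high", 1.5, ("RTX 4080", "RTX 3090", "V100")),
    ("medium", 2.5, ("RTX 4070", "RTX 3080", "T4")),
    ("low", 4.0, ("RTX 3060", "RTX 2070")),
]
DEFAULT_TIER = ("very_low", 8.0)  # assumed fallback for unknown GPUs


def detect_gpu_tier(gpu_name):
    """Return (tier, multiplier) for a GPU name such as 'NVIDIA GeForce RTX 4090'."""
    name = gpu_name.upper()
    for tier, multiplier, markers in GPU_TIERS:
        if any(marker in name for marker in markers):
            return tier, multiplier
    return DEFAULT_TIER
```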


Add video dubbing, translation, and subtitle features · 6505a00a

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Features Added:
- Video dubbing with voice preservation (--dub-video)
- Automatic subtitle generation (--create-subtitles)
- Subtitle translation (--translate-subtitles)
- Burn subtitles into video (--burn-subtitles)
- Audio transcription using Whisper (--transcribe)
- Text translation using MarianMT models

New Command-Line Arguments:
- --transcribe: Transcribe audio from video
- --whisper-model: Select Whisper model size (tiny/base/small/medium/large)
- --source-lang: Source language code
- --target-lang: Target language code for translation
- --create-subtitles: Create SRT subtitles from video
- --translate-subtitles: Translate subtitles to target language
- --burn-subtitles: Burn subtitles into video
- --subtitle-style: Customize subtitle appearance
- --dub-video: Translate and dub video with voice preservation
- --voice-clone/--no-voice-clone: Enable/disable voice cloning
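For illustration, the flags above could be declared with argparse as follows. The flag names and Whisper model sizes come from the list; whether each flag is a switch or takes a value, and every default, are assumptions rather than the project's actual definitions.

```python
import argparse


def build_parser():
    """Declare the dubbing/subtitle flags (types and defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--transcribe", action="store_true",
                        help="Transcribe audio from video")
    parser.add_argument("--whisper-model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--source-lang", help="Source language code")
    parser.add_argument("--target-lang", help="Target language code")
    parser.add_argument("--create-subtitles", action="store_true")
    parser.add_argument("--translate-subtitles", action="store_true")
    parser.add_argument("--burn-subtitles", action="store_true")
    parser.add_argument("--subtitle-style", help="Subtitle appearance")
    parser.add_argument("--dub-video", action="store_true")
    # BooleanOptionalAction (Python 3.9+) generates both --voice-clone
    # and --no-voice-clone from a single declaration.
    parser.add_argument("--voice-clone", default=True,
                        action=argparse.BooleanOptionalAction)
    return parser
```

Parsing `["--dub-video", "--target-lang", "es", "--no-voice-clone"]` with this parser sets `dub_video` to True, `target_lang` to `"es"`, and `voice_clone` to False.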

MCP Server Updates:
- Added videogen_transcribe_video tool
- Added videogen_create_subtitles tool
- Added videogen_dub_video tool
- Added videogen_translate_text tool

Documentation Updates:
- Updated SKILL.md with dubbing/translation section
- Updated EXAMPLES.md with comprehensive examples
- Updated requirements.txt with openai-whisper dependency

Supported Languages:
English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Vietnamese, Thai, Indonesian, Swedish, Ukrainian


Add model type filters and update MCP server · 1c01f5b7

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Features Added:
- Model type filters: --t2i-only, --v2v-only, --v2i-only, --3d-only, --tts-only, --audio-only
- Enhanced model list table with new capability columns (V2V, V2I, 3D, TTS)
- Updated detect_model_type() to detect all model capabilities

MCP Server Updates:
- Added videogen_video_to_video tool for V2V style transfer
- Added videogen_apply_video_filter tool for video filters
- Added videogen_extract_frames tool for frame extraction
- Added videogen_create_collage tool for thumbnail grids
- Added videogen_upscale_video tool for AI upscaling
- Added videogen_convert_3d tool for 2D-to-3D conversion
- Added videogen_concat_videos tool for video concatenation
- Updated model list filter to support all new types

SKILL.md Updates:
- Added V2V, V2I, 3D to generation types table
- Added model filter examples
- Added 8 new use cases for V2V, filters, frames, collage, upscale, 3D, concat


Add V2V, V2I, 2D-to-3D conversion, and cluster documentation · e69c2d81

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

Features Added:
- Video-to-Video (V2V): Style transfer, filters, concatenation
- Video-to-Image (V2I): Frame extraction, keyframes, collages
- 2D-to-3D Conversion: SBS, anaglyph, VR 360 formats
- Video upscaling with AI (ESRGAN, Real-ESRGAN, SwinIR)
- Video filters (grayscale, sepia, blur, speed, slow-mo, etc.)

Command-line Arguments:
- --video: Input video file for V2V/V2I operations
- --video-to-video: Enable V2V style transfer
- --video-filter: Apply video filters
- --extract-frame, --extract-keyframes, --extract-frames
- --convert-3d-sbs, --convert-3d-anaglyph, --convert-vr
- --upscale-video, --upscale-method

Model Discovery:
- Added depth estimation models to --update-models
- Added 2D-to-3D model searches
- Added V2V style transfer models

Documentation:
- Updated README.md with new features
- Added comprehensive V2V/V2I/2D-to-3D examples
- Added multi-node cluster setup guide
- Added NFS shared storage configuration


Add V2V (Video-to-Video), V2I (Video-to-Image), and video processing features · 6f862e60

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

- Add video frame extraction (extract_video_frames, extract_keyframes)
- Add video info retrieval (get_video_info)
- Add frames to video conversion (frames_to_video)
- Add video upscaling with AI support (upscale_video)
- Add video-to-video style transfer (video_to_video_style_transfer)
- Add video-to-image extraction (video_to_image)
- Add video collage creation (create_video_collage)
- Add video filters (apply_video_filter - grayscale, sepia, blur, etc.)
- Add video concatenation (concat_videos)
- Add image upscaling (upscale_image)

Features:
- Extract frames at specific FPS or timestamps
- AI upscaling with ESRGAN/SwinIR support
- Scene detection for keyframe extraction
- Multiple video filters and effects
- Video concatenation with re-encoding or stream copy


Add character consistency features: IP-Adapter, InstantID, Character Profiles, LoRA Training · b0d20d0b

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

- Add IP-Adapter integration for character consistency using reference images
- Add InstantID support for superior face identity preservation
- Add Character Profile System to store reference images and face embeddings
- Add LoRA Training Workflow for perfect character consistency
- Add command-line arguments for all character consistency features
- Update EXAMPLES.md with comprehensive character consistency documentation
- Update requirements.txt with optional dependencies (insightface, onnxruntime)

New commands:
- --character: Use saved character profile
- --create-character: Create new character profile from reference images
- --list-characters: List all saved profiles
- --show-character: Show profile details
- --ipadapter: Enable IP-Adapter for consistency
- --instantid: Enable InstantID for face identity
- --train-lora: Train custom LoRA for character


Validate base model exists before adding LoRA to model list · 84d460f6

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

- When --update-models detects a LoRA adapter, validate that the base
  model exists on HuggingFace before adding it to the model list
- Skip LoRAs whose base models are not found on HuggingFace
- Added support for flux and sdxl base model detection
- Print informative messages when skipping LoRAs with missing base models


Fix: Add peft to requirements.txt for LoRA adapter support · 2e8b5bc7

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

PEFT (Parameter-Efficient Fine-Tuning) is required for loading LoRA
adapters with pipe.load_lora_weights(). Without it, LoRA loading fails
with: 'PEFT backend is required for this method.'
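A small pre-flight check can surface the missing dependency before any LoRA is loaded. This is a sketch and not part of the commit; the helper names are hypothetical.

```python
import importlib.util


def peft_available():
    """Return True if the peft package can be imported in this environment."""
    return importlib.util.find_spec("peft") is not None


def require_peft():
    """Fail early with an actionable message instead of a late PEFT backend error."""
    if not peft_available():
        raise RuntimeError(
            "LoRA adapters need the 'peft' package; install it with: pip install peft"
        )
```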


Feat: Update models.json when pipeline mismatch is detected and corrected · 2b570a0a

Stefy Lanza (nextime / spora ) authored Feb 25, 2026

- Add update_model_pipeline_class() function to update model config
- Call function when main model pipeline mismatch is corrected
- Call function when image model pipeline mismatch is corrected
- Ensures future runs use the correct pipeline class automatically
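One way such an update function might work is sketched below. Only the function name comes from the commit; the signature, the models.json layout (a mapping of model name to a dict with a "class" field), and the return value are assumptions.

```python
import json
from pathlib import Path


def update_model_pipeline_class(models_path, model_name, pipeline_class):
    """Persist a corrected pipeline class; return True if the file changed.

    Assumes models.json maps model names to dicts with a "class" key.
    """
    path = Path(models_path)
    models = json.loads(path.read_text(encoding="utf-8"))
    entry = models.get(model_name)
    if entry is None or entry.get("class") == pipeline_class:
        return False
    entry["class"] = pipeline_class
    path.write_text(json.dumps(models, indent=2), encoding="utf-8")
    return True
```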


Fix: Add pipeline component mismatch fallback for image model loading in I2V mode · b22f0730

Stefy Lanza (nextime / spora ) authored Feb 25, 2026


24 Feb, 2026 16 commits

Fix: Wrap LoRA loading and offloading in defer_i2v_loading check · 0a1f210c

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

When defer_i2v_loading=True (I2V mode without a provided image), the code
set pipe=None but then tried to call pipe.load_lora_weights() and
pipe.enable_model_cpu_offload() on None, causing an AttributeError.

This fix wraps the LoRA loading and offloading configuration blocks
inside an 'if not defer_i2v_loading:' condition so they are skipped
when the I2V model loading is deferred until after image generation.
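The guard can be pictured as follows. `load_lora_weights()` and `enable_model_cpu_offload()` are the pipeline methods named above; the surrounding function, its parameters, and its return value are hypothetical.

```python
def configure_pipeline(pipe, defer_i2v_loading, lora_path=None):
    """Apply LoRA weights and offloading only when a pipeline is loaded.

    Returns the list of steps performed (for illustration only).
    """
    steps = []
    if not defer_i2v_loading:
        # pipe is a real pipeline here; when loading is deferred it is None
        # and both calls below would raise AttributeError.
        if lora_path is not None:
            pipe.load_lora_weights(lora_path)
            steps.append("lora")
        pipe.enable_model_cpu_offload()
        steps.append("offload")
    return steps
```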


Fix OOM in I2V mode: sequential model loading · 0ccc1d52

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Defer I2V model loading when in I2V mode without provided image
- Generate image first with T2I model
- Unload T2I model completely (del, empty_cache, gc.collect)
- Then load I2V model and generate video
- This ensures only one model is in memory at a time
- Fixes Linux OOM killer issue when loading multiple models
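A sketch of the sequential flow, assuming hypothetical `load_t2i` and `load_i2v` callables that each return a callable model. The torch import is optional so the sketch also runs without a GPU stack; here garbage collection runs before the CUDA cache is emptied.

```python
import gc


def free_accelerator_memory():
    """Run garbage collection, then release cached CUDA memory if torch is present."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def generate_video_sequentially(load_t2i, load_i2v, prompt):
    """Keep only one model in memory at a time (loader callables are hypothetical)."""
    t2i = load_t2i()
    image = t2i(prompt)
    del t2i                      # drop the last reference to the T2I model
    free_accelerator_memory()
    i2v = load_i2v()             # I2V model is loaded only now
    return i2v(image)
```

Note that `del` only frees the model if no other reference to it survives elsewhere, for example in a global or a cache.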


Add auto-disable feature for models that fail 3 times in --auto mode · 1c242c7e

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Add auto_disable.json to track failure counts and disabled status
- Models that fail 3 times in auto mode are automatically disabled
- Disabled models are skipped during auto model selection
- Manual selection of a disabled model re-enables it for auto mode
- Model list now shows 'Auto' column with status (Yes, OFF, or X/3)
- Disabled models shown with 🚫 indicator in model list
- New functions: load_auto_disable_data(), save_auto_disable_data(),
  record_model_failure(), is_model_disabled(), re_enable_model(),
  get_model_fail_count()
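The failure tracking could be implemented along these lines. The function names match the list above, but their signatures and the layout of auto_disable.json (model name mapped to a fail count and a disabled flag) are assumptions.

```python
import json
from pathlib import Path

MAX_FAILURES = 3  # threshold from the commit message


def load_auto_disable_data(path):
    p = Path(path)
    if not p.is_file():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def save_auto_disable_data(path, data):
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def record_model_failure(path, model):
    """Increment the failure count; return True once the model is disabled."""
    data = load_auto_disable_data(path)
    entry = data.setdefault(model, {"fail_count": 0, "disabled": False})
    entry["fail_count"] += 1
    entry["disabled"] = entry["fail_count"] >= MAX_FAILURES
    save_auto_disable_data(path, data)
    return entry["disabled"]


def is_model_disabled(path, model):
    return load_auto_disable_data(path).get(model, {}).get("disabled", False)


def get_model_fail_count(path, model):
    return load_auto_disable_data(path).get(model, {}).get("fail_count", 0)


def re_enable_model(path, model):
    """Forget past failures, e.g. after the user selects the model manually."""
    data = load_auto_disable_data(path)
    if data.pop(model, None) is not None:
        save_auto_disable_data(path, data)
```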


Fix image generation to handle LoRA adapters - load base model first, then apply LoRA weights · b1e602e5

Stefy Lanza (nextime / spora ) authored Feb 24, 2026


Add LoRA adapter detection and base model extraction from HuggingFace tags · ad95e206

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Detect LoRA adapters from tags (lora, LoRA) and files (*.safetensors)
- Extract base model from tags (format: base_model:org/model-name)
- Skip model_index.json fetch for LoRA-only repos
- Determine pipeline class from base model for LoRA adapters
- Improves handling of models like enhanceaiteam/Flux-Uncensored-V2
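A sketch of tag-based detection, assuming the repo's tags and file names are already available as plain lists. How the tag and file signals are combined, and the function names, are assumptions; the repo names used with it are placeholders.

```python
BASE_MODEL_PREFIX = "base_model:"


def is_lora_repo(tags, files):
    """Treat a repo as LoRA-only if it is tagged lora, ships .safetensors
    weights, and has no model_index.json (the combination is assumed)."""
    has_lora_tag = any(tag.lower() == "lora" for tag in tags)
    has_weights = any(name.endswith(".safetensors") for name in files)
    return has_lora_tag and has_weights and "model_index.json" not in files


def extract_base_model(tags):
    """Return org/model-name from a 'base_model:org/model-name' tag, or None."""
    for tag in tags:
        if tag.startswith(BASE_MODEL_PREFIX):
            value = tag[len(BASE_MODEL_PREFIX):]
            # Accept only the plain org/name form described in the commit.
            if "/" in value and ":" not in value:
                return value
    return None
```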


Fix pipeline fallback error handling - nest error messages inside failure check · 33ec35a2

Stefy Lanza (nextime / spora ) authored Feb 24, 2026


Fix pipeline component mismatch fallback - use boolean flag instead of locals() · efa4dfd3

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Replace locals().get('goto_after_loading', False) with properly initialized boolean flag
- The locals() approach failed because, inside a function, locals() returns a snapshot dict rather than a live reference to the local variables
- Now the fallback correctly skips error handling when pipeline loads successfully via detected class
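The difference between the two approaches can be shown in a few lines. This is an illustration only: the function names are made up, and using ValueError for the component mismatch is an assumption.

```python
def write_through_locals():
    """Writing into locals() inside a function does not rebind the variable (CPython)."""
    loaded = False
    locals()["loaded"] = True
    return loaded


def load_with_flag(primary, fallback):
    """Track the fallback path with a flag initialized before the try block."""
    loaded_via_fallback = False
    try:
        pipe = primary()
    except ValueError:  # stands in for the component mismatch error
        pipe = fallback()
        loaded_via_fallback = True
    return pipe, loaded_via_fallback
```

Because the flag always exists and is always a bool, later code can branch on it directly instead of probing `locals()`.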


Fix pipeline component mismatch fallback and indentation · 5a37d9d2

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Add fallback mechanism for models with incorrect model_index.json
- Detect pipeline class from model ID patterns when component mismatch occurs
- Fix indentation error in auto mode retry logic block
- Properly handle Wan2.2-I2V models with misconfigured pipeline class


Preserve user-specified model in auto mode retry logic · 4668132a

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Track if user explicitly specified --model before auto mode runs
- Skip retry with alternative models when user's model fails
- Show clear error message explaining user's choice is preserved
- Only auto-selected models can be retried with alternatives


Fix retry logic to skip LoRAs with failed base models · be1e5b9d

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Track failed base models in _failed_base_models set
- Skip LoRA adapters that depend on failed base models during retry
- Try non-LoRA alternatives when all LoRAs with same base fail
- Improve error detection for 'Repository Not Found' errors
- Show skipped LoRA count during retry process
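The skip step might look like this, assuming each candidate is a dict with an optional "base_model" key (set only for LoRA adapters). That layout and the function name are hypothetical.

```python
def filter_retry_candidates(candidates, failed_base_models):
    """Drop LoRA candidates whose base model already failed.

    Returns (kept_candidates, skipped_lora_count).
    """
    kept = []
    skipped = 0
    for candidate in candidates:
        base = candidate.get("base_model")
        if base is not None and base in failed_base_models:
            skipped += 1
            continue
        kept.append(candidate)
    return kept, skipped
```

Non-LoRA candidates have no base model and always pass through, so they remain available once every LoRA sharing a failed base has been skipped.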


Add --retry argument to control model retry attempts on failure · 14ae5bdc

Stefy Lanza (nextime / spora ) authored Feb 24, 2026


Improve model discovery: skip unfound models, add deep search for variants · 2fe62c6f

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Skip models not found on HuggingFace instead of adding with defaults
- Add deep search for model variants from known organizations
- Search organizations: Alpha-VLLM, stepvideo, hpcai-tech, tencent,
  rhymes-ai, THUDM, genmo, Wan-AI, stabilityai, black-forest-labs
- Remove non-existent models from known_large_models list
- Better error handling for model validation


Add HF_TOKEN authentication support for gated/private models · 4164da7e

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Add HF_TOKEN support to main pipeline loading (pipe_kwargs)
- Add HF_TOKEN support to VAE loading for Wan models
- Add HF_TOKEN support to image model loading for I2V mode
- Enhanced pipeline detection with multiple strategies
- Improved error messages for authentication errors (401, gated models)
- Added debug output for HF token status
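A minimal sketch of how the token could be passed along, assuming it is read from the HF_TOKEN environment variable and handed to `from_pretrained()` through the `token` keyword (older diffusers releases used `use_auth_token`). The helper names are hypothetical.

```python
import os


def build_pipe_kwargs(env=None):
    """Return extra from_pretrained() kwargs; add the token only when it is set."""
    env = os.environ if env is None else env
    pipe_kwargs = {}
    token = env.get("HF_TOKEN", "").strip()
    if token:
        pipe_kwargs["token"] = token
    return pipe_kwargs


def describe_token_status(pipe_kwargs):
    """Debug line that reports the token status without printing the secret."""
    return "HF token: set" if "token" in pipe_kwargs else "HF token: not set"
```

The same kwargs dict can then be reused for the main pipeline, the VAE, and the I2V image model so that all three loading paths authenticate consistently.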


Fix auto mode retry logic and improve error handling · bcbae548

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Fix retry logic bug: only run auto mode once (check for _auto_mode flag)
- Prevent infinite retry loops by preserving retry count across recursive calls
- Add better error handling for pipeline compatibility issues (FrozenDict, scale_factor errors)
- Add helpful troubleshooting messages for diffusers version incompatibilities
- Show retry exhaustion message when all alternative models fail


Add DiffusionPipeline support and auto mode retry logic · 83ea5872

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

- Add DiffusionPipeline to PIPELINE_CLASS_MAP for generic model loading
- Add fallback to DiffusionPipeline for unknown pipeline classes
- Add return_all parameter to select_best_model() for getting all candidates
- Store alternative models in auto mode for retry support
- Implement retry logic when model loading fails in auto mode
- Retry up to 3 times with alternative models before failing
- Add debug output for model loading troubleshooting
- Improve error messages with troubleshooting hints
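The retry loop can be sketched as below. It assumes `candidates` is the ordered list returned by `select_best_model()` with `return_all` enabled and that `loader` is a hypothetical callable that raises on failure; reading "up to 3 times" as the first choice plus three alternatives is also an assumption.

```python
def load_with_retries(candidates, loader, max_retries=3):
    """Try the best candidate, then up to max_retries alternatives.

    Returns (model_name, pipeline) or raises RuntimeError listing all failures.
    """
    errors = []
    for model in candidates[: max_retries + 1]:
        try:
            return model, loader(model)
        except Exception as exc:  # broad on purpose: any load error triggers a retry
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All candidate models failed:\n" + "\n".join(errors))
```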


Add audio generation, auto mode, MCP server, and comprehensive documentation · 4ba0b99f

Stefy Lanza (nextime / spora ) authored Feb 24, 2026

Features:
- Audio generation: TTS via Bark/Edge-TTS, music via MusicGen
- Audio sync: stretch, trim, pad, loop modes
- Lip sync: Wav2Lip and SadTalker integration
- Auto mode: automatic model selection with NSFW detection
- MCP server: AI agent integration via Model Context Protocol
- Model management: external config, search, validation
- T2I/I2I support: static image and image-to-image generation
- Time estimation: detailed timing breakdown for each step

Documentation:
- README.md: comprehensive installation and usage guide
- EXAMPLES.md: 100+ command-line examples
- SKILL.md: AI agent integration guide
- LICENSE.md: GPLv3 license

Copyleft © 2026 Stefy <stefy@nexlab.net>
