25 Feb, 2026 (40 commits)
-
Stefy Lanza (nextime / spora ) authored
Adjust VRAM checking to allow models up to 10% less than available VRAM, or full VRAM with offload strategy
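A minimal sketch of the check this commit describes, assuming "up to 10% less than available VRAM" means the model must fit within 90% of the available VRAM unless offloading is enabled (the function name, units, and exact comparison are assumptions, not the project's actual code):

```python
def check_vram_fits(model_vram_gb: float, available_vram_gb: float,
                    allow_offload: bool = False) -> bool:
    # Hypothetical helper: without offload, require the model to fit in
    # 10% less than the available VRAM; with an offload strategy, allow
    # it to use the full VRAM.
    if allow_offload:
        return model_vram_gb <= available_vram_gb
    return model_vram_gb <= available_vram_gb * 0.90
```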
-
Stefy Lanza (nextime / spora ) authored
- When loading the LTX-Video base model in I2V mode, use LTXImageToVideoPipeline
- When loading the LTX-Video base model in T2V mode, use LTXPipeline
- Update PipelineClass after loading the base pipeline to match the actual class used
- This fixes the "LTXPipeline.__call__() got an unexpected keyword argument 'image'" error
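The selection above can be sketched as a small mode-to-class mapping. The class names are the real diffusers names, but the function and the mode strings are illustrative assumptions:

```python
def ltx_pipeline_class(mode: str) -> str:
    # I2V needs the image-conditioned pipeline; plain LTXPipeline is
    # T2V-only and rejects an `image` keyword argument.
    if mode == "i2v":
        return "LTXImageToVideoPipeline"
    return "LTXPipeline"
```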
-
Stefy Lanza (nextime / spora ) authored
- Add LTXImageToVideoPipeline to PIPELINE_CLASS_MAP as an I2V type
- Change LTXPipeline to the T2V type (it doesn't support image input)
- Auto-switch from LTXPipeline to LTXImageToVideoPipeline when in I2V mode
- Update the I2V check to only verify the pipeline class name (removed the supports_i2v flag check)
- This allows LTX-Video models to work in I2V mode with image input
-
Stefy Lanza (nextime / spora ) authored
- LTXPipeline is T2V-only, not I2V-capable
- Check both the supports_i2v flag AND the pipeline class name
- I2V-capable pipelines: StableVideoDiffusionPipeline, I2VGenXLPipeline, LTXImageToVideoPipeline, WanPipeline
- Prevents passing the 'image' argument to T2V-only pipelines like LTXPipeline
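The allowlist check above might look like the following sketch; the set contents come from the commit message, while the function name and the use of `type(...).__name__` are assumptions:

```python
I2V_CAPABLE = {
    "StableVideoDiffusionPipeline",
    "I2VGenXLPipeline",
    "LTXImageToVideoPipeline",
    "WanPipeline",
}

def supports_image_input(pipeline) -> bool:
    # Only pass an `image` argument to pipelines whose class name is on
    # the allowlist; T2V-only classes like LTXPipeline are excluded.
    return type(pipeline).__name__ in I2V_CAPABLE
```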
-
Stefy Lanza (nextime / spora ) authored
- In T2I mode, only show image models in the main model select
- In other modes (T2V, I2V, etc.), show video models
- Repopulate the model select when switching modes
- The image model select for I2V mode always shows image models
-
Stefy Lanza (nextime / spora ) authored
- Remove an orphaned 'except Exception as e:' block that had no matching try
- Fix indentation for the wan scheduler configuration
- Change the 'I2V model loaded' message to a generic 'Model loaded' message
-
Stefy Lanza (nextime / spora ) authored
- Set pipeline_loaded_successfully=True when component loading succeeds
- Fix indentation for the LoRA and offloading code blocks
- Define the 'off' variable inside the correct scope
- This fixes loading models like Muinez/ltxvideo-2b-nsfw, which are fine-tuned transformer weights without a full pipeline
-
Stefy Lanza (nextime / spora ) authored
- sentencepiece>=0.2.0 is required for the LTX-Video tokenizer
- protobuf>=5.27.0 is required by sentencepiece
- These are needed to parse the spiece.model tokenizer file
-
Stefy Lanza (nextime / spora ) authored
- Move T2I detection before time estimation
- Add a has_t2i parameter to estimate_total_time
- T2I models now show image_generation time instead of video_generation
- Add Lumina pipelines to T2I model detection
- Suppress the 'Loaded models' message when the --json flag is used
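The `has_t2i` behavior can be sketched as below. Only the parameter name comes from the commit message; the other parameters, the return shape, and the key names are assumptions:

```python
def estimate_total_time(num_steps: int, seconds_per_step: float,
                        has_t2i: bool = False) -> dict:
    # Report the generation phase under the appropriate key: T2I models
    # produce an image, everything else a video.
    phase = "image_generation" if has_t2i else "video_generation"
    return {phase: num_steps * seconds_per_step}
```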
-
Stefy Lanza (nextime / spora ) authored
- Move JSON output so it happens before table printing
- Add better error handling in webapp.py for debugging
- This fixes the web interface model list not showing
-
Stefy Lanza (nextime / spora ) authored
- Models can be removed by numeric ID (from --model-list) or by name
- Also supports removing by HuggingFace model ID
- Updates models.json after removal
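A sketch of the removal logic, assuming numeric IDs are positions in the --model-list output and that each record carries `name` and `model_id` fields (both assumptions; the real code would also rewrite models.json afterwards):

```python
def remove_model(models: list, key: str) -> list:
    # Numeric keys are treated as positions in the --model-list output;
    # anything else matches the model name or HuggingFace model ID.
    if key.isdigit():
        idx = int(key)
        return [m for i, m in enumerate(models) if i != idx]
    return [m for m in models
            if m.get("name") != key and m.get("model_id") != key]
```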
-
Stefy Lanza (nextime / spora ) authored
- Add the Lumina pipeline classes to PIPELINE_CLASS_MAP
- Fix Lumina model detection in detect_pipeline_class
- Remove references to the non-existent LuminaVideoPipeline
- Update Alpha-VLLM/Lumina-Next-SFT to use the correct pipeline class
- Lumina-Next-SFT is a T2I model, not T2V (~20-30GB VRAM)
-
Stefy Lanza (nextime / spora ) authored
- Add a --json argument to output the model list in JSON format
- Include model capabilities, disabled status, and fail count in the JSON output
- This fixes the web interface model list not showing any models
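The flag and output switch could look like this sketch; the `--json` flag and the record fields come from the commit message, while the function names and the plain-text fallback are assumptions:

```python
import argparse
import json

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--json", action="store_true",
                        help="output the model list as JSON")
    return parser

def model_list_output(models: list, as_json: bool) -> str:
    # JSON mode emits the full records (capabilities, disabled, fail_count)
    # for the web interface; otherwise fall back to a plain name listing.
    if as_json:
        return json.dumps(models)
    return "\n".join(m["name"] for m in models)
```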
-
Stefy Lanza (nextime / spora ) authored
- Detect tokenizer parsing errors and provide helpful cache-clearing instructions
- Add retry logic for corrupted cache files
- Improve error messages for component-only model loading
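The retry logic might be shaped like this sketch. Which exception a corrupted tokenizer cache actually raises is an assumption (ValueError here), as are the function and parameter names:

```python
import shutil

def load_with_cache_retry(load_fn, cache_dir: str, retries: int = 1):
    # On a parse failure, wipe the cache directory and retry, in case the
    # cached files were corrupted; re-raise once retries are exhausted.
    for attempt in range(retries + 1):
        try:
            return load_fn()
        except ValueError:
            if attempt == retries:
                raise
            shutil.rmtree(cache_dir, ignore_errors=True)
```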
-
Stefy Lanza (nextime / spora ) authored
- The correct class name in diffusers is LTXPipeline, not LTXVideoPipeline
- Updated PIPELINE_CLASS_MAP and detect_pipeline_class
- Updated all references throughout the codebase
- This fixes loading models like Muinez/ltxvideo-2b-nsfw
-
Stefy Lanza (nextime / spora ) authored
- When loading fine-tuned component models (like LTXVideoTransformer3DModel), use the correct pipeline class for the base model instead of the configured PipelineClass, which may be wrong
- Add proper pipeline class detection for LTX, Wan, SVD, CogVideo, and Mochi
- This fixes loading models like Muinez/ltxvideo-2b-nsfw, which have only a config.json (no model_index.json)
-
Stefy Lanza (nextime / spora ) authored
- Add character profile management (create, list, show, delete)
- Add IP-Adapter and InstantID support for character consistency
- Fix model loading for models with only a config.json (no model_index.json)
- Add component-only model detection (fine-tuned weights)
- Update the MCP server with character consistency tools
- Update the SKILL.md and README.md documentation
- Add memory management for dubbing/translation
- Add chunked processing for Whisper transcription
- Add character persistency options to the web interface
-
Stefy Lanza (nextime / spora ) authored
Users can now upload their own image for I2V mode instead of only being able to generate one. The image upload box is now visible in I2V mode, allowing users to either:
- Upload an existing image to animate
- Let the system generate an image first

This provides more flexibility for the I2V workflow.
-
Stefy Lanza (nextime / spora ) authored
Style Detection (detect_generation_type):
- Detects 9 artistic styles: anime, photorealistic, digital_art, cgi, cartoon, fantasy, traditional, scifi, horror
- Extracts style keywords from prompts for matching
- Returns style info in the generation type dict

Style Matching (select_best_model):
- Matches LoRA adapters to the requested style (+60 bonus for a style match)
- Matches base models to the requested style (+50 bonus for a style match)
- Checks model name, ID, and tags for style indicators

Examples:
- 'anime girl' → selects anime-optimized models/LoRAs
- 'photorealistic portrait' → selects realism models
- 'cyberpunk city' → selects sci-fi models/LoRAs

This allows --auto mode to intelligently select models based on the artistic style requested in the prompt.
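The detection and scoring above can be sketched as follows. The bonus values (+60 LoRA, +50 base) come from the commit message; the keyword table, function signatures, and record fields are illustrative assumptions:

```python
STYLE_KEYWORDS = {
    "anime": ["anime"],
    "photorealistic": ["photorealistic", "photo", "realism"],
    "scifi": ["cyberpunk", "sci-fi", "scifi"],
}

def detect_style(prompt: str):
    # Return the first style whose keywords appear in the prompt.
    text = prompt.lower()
    for style, words in STYLE_KEYWORDS.items():
        if any(word in text for word in words):
            return style
    return None

def style_bonus(model: dict, style: str, is_lora: bool) -> int:
    # +60 for a style-matched LoRA, +50 for a style-matched base model,
    # checking the model name, ID, and tags for the style indicator.
    haystack = " ".join(
        [model.get("name", ""), model.get("id", "")] + model.get("tags", [])
    ).lower()
    if style and style in haystack:
        return 60 if is_lora else 50
    return 0
```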
-
Stefy Lanza (nextime / spora ) authored
- Remove the deprecated eventlet dependency
- Use threading mode for Flask-SocketIO instead of eventlet
- eventlet is deprecated and has compatibility issues with Python 3.13
- Threading mode works reliably on all Python versions

This fixes the "RuntimeError: Working outside of application context" errors when running webapp.py on Python 3.13.
-
Stefy Lanza (nextime / spora ) authored
The generate_music() function now supports two backends:

1. audiocraft (preferred):
   - Original MusicGen implementation
   - Works on Python 3.12 and lower
2. transformers (fallback):
   - Uses the HuggingFace transformers library
   - Works on Python 3.13+
   - No spacy/blis dependency issues

The function automatically:
- Tries audiocraft first (if available)
- Falls back to transformers if audiocraft fails or is not installed
- Provides clear error messages if neither backend is available

This allows MusicGen music generation to work on Python 3.13 without the problematic audiocraft → spacy → thinc → blis dependency chain.
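The fallback order above can be sketched as a small backend picker. The `available` probe parameter is an illustrative assumption added so the selection logic is testable without either library installed:

```python
import importlib.util

def pick_music_backend(available=None) -> str:
    # Try audiocraft first (works on Python <= 3.12), then fall back to
    # transformers (works on 3.13+); fail with a clear message otherwise.
    if available is None:
        available = lambda mod: importlib.util.find_spec(mod) is not None
    if available("audiocraft"):
        return "audiocraft"
    if available("transformers"):
        return "transformers"
    raise RuntimeError(
        "No MusicGen backend available: install audiocraft (Python <= 3.12) "
        "or transformers"
    )
```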
-
Stefy Lanza (nextime / spora ) authored
audiocraft (MusicGen) is NOT compatible with Python 3.13:
- audiocraft → spacy → thinc → blis
- blis fails to compile, with GCC errors: unrecognized '-mavx512pf' option
- This is a known issue with blis and newer GCC/Python versions

Updated requirements.txt to:
- Remove audiocraft from the direct dependencies
- Add a note about the Python 3.13 incompatibility
- Suggest using Python 3.12 or lower for audiocraft, or a separate Python 3.12 environment for music generation
-
Stefy Lanza (nextime / spora ) authored
Added a comprehensive system dependencies section for Debian/Ubuntu.

Required system packages:
- build-essential, cmake, pkg-config (build tools)
- ffmpeg (video processing)
- libavformat-dev, libavcodec-dev, libavdevice-dev, libavutil-dev, libavfilter-dev, libswscale-dev, libswresample-dev (FFmpeg dev libraries)
- libsdl2-dev, libssl-dev, libcurl4-openssl-dev
- python3-dev

These are required for:
- PyAV (the av package), needed by audiocraft/MusicGen
- face-recognition (dlib)
- Building Python extensions

Updated the quick install instructions to include the system dependencies step.
-
Stefy Lanza (nextime / spora ) authored
Updated all package versions to be compatible with Python 3.12 and 3.13.

Core dependencies:
- torch>=2.2.0 (was 2.0.0)
- torchvision>=0.17.0 (was 0.15.0)
- torchaudio>=2.2.0 (was 2.0.0)
- diffusers>=0.32.0 (was 0.30.0)
- transformers>=4.40.0 (was 4.35.0)
- accelerate>=0.27.0 (was 0.24.0)
- xformers>=0.0.25 (was 0.0.22)
- spandrel>=0.2.0 (was 0.1.0)
- ftfy>=6.2.0 (was 6.1.0)
- Pillow>=10.2.0 (was 10.0.0)
- safetensors>=0.4.2 (was 0.4.0)
- huggingface-hub>=0.23.0 (was 0.19.0)
- peft>=0.10.0 (was 0.7.0)
- numpy>=1.26.0 (added for Python 3.12+ compatibility)

Audio dependencies:
- scipy>=1.12.0 (was 1.11.0)
- librosa>=0.10.2 (was 0.10.0)
- edge-tts>=6.1.10 (was 6.1.0)

Web interface:
- flask>=3.0.2 (was 3.0.0)
- flask-socketio>=5.3.6 (was 5.3.0)
- eventlet>=0.36.0 (was 0.33.0)
- python-socketio>=5.11.0 (was 5.10.0)
- werkzeug>=3.0.1 (was 3.0.0)

Added detailed installation notes for Python 3.12+, including:
- PyTorch nightly installation for CUDA
- The xformers --pre flag for Python 3.13
- Git installation of diffusers/transformers
- Quick install commands
-
Stefy Lanza (nextime / spora ) authored
Some models on HuggingFace are not full pipelines but just fine-tuned components (e.g., only the transformer weights). These have a config.json at the root level with _class_name pointing to a component class like 'LTXVideoTransformer3DModel'.

This fix adds:

1. Detection of component-only models:
   - Check for config.json at the root level
   - Read _class_name to determine the component type
   - Detect whether it is a transformer, VAE, or other component
2. A proper loading strategy:
   - Load the base pipeline first (e.g., Lightricks/LTX-Video)
   - Then load the fine-tuned component from the model repo
   - Replace the base component with the fine-tuned one
3. Supported component classes:
   - LTXVideoTransformer3DModel → Lightricks/LTX-Video
   - AutoencoderKLLTXVideo → Lightricks/LTX-Video
   - UNet2DConditionModel, UNet3DConditionModel, AutoencoderKL

This allows loading models like Muinez/ltxvideo-2b-nsfw, which are fine-tuned transformer weights without a full pipeline structure.
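The component-to-base mapping for the LTX classes can be sketched as below. The two LTX mappings come from the commit message; the function name and the dict structure are assumptions, and the table is deliberately not exhaustive:

```python
COMPONENT_BASE_PIPELINES = {
    "LTXVideoTransformer3DModel": "Lightricks/LTX-Video",
    "AutoencoderKLLTXVideo": "Lightricks/LTX-Video",
}

def base_repo_for_component(root_config: dict):
    # A root-level config.json with a component _class_name means the repo
    # holds fine-tuned weights only; return the base pipeline repo to load
    # first, or None for an unrecognized (or full-pipeline) model.
    return COMPONENT_BASE_PIPELINES.get(root_config.get("_class_name"))
```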
-
Stefy Lanza (nextime / spora ) authored
When a model has component folders (transformer, vae, etc.) but no model_index.json at the root level, loading would fail.

This fix adds:

1. A base model fallback strategy:
   - Detect the model type from the model ID (ltx, wan, svd, cogvideo, mochi)
   - Load the known base model first
   - Then attempt to load fine-tuned components from the target model
2. Component detection and loading:
   - List the files in the repo to find component folders
   - Load the transformer and VAE components from the fine-tuned model
   - Fall back to the base model if component loading fails
3. Better error messages:
   - Clear indication of what went wrong
   - Suggestions for alternative models

This fixes loading of models like Muinez/ltxvideo-2b-nsfw, which have all the component folders but are missing the model_index.json file.
-
Stefy Lanza (nextime / spora ) authored
System load detection:
- Added a get_system_load() method to detect CPU, memory, and GPU utilization
- CPU load >80% adds a 50% slowdown; >50% adds 20%
- Memory >90% adds an 80% slowdown; >75% adds 40%
- GPU utilization >80% adds a 60% slowdown; >50% adds 30%
- A warning is displayed when the system is under heavy load

More conservative base estimates (seconds per frame):
- WanPipeline: 3.0 → 5.0
- MochiPipeline: 5.0 → 8.0
- SVD: 1.5 → 2.5
- CogVideoX: 4.0 → 6.0
- LTXVideo: 4.0 → 6.0
- Flux: 8.0 → 12.0
- Allegro: 8.0 → 12.0
- Hunyuan: 10.0 → 15.0
- OpenSora: 6.0 → 10.0

More conservative GPU tier multipliers:
- extreme: 1.0 → 1.2x
- high: 1.5 → 2.0x
- medium: 2.5 → 3.5x
- low: 4.0 → 5.0x
- very_low: 8.0 → 10.0x

More conservative model loading times:
- Huge (>50GB): 10min → 15min
- Large (30-50GB): 5min → 8min
- Medium (16-30GB): 3min → 5min
- Small (<16GB): 1.5min → 3min
- Download estimate: 15s/GB → 30s/GB

Additional safety margins:
- Overhead increased from 30% to 50%
- I2V processing overhead increased from 20% to 30%
- Added a 20% safety margin for unpredictable factors
- Load factor applied to model loading time as well
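The slowdown brackets listed under "System load detection" can be combined into a single multiplier. The thresholds and percentages come from the commit message; how the real code combines them is an assumption (this sketch adds each bracket's slowdown on top of a 1.0 baseline):

```python
def load_slowdown_factor(cpu_pct: float, mem_pct: float, gpu_pct: float) -> float:
    factor = 1.0
    if cpu_pct > 80:      # heavy CPU load
        factor += 0.50
    elif cpu_pct > 50:
        factor += 0.20
    if mem_pct > 90:      # near-full memory
        factor += 0.80
    elif mem_pct > 75:
        factor += 0.40
    if gpu_pct > 80:      # busy GPU
        factor += 0.60
    elif gpu_pct > 50:
        factor += 0.30
    return factor
```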
-
Stefy Lanza (nextime / spora ) authored
- Increased the base time per frame for all models (2-4x, more realistic)
- Added an LTXVideoPipeline-specific estimate (4.0s/frame)
- Increased model loading times (90s-10min depending on model size)
- Added realistic image model loading times for I2V mode
- Added image generation time based on the model type (Flux, SDXL, SD3)
- Added 30% overhead for I/O and memory operations
- Added 20% extra time for I2V processing
- Increased the resolution scaling factor to 1.3 (quadratic relationship)
- Increased the download time estimate to 15s/GB, with a 2min cap

The previous estimates were too optimistic and didn't account for:
- The full diffusion process (multiple denoising steps)
- Model loading from disk/download
- Memory management overhead
- I2V-specific processing time
- Image model loading for I2V mode
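The overheads this commit introduced (30% I/O and memory, 20% extra for I2V) can be sketched as below; note a later commit raised these figures further. The function name, parameters, and the multiplicative combination order are assumptions:

```python
def estimate_generation_time(num_frames: int, seconds_per_frame: float,
                             i2v: bool = False) -> float:
    total = num_frames * seconds_per_frame
    total *= 1.30        # 30% overhead for I/O and memory operations
    if i2v:
        total *= 1.20    # 20% extra for I2V processing
    return total
```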
-