1. 25 Feb, 2026 18 commits
    • Add system dependencies for Debian/Ubuntu to requirements.txt · a4fda7b0
      Stefy Lanza (nextime / spora ) authored
      Added comprehensive system dependencies section for Debian/Ubuntu:
      
      Required system packages:
      - build-essential, cmake, pkg-config (build tools)
      - ffmpeg (video processing)
      - libavformat-dev, libavcodec-dev, libavdevice-dev, libavutil-dev
      - libavfilter-dev, libswscale-dev, libswresample-dev (FFmpeg dev libs)
      - libsdl2-dev, libssl-dev, libcurl4-openssl-dev
      - python3-dev
      
      These are required for:
      - PyAV (av package) - needed by audiocraft/MusicGen
      - face-recognition (dlib)
      - Building Python extensions
      
      Updated quick install instructions to include system dependencies step.
    • Update requirements.txt for Python 3.12+ compatibility · b0d43691
      Stefy Lanza (nextime / spora ) authored
      Updated all package versions to be compatible with Python 3.12 and 3.13:
      
      Core Dependencies:
      - torch>=2.2.0 (was 2.0.0)
      - torchvision>=0.17.0 (was 0.15.0)
      - torchaudio>=2.2.0 (was 2.0.0)
      - diffusers>=0.32.0 (was 0.30.0)
      - transformers>=4.40.0 (was 4.35.0)
      - accelerate>=0.27.0 (was 0.24.0)
      - xformers>=0.0.25 (was 0.0.22)
      - spandrel>=0.2.0 (was 0.1.0)
      - ftfy>=6.2.0 (was 6.1.0)
      - Pillow>=10.2.0 (was 10.0.0)
      - safetensors>=0.4.2 (was 0.4.0)
      - huggingface-hub>=0.23.0 (was 0.19.0)
      - peft>=0.10.0 (was 0.7.0)
      - numpy>=1.26.0 (added for Python 3.12+ compatibility)
      
      Audio Dependencies:
      - scipy>=1.12.0 (was 1.11.0)
      - librosa>=0.10.2 (was 0.10.0)
      - edge-tts>=6.1.10 (was 6.1.0)
      
      Web Interface:
      - flask>=3.0.2 (was 3.0.0)
      - flask-socketio>=5.3.6 (was 5.3.0)
      - eventlet>=0.36.0 (was 0.33.0)
      - python-socketio>=5.11.0 (was 5.10.0)
      - werkzeug>=3.0.1 (was 3.0.0)
      
      Added detailed installation notes for Python 3.12+ including:
      - PyTorch nightly installation for CUDA
      - xformers --pre flag for Python 3.13
      - Git installation for diffusers/transformers
      - Quick install commands
    • Fix loading transformer-only fine-tuned models (like Muinez/ltxvideo-2b-nsfw) · 03b62189
      Stefy Lanza (nextime / spora ) authored
      Some models on HuggingFace are not full pipelines but just fine-tuned components
      (e.g., just the transformer weights). These have a config.json at root level with
      _class_name pointing to a component class like 'LTXVideoTransformer3DModel'.
      
      This fix adds:
      
      1. Detection of component-only models:
         - Check for config.json at root level
         - Read _class_name to determine component type
         - Detect if it's a transformer, VAE, or other component
      
      2. Proper loading strategy:
         - Load the base pipeline first (e.g., Lightricks/LTX-Video)
         - Then load the fine-tuned component from the model repo
         - Replace the base component with the fine-tuned one
      
      3. Supported component classes:
         - LTXVideoTransformer3DModel → Lightricks/LTX-Video
         - AutoencoderKLLTXVideo → Lightricks/LTX-Video
         - UNet2DConditionModel, UNet3DConditionModel, AutoencoderKL
      
      This allows loading models like Muinez/ltxvideo-2b-nsfw which are
      fine-tuned transformer weights without a full pipeline structure.
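
A minimal sketch of the detection step described above, assuming the root-level config.json has already been downloaded as text. The function name `detect_component` and the slot names ("transformer", "vae", "unet") are illustrative assumptions, not taken from the commit; only the class names and the Lightricks/LTX-Video base come from the message.

```python
import json

# Class names and the LTX base repo come from the commit message.
# Slot names and the None entries (no base repo named) are assumptions.
COMPONENT_BASES = {
    "LTXVideoTransformer3DModel": ("transformer", "Lightricks/LTX-Video"),
    "AutoencoderKLLTXVideo": ("vae", "Lightricks/LTX-Video"),
    "UNet2DConditionModel": ("unet", None),
    "UNet3DConditionModel": ("unet", None),
    "AutoencoderKL": ("vae", None),
}


def detect_component(config_text):
    """Return (slot, base_repo) for a known component class, else None."""
    config = json.loads(config_text)
    class_name = config.get("_class_name")
    return COMPONENT_BASES.get(class_name)
```

A caller would then load the base pipeline, load the fine-tuned component from the model repo, and assign it to the returned slot on the pipeline.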
    • Fix loading models without model_index.json (I2V models) · c5cdb9fd
      Stefy Lanza (nextime / spora ) authored
      When a model has component folders (transformer, vae, etc.) but no model_index.json
      at the root level, the loading would fail. This fix adds:
      
      1. Base model fallback strategy:
         - Detect model type from model ID (ltx, wan, svd, cogvideo, mochi)
         - Load the known base model first
         - Then attempt to load fine-tuned components from the target model
      
      2. Component detection and loading:
         - List files in the repo to find component folders
         - Load transformer, VAE components from the fine-tuned model
         - Fall back to base model if component loading fails
      
      3. Better error messages:
         - Clear indication of what went wrong
         - Suggestions for alternative models
      
      This fixes loading of models like Muinez/ltxvideo-2b-nsfw which have
      all component folders but are missing the model_index.json file.
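
The model-type detection in step 1 could look roughly like the sketch below. The keyword list comes from the commit message; the function name `detect_model_family`, the matching order, and the choice to look only at the repo name (not the owner) are assumptions made for illustration.

```python
# Keywords listed in the commit message, checked in this order (assumed).
FAMILY_KEYWORDS = ("ltx", "wan", "svd", "cogvideo", "mochi")


def detect_model_family(model_id):
    """Return the first family keyword found in the repo name, else None."""
    # Only the part after the owner is inspected, so an owner name that
    # happens to contain a keyword does not trigger a match.
    name = model_id.lower().split("/")[-1]
    for keyword in FAMILY_KEYWORDS:
        if keyword in name:
            return keyword
    return None
```

Plain substring matching is deliberately simple and can give false positives (for example, "wan" inside an unrelated word), so a real implementation would likely need stricter patterns.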
    • Add system load detection and more conservative time estimates · ebf80ab6
      Stefy Lanza (nextime / spora ) authored
      System Load Detection:
      - Added get_system_load() method to detect CPU, memory, and GPU utilization
      - CPU load >80% adds 50% slowdown, >50% adds 20% slowdown
      - Memory >90% adds 80% slowdown, >75% adds 40% slowdown
      - GPU utilization >80% adds 60% slowdown, >50% adds 30% slowdown
      - Warning displayed when system is under heavy load
      
      More Conservative Base Estimates:
      - WanPipeline: 3.0s → 5.0s/frame
      - MochiPipeline: 5.0s → 8.0s/frame
      - SVD: 1.5s → 2.5s/frame
      - CogVideoX: 4.0s → 6.0s/frame
      - LTXVideo: 4.0s → 6.0s/frame
      - Flux: 8.0s → 12.0s/frame
      - Allegro: 8.0s → 12.0s/frame
      - Hunyuan: 10.0s → 15.0s/frame
      - OpenSora: 6.0s → 10.0s/frame
      
      More Conservative GPU Tier Multipliers:
      - extreme: 1.0x → 1.2x
      - high: 1.5x → 2.0x
      - medium: 2.5x → 3.5x
      - low: 4.0x → 5.0x
      - very_low: 8.0x → 10.0x
      
      More Conservative Model Loading Times:
      - Huge (>50GB): 10min → 15min
      - Large (30-50GB): 5min → 8min
      - Medium (16-30GB): 3min → 5min
      - Small (<16GB): 1.5min → 3min
      - Download estimate: 15s/GB → 30s/GB
      
      Additional Safety Margins:
      - Overhead increased from 30% to 50%
      - I2V processing overhead increased from 20% to 30%
      - Added 20% safety margin for unpredictable factors
      - Load factor applied to model loading time as well
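
As a rough illustration of the load thresholds listed under System Load Detection, the sketch below turns CPU, memory, and GPU utilization percentages into a single slowdown factor. The thresholds are from the commit message; the helper names and the assumption that the three penalties add together (rather than multiply) are not.

```python
def load_penalty(value, high, high_penalty, mid, mid_penalty):
    """Return the slowdown fraction for one resource given two thresholds."""
    if value > high:
        return high_penalty
    if value > mid:
        return mid_penalty
    return 0.0


def load_factor(cpu_pct, mem_pct, gpu_pct):
    """Combine per-resource penalties into one multiplier (additive: assumed)."""
    penalty = (
        load_penalty(cpu_pct, 80, 0.5, 50, 0.2)    # CPU: >80% -> +50%, >50% -> +20%
        + load_penalty(mem_pct, 90, 0.8, 75, 0.4)  # memory: >90% -> +80%, >75% -> +40%
        + load_penalty(gpu_pct, 80, 0.6, 50, 0.3)  # GPU: >80% -> +60%, >50% -> +30%
    )
    return 1.0 + penalty
```

Under this reading, an idle system gives a factor of 1.0, and the factor would be multiplied into both the generation and the model loading estimates.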
    • Fix time estimation to be more realistic · 8c48cea3
      Stefy Lanza (nextime / spora ) authored
      - Increased base time per frame for all models (2-4x higher, to be more realistic)
      - Added LTXVideoPipeline specific estimate (4.0s/frame)
      - Increased model loading times (90s-10min based on model size)
      - Added realistic image model loading times for I2V mode
      - Added image generation time based on model type (Flux, SDXL, SD3)
      - Added 30% overhead for I/O and memory operations
      - Added 20% extra time for I2V processing
      - Increased resolution scaling factor to 1.3 (quadratic relationship)
      - Increased download time estimate to 15s/GB with 2min cap
      
      The previous estimates were too optimistic and didn't account for:
      - Full diffusion process (multiple denoising steps)
      - Model loading from disk/download
      - Memory management overhead
      - I2V-specific processing time
      - Image model loading for I2V mode
    • Add web interface for VideoGen · 5291deb2
      Stefy Lanza (nextime / spora ) authored
      Features:
      - Modern web UI with all generation modes (T2V, I2V, T2I, I2I, V2V, Dub, Subtitles, Upscale)
      - Real-time progress updates via WebSocket
      - File upload for input images/videos/audio
      - File download for generated content
      - Background job processing with progress tracking
      - Job management (cancel, retry, delete)
      - Gallery for browsing generated files
      - REST API for programmatic access
      - Responsive design for desktop and mobile
      
      Backend (webapp.py):
      - Flask + Flask-SocketIO for real-time updates
      - Background job processing with threading
      - File upload/download handling
      - Job state persistence
      - REST API endpoints
      
      Frontend:
      - Modern dark theme UI
      - Mode selection with visual cards
      - Form with all options and settings
      - Real-time progress modal with log streaming
      - Toast notifications
      - Keyboard shortcuts (Ctrl+Enter to submit, Escape to close)
      
      Documentation:
      - Updated README.md with web interface section
      - Updated EXAMPLES.md with web interface usage
      - Updated requirements.txt with web dependencies
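
The "background job processing with threading" and "job management (cancel)" items can be pictured with the standard-library sketch below. It is not the code in webapp.py: the `Job` class, its fields, and the step-based progress model are assumptions, and the Flask-SocketIO layer that would push progress to the browser is left out.

```python
import threading
import uuid


class Job:
    """Run a step-wise task in a background thread with progress and cancel."""

    def __init__(self, work, steps):
        self.id = uuid.uuid4().hex
        self.work = work          # callable invoked once per step index
        self.steps = steps
        self.progress = 0.0       # fraction of steps completed
        self.status = "queued"
        self.cancel_event = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        self.status = "running"
        try:
            for step in range(self.steps):
                # Cancellation is cooperative: checked between steps only.
                if self.cancel_event.is_set():
                    self.status = "cancelled"
                    return
                self.work(step)
                self.progress = (step + 1) / self.steps
            self.status = "done"
        except Exception as exc:
            self.status = "failed: %s" % exc

    def start(self):
        self.thread.start()
        return self

    def cancel(self):
        self.cancel_event.set()
```

A web handler would create a `Job`, store it by `id`, and report `status` and `progress` on request or emit them over the WebSocket.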
    • Add 404 fallback to deferred I2V model loading · 344cd12a
      Stefy Lanza (nextime / spora ) authored
      - Apply same 404 fallback strategy to deferred I2V model loading
      - Try DiffusionPipeline as fallback when model_index.json not found
      - Ensures all model loading paths have consistent error handling
    • Fix model loading 404 errors and improve time estimation · c2c62b60
      Stefy Lanza (nextime / spora ) authored
      Model Loading Fixes:
      - Add fallback loading when model_index.json returns 404
      - Try alternative paths (diffusers/, diffusion_model/, pipeline/)
      - Try generic DiffusionPipeline as fallback
      - Check HuggingFace API for actual file structure
      - Load from subdirectories if model_index.json found there
      - Apply same fallback to I2V image model loading
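
The fallback order described above can be sketched as a loop over loaders and candidate subfolders. The subfolder names are from the commit message; `load_with_fallback` and the `loader(model_id, subfolder)` calling convention are hypothetical, chosen so the sketch does not depend on any particular library API.

```python
# None means "repo root"; the other entries are the alternative paths
# named in the commit message.
SUBFOLDERS = (None, "diffusers", "diffusion_model", "pipeline")


def load_with_fallback(model_id, loaders):
    """Try each loader at each location; return the first successful result."""
    errors = []
    for loader in loaders:
        for subfolder in SUBFOLDERS:
            try:
                return loader(model_id, subfolder)
            except Exception as exc:
                name = getattr(loader, "__name__", "loader")
                errors.append("%s[%s]: %s" % (name, subfolder, exc))
    # Nothing worked: surface every attempt so the failure is diagnosable.
    raise RuntimeError("could not load %s:\n%s" % (model_id, "\n".join(errors)))
```

Passing the model-specific loader first and a generic one last reproduces the "try generic DiffusionPipeline as fallback" ordering.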
      
      Time Estimation Improvements:
      - Add hardware detection (GPU model, VRAM, RAM, CPU cores)
      - Detect GPU tier (extreme/high/medium/low/very_low)
      - Calculate realistic time estimates based on GPU performance
      - Account for VRAM constraints and offloading penalty
      - Consider distributed/multi-GPU setups
      - More accurate model loading times (minutes, not seconds)
      - Account for resolution impact (quadratic relationship)
      - Add 20% overhead for memory management
      - Print hardware info for transparency
      
      GPU Tier Performance Multipliers:
      - Extreme (RTX 4090, A100, H100): 1.0x
      - High (RTX 4080, RTX 3090, V100): 1.5x
      - Medium (RTX 4070, RTX 3080, T4): 2.5x
      - Low (RTX 3060, RTX 2070): 4.0x
      - Very Low (GTX 1060, etc.): 8.0x
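
One way to read the tier table above is as a lookup from the reported GPU name to a multiplier. The tiers, example GPUs, and multipliers are from this commit; the function name `gpu_tier`, substring matching on the device name, and defaulting unknown GPUs to the slowest tier are assumptions.

```python
# (tier, multiplier, name fragments) in the order given in the commit message.
GPU_TIERS = [
    ("extreme", 1.0, ("RTX 4090", "A100", "H100")),
    ("high", 1.5, ("RTX 4080", "RTX 3090", "V100")),
    ("medium", 2.5, ("RTX 4070", "RTX 3080", "T4")),
    ("low", 4.0, ("RTX 3060", "RTX 2070")),
    ("very_low", 8.0, ("GTX 1060",)),
]


def gpu_tier(gpu_name, default=("very_low", 8.0)):
    """Map a GPU name string to (tier, multiplier); unknown names get default."""
    name = gpu_name.upper()
    for tier, multiplier, markers in GPU_TIERS:
        if any(marker in name for marker in markers):
            return tier, multiplier
    return default
```

The estimated generation time would then be the per-frame base time multiplied by the frame count and by the returned multiplier.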
    • Add video dubbing, translation, and subtitle features · 6505a00a
      Stefy Lanza (nextime / spora ) authored
      Features Added:
      - Video dubbing with voice preservation (--dub-video)
      - Automatic subtitle generation (--create-subtitles)
      - Subtitle translation (--translate-subtitles)
      - Burn subtitles into video (--burn-subtitles)
      - Audio transcription using Whisper (--transcribe)
      - Text translation using MarianMT models
      
      New Command-Line Arguments:
      - --transcribe: Transcribe audio from video
      - --whisper-model: Select Whisper model size (tiny/base/small/medium/large)
      - --source-lang: Source language code
      - --target-lang: Target language code for translation
      - --create-subtitles: Create SRT subtitles from video
      - --translate-subtitles: Translate subtitles to target language
      - --burn-subtitles: Burn subtitles into video
      - --subtitle-style: Customize subtitle appearance
      - --dub-video: Translate and dub video with voice preservation
      - --voice-clone/--no-voice-clone: Enable/disable voice cloning
      
      MCP Server Updates:
      - Added videogen_transcribe_video tool
      - Added videogen_create_subtitles tool
      - Added videogen_dub_video tool
      - Added videogen_translate_text tool
      
      Documentation Updates:
      - Updated SKILL.md with dubbing/translation section
      - Updated EXAMPLES.md with comprehensive examples
      - Updated requirements.txt with openai-whisper dependency
      
      Supported Languages:
      English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Vietnamese, Thai, Indonesian, Swedish, Ukrainian
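
For the --create-subtitles feature, the step from transcription segments to an SRT file might look like the sketch below. It assumes each segment is a dict with "start" and "end" in seconds plus "text", which is the shape openai-whisper segments are understood to have; the function names are illustrative and not taken from the project.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            "%d\n%s --> %s\n%s\n"
            % (
                index,
                srt_timestamp(seg["start"]),
                srt_timestamp(seg["end"]),
                seg["text"].strip(),
            )
        )
    return "\n".join(blocks)
```

Translating subtitles would then only need to replace each segment's "text" before rendering, leaving the timing untouched.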
    • Add model type filters and update MCP server · 1c01f5b7
      Stefy Lanza (nextime / spora ) authored
      Features Added:
      - Model type filters: --t2i-only, --v2v-only, --v2i-only, --3d-only, --tts-only, --audio-only
      - Enhanced model list table with new capability columns (V2V, V2I, 3D, TTS)
      - Updated detect_model_type() to detect all model capabilities
      
      MCP Server Updates:
      - Added videogen_video_to_video tool for V2V style transfer
      - Added videogen_apply_video_filter tool for video filters
      - Added videogen_extract_frames tool for frame extraction
      - Added videogen_create_collage tool for thumbnail grids
      - Added videogen_upscale_video tool for AI upscaling
      - Added videogen_convert_3d tool for 2D-to-3D conversion
      - Added videogen_concat_videos tool for video concatenation
      - Updated model list filter to support all new types
      
      SKILL.md Updates:
      - Added V2V, V2I, 3D to generation types table
      - Added model filter examples
      - Added 8 new use cases for V2V, filters, frames, collage, upscale, 3D, concat
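
A small sketch of how the filter flags could be wired up with argparse. The flag names are the ones listed in this commit; the `capabilities` field on each model entry, the helper names, and the choice to combine multiple flags with OR are assumptions.

```python
import argparse

# argparse dest -> capability tag (tags are hypothetical labels).
FILTER_FLAGS = {
    "t2i_only": "t2i",
    "v2v_only": "v2v",
    "v2i_only": "v2i",
    "3d_only": "3d",
    "tts_only": "tts",
    "audio_only": "audio",
}


def build_parser():
    """Create a parser exposing one boolean flag per model-type filter."""
    parser = argparse.ArgumentParser()
    for dest in FILTER_FLAGS:
        flag = "--" + dest.replace("_", "-")
        parser.add_argument(flag, dest=dest, action="store_true")
    return parser


def filter_models(models, args):
    """Keep models that have at least one requested capability."""
    # vars() is used because "3d_only" is not a valid attribute name to type.
    wanted = [cap for dest, cap in FILTER_FLAGS.items() if vars(args)[dest]]
    if not wanted:
        return list(models)
    return [m for m in models if any(cap in m["capabilities"] for cap in wanted)]
```

With no filter flag set, the full model list is returned unchanged.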
    • Add V2V, V2I, 2D-to-3D conversion, and cluster documentation · e69c2d81
      Stefy Lanza (nextime / spora ) authored
      Features Added:
      - Video-to-Video (V2V): Style transfer, filters, concatenation
      - Video-to-Image (V2I): Frame extraction, keyframes, collages
      - 2D-to-3D Conversion: SBS, anaglyph, VR 360 formats
      - Video upscaling with AI (ESRGAN, Real-ESRGAN, SwinIR)
      - Video filters (grayscale, sepia, blur, speed, slow-mo, etc.)
      
      Command-line Arguments:
      - --video: Input video file for V2V/V2I operations
      - --video-to-video: Enable V2V style transfer
      - --video-filter: Apply video filters
      - --extract-frame, --extract-keyframes, --extract-frames
      - --convert-3d-sbs, --convert-3d-anaglyph, --convert-vr
      - --upscale-video, --upscale-method
      
      Model Discovery:
      - Added depth estimation models to --update-models
      - Added 2D-to-3D model searches
      - Added V2V style transfer models
      
      Documentation:
      - Updated README.md with new features
      - Added comprehensive V2V/V2I/2D-to-3D examples
      - Added multi-node cluster setup guide
      - Added NFS shared storage configuration
    • Add V2V (Video-to-Video), V2I (Video-to-Image), and video processing features · 6f862e60
      Stefy Lanza (nextime / spora ) authored
      - Add video frame extraction (extract_video_frames, extract_keyframes)
      - Add video info retrieval (get_video_info)
      - Add frames to video conversion (frames_to_video)
      - Add video upscaling with AI support (upscale_video)
      - Add video-to-video style transfer (video_to_video_style_transfer)
      - Add video-to-image extraction (video_to_image)
      - Add video collage creation (create_video_collage)
      - Add video filters (apply_video_filter - grayscale, sepia, blur, etc.)
      - Add video concatenation (concat_videos)
      - Add image upscaling (upscale_image)
      
      Features:
      - Extract frames at specific FPS or timestamps
      - AI upscaling with ESRGAN/SwinIR support
      - Scene detection for keyframe extraction
      - Multiple video filters and effects
      - Video concatenation with re-encoding or stream copy
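
Frame extraction at a fixed FPS and concatenation with either stream copy or re-encoding both map onto standard ffmpeg invocations. The sketch below only builds the argument lists; the helper names, output file pattern, and the libx264/aac codec choice for re-encoding are assumptions, and the project's own extract_video_frames and concat_videos may work differently.

```python
def extract_frames_cmd(video, out_dir, fps):
    """Build an ffmpeg command that writes numbered PNG frames at `fps`."""
    pattern = "%s/frame_%%06d.png" % out_dir
    return ["ffmpeg", "-i", video, "-vf", "fps=%s" % fps, pattern]


def concat_cmd(list_file, output, reencode=False):
    """Build an ffmpeg concat-demuxer command from a file listing the inputs."""
    cmd = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file]
    if reencode:
        # Re-encoding tolerates inputs with differing codecs or parameters.
        cmd += ["-c:v", "libx264", "-c:a", "aac"]
    else:
        # Stream copy is fast but needs inputs with matching codecs.
        cmd += ["-c", "copy"]
    return cmd + [output]
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.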
    • Add character consistency features: IP-Adapter, InstantID, Character Profiles, LoRA Training · b0d20d0b
      Stefy Lanza (nextime / spora ) authored
      - Add IP-Adapter integration for character consistency using reference images
      - Add InstantID support for superior face identity preservation
      - Add Character Profile System to store reference images and face embeddings
      - Add LoRA Training Workflow for perfect character consistency
      - Add command-line arguments for all character consistency features
      - Update EXAMPLES.md with comprehensive character consistency documentation
      - Update requirements.txt with optional dependencies (insightface, onnxruntime)
      
      New commands:
      - --character: Use saved character profile
      - --create-character: Create new character profile from reference images
      - --list-characters: List all saved profiles
      - --show-character: Show profile details
      - --ipadapter: Enable IP-Adapter for consistency
      - --instantid: Enable InstantID for face identity
      - --train-lora: Train custom LoRA for character
    • Validate base model exists before adding LoRA to model list · 84d460f6
      Stefy Lanza (nextime / spora ) authored
      - When --update-models detects a LoRA adapter, validate that the base
        model exists on HuggingFace before adding it to the model list
      - Skip LoRAs whose base models are not found on HuggingFace
      - Added support for flux and sdxl base model detection
      - Print informative messages when skipping LoRAs with missing base models
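
The validation rule can be sketched as a filter that takes the existence check as a parameter. The entry fields ("id", "base_model") and the function name are hypothetical; in practice `base_exists` could wrap a HuggingFace Hub lookup, which is left out here so the sketch runs offline.

```python
def filter_loras(loras, base_exists):
    """Keep LoRA entries whose base model passes the `base_exists` check."""
    kept = []
    for lora in loras:
        base = lora.get("base_model")
        if base and base_exists(base):
            kept.append(lora)
        else:
            # Mirror the commit's "informative message" when skipping.
            print("Skipping %s: base model %r not found" % (lora["id"], base))
    return kept
```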
    • Fix: Add peft to requirements.txt for LoRA adapter support · 2e8b5bc7
      Stefy Lanza (nextime / spora ) authored
      PEFT (Parameter-Efficient Fine-Tuning) is required for loading LoRA
      adapters with pipe.load_lora_weights(). Without it, LoRA loading fails
      with: 'PEFT backend is required for this method.'
    • Feat: Update models.json when pipeline mismatch is detected and corrected · 2b570a0a
      Stefy Lanza (nextime / spora ) authored
      - Add update_model_pipeline_class() function to update model config
      - Call function when main model pipeline mismatch is corrected
      - Call function when image model pipeline mismatch is corrected
      - Ensures future runs use the correct pipeline class automatically
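
A sketch of what persisting the corrected pipeline class might involve. The function name matches the commit, but its signature, the layout of models.json (a dict keyed by model name), and the "class" field are assumptions made for illustration.

```python
import json


def update_model_pipeline_class(config_path, model_name, pipeline_class):
    """Rewrite one model's pipeline class in the JSON config; True if changed."""
    with open(config_path, "r", encoding="utf-8") as handle:
        models = json.load(handle)
    entry = models.get(model_name)
    # Unknown model or already correct: leave the file untouched.
    if entry is None or entry.get("class") == pipeline_class:
        return False
    entry["class"] = pipeline_class
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(models, handle, indent=2)
    return True
```

Returning a boolean lets the caller log the correction only when the file was actually rewritten.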
  2. 24 Feb, 2026 16 commits