1. 25 Feb, 2026 30 commits
    • Add better error handling for tokenizer/cache errors in base model loading · 4e2361f9
      Stefy Lanza (nextime / spora ) authored
      - Detect tokenizer parsing errors and provide helpful cache clearing instructions
      - Add retry logic for corrupted cache files
      - Improve error messages for component-only model loading
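The retry behaviour described above can be sketched roughly as follows. This is not the project's actual code: `load_with_cache_retry`, `looks_like_cache_error`, and the error-message hints are hypothetical names chosen for illustration, and the caller is assumed to supply its own `loader` and `clear_cache` callables.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical substrings; the real project may match different messages.
CACHE_ERROR_HINTS = ("tokenizer", "sentencepiece", "unexpected end of json")


def looks_like_cache_error(exc: Exception) -> bool:
    """Heuristically decide whether an exception points at a damaged cache file."""
    text = str(exc).lower()
    return any(hint in text for hint in CACHE_ERROR_HINTS)


def load_with_cache_retry(
    loader: Callable[[], T],
    clear_cache: Callable[[], None],
    max_retries: int = 1,
) -> T:
    """Call loader(); on a cache-like error, clear the cache and try again."""
    attempt = 0
    while True:
        try:
            return loader()
        except Exception as exc:
            # Unrelated errors, or errors after the last retry, propagate unchanged.
            if attempt >= max_retries or not looks_like_cache_error(exc):
                raise
            attempt += 1
            clear_cache()
```

Passing the loader and the cache-clearing step as callables keeps the sketch independent of any particular model library.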
    • Fix LTX pipeline class name: use LTXPipeline instead of LTXVideoPipeline · f0b663fa
      Stefy Lanza (nextime / spora ) authored
      - The correct class name in diffusers is LTXPipeline, not LTXVideoPipeline
      - Updated PIPELINE_CLASS_MAP and detect_pipeline_class
      - Updated all references throughout the codebase
      - This fixes loading models like Muinez/ltxvideo-2b-nsfw
    • 49cca317
    • 2be9d4da
    • Fix I2V model loading: use correct pipeline class for base models · b75b07e6
      Stefy Lanza (nextime / spora ) authored
      - When loading fine-tuned component models (like LTXVideoTransformer3DModel),
        use the correct pipeline class for the base model instead of the configured
        PipelineClass which may be wrong
      - Add proper pipeline class detection for LTX, Wan, SVD, CogVideo, Mochi
      - This fixes loading models like Muinez/ltxvideo-2b-nsfw which have
        config.json only (no model_index.json)
    • 6482f2ac
    • Add character consistency features, fix model loading for non-diffusers models · 1f5226ed
      Stefy Lanza (nextime / spora ) authored
      - Add character profile management (create, list, show, delete)
      - Add IP-Adapter and InstantID support for character consistency
      - Fix model loading for models with config.json only (no model_index.json)
      - Add component-only model detection (fine-tuned weights)
      - Update MCP server with character consistency tools
      - Update SKILL.md and README.md documentation
      - Add memory management for dubbing/translation
      - Add chunked processing for Whisper transcription
      - Add character persistence options to web interface
    • Fix web interface: show image upload for I2V mode · 627eb38f
      Stefy Lanza (nextime / spora ) authored
      Users can now upload their own image for I2V mode instead of
      only being able to generate one. The image upload box is now
      visible in I2V mode, allowing users to either:
      - Upload an existing image to animate
      - Or let the system generate an image first
      
      This provides more flexibility for the I2V workflow.
    • Add style detection and model matching for auto mode · ae596279
      Stefy Lanza (nextime / spora ) authored
      Style Detection (detect_generation_type):
      - Detects 9 artistic styles: anime, photorealistic, digital_art, cgi, cartoon, fantasy, traditional, scifi, horror
      - Extracts style keywords from prompts for matching
      - Returns style info in generation type dict
      
      Style Matching (select_best_model):
      - Matches LoRA adapters to requested style (+60 bonus for style match)
      - Matches base models to requested style (+50 bonus for style match)
      - Checks model name, ID, and tags for style indicators
      - Examples:
        - 'anime girl' → selects anime-optimized models/LoRAs
        - 'photorealistic portrait' → selects realism models
        - 'cyberpunk city' → selects sci-fi models/LoRAs
      
      This allows --auto mode to intelligently select models based on
      the artistic style requested in the prompt.
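As a rough illustration of the style detection and bonus scoring described above, the sketch below covers three of the nine styles. The keyword lists, the `is_lora` field, and the function names `detect_style` and `style_bonus` are assumptions made for the example; only the +60 and +50 bonuses come from the commit message.

```python
from typing import Optional

# Hypothetical keyword lists for three of the nine styles.
STYLE_KEYWORDS = {
    "anime": ("anime", "manga"),
    "photorealistic": ("photorealistic", "realistic", "photo"),
    "scifi": ("sci-fi", "scifi", "cyberpunk", "futuristic"),
}

LORA_STYLE_BONUS = 60  # bonus for a LoRA adapter matching the style
BASE_STYLE_BONUS = 50  # bonus for a base model matching the style


def detect_style(prompt: str) -> Optional[str]:
    """Return the first style whose keywords appear in the prompt, if any."""
    lowered = prompt.lower()
    for style, keywords in STYLE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return style
    return None


def style_bonus(model: dict, style: Optional[str]) -> int:
    """Score bonus for a model whose name, ID, or tags indicate the style."""
    if style is None:
        return 0
    haystack = " ".join(
        [model.get("name", ""), model.get("id", ""), *model.get("tags", [])]
    ).lower()
    if not any(keyword in haystack for keyword in STYLE_KEYWORDS[style]):
        return 0
    return LORA_STYLE_BONUS if model.get("is_lora") else BASE_STYLE_BONUS
```

Substring matching is deliberately naive here; a real matcher would likely need word boundaries and a tie-breaking rule when a prompt mentions several styles.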
    • Fix webapp.py for Python 3.13+ compatibility · 2bdf2e8a
      Stefy Lanza (nextime / spora ) authored
      - Remove deprecated eventlet dependency
      - Use threading mode for Flask-SocketIO instead of eventlet
      - eventlet is deprecated and has compatibility issues with Python 3.13
      - Threading mode works reliably on all Python versions
      
      This fixes the "RuntimeError: Working outside of application context"
      errors raised when running webapp.py on Python 3.13.
    • Add transformers backend for MusicGen (Python 3.13+ compatible) · 86fcdc95
      Stefy Lanza (nextime / spora ) authored
      The generate_music() function now supports two backends:
      
      1. audiocraft (preferred):
         - Original MusicGen implementation
         - Works on Python 3.12 and lower
         - Falls back to transformers if not available
      
      2. transformers (fallback):
         - Uses HuggingFace transformers library
         - Works on Python 3.13+
         - No spacy/blis dependency issues
      
      The function automatically:
      - Tries audiocraft first (if available)
      - Falls back to transformers if audiocraft fails or is not installed
      - Provides clear error messages if neither backend is available
      
      This allows MusicGen music generation to work on Python 3.13 without
      the problematic audiocraft → spacy → thinc → blis dependency chain.
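The try-one-backend-then-the-other behaviour could look something like the sketch below. The real `generate_music()` signature is not shown in the commit, so `generate_with_fallback` and the `(name, callable)` backend pairs are hypothetical; each callable is assumed to raise (for example `ImportError`) when its library is missing or fails.

```python
from typing import Any, Callable, Sequence, Tuple

Backend = Tuple[str, Callable[[str], Any]]


def generate_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, Any]:
    """Try each backend in order and return (backend_name, result) from the first success."""
    errors = []
    for name, generate in backends:
        try:
            return name, generate(prompt)
        except Exception as exc:
            # Covers both "not installed" (ImportError) and runtime failures.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No MusicGen backend succeeded: " + "; ".join(errors))
```

Listing audiocraft first and transformers second would reproduce the preference order given in the commit message, and the collected error strings give the user one message that names every backend that was tried.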
    • Document audiocraft incompatibility with Python 3.13 · f28cca78
      Stefy Lanza (nextime / spora ) authored
      audiocraft (MusicGen) is NOT compatible with Python 3.13 due to:
      - audiocraft → spacy → thinc → blis
      - blis fails to compile with GCC errors: unrecognized '-mavx512pf' option
      - This is a known issue with blis and newer GCC/Python versions
      
      Updated requirements.txt to:
      - Remove audiocraft from direct dependencies
      - Add note about Python 3.13 incompatibility
      - Suggest using Python 3.12 or lower for audiocraft
      - Or use a separate Python 3.12 environment for music generation
    • Add system dependencies for Debian/Ubuntu to requirements.txt · a4fda7b0
      Stefy Lanza (nextime / spora ) authored
      Added comprehensive system dependencies section for Debian/Ubuntu:
      
      Required system packages:
      - build-essential, cmake, pkg-config (build tools)
      - ffmpeg (video processing)
      - libavformat-dev, libavcodec-dev, libavdevice-dev, libavutil-dev
      - libavfilter-dev, libswscale-dev, libswresample-dev (FFmpeg dev libs)
      - libsdl2-dev, libssl-dev, libcurl4-openssl-dev
      - python3-dev
      
      These are required for:
      - PyAV (av package) - needed by audiocraft/MusicGen
      - face-recognition (dlib)
      - Building Python extensions
      
      Updated quick install instructions to include system dependencies step.
    • Update requirements.txt for Python 3.12+ compatibility · b0d43691
      Stefy Lanza (nextime / spora ) authored
      Updated all package versions to be compatible with Python 3.12 and 3.13:
      
      Core Dependencies:
      - torch>=2.2.0 (was 2.0.0)
      - torchvision>=0.17.0 (was 0.15.0)
      - torchaudio>=2.2.0 (was 2.0.0)
      - diffusers>=0.32.0 (was 0.30.0)
      - transformers>=4.40.0 (was 4.35.0)
      - accelerate>=0.27.0 (was 0.24.0)
      - xformers>=0.0.25 (was 0.0.22)
      - spandrel>=0.2.0 (was 0.1.0)
      - ftfy>=6.2.0 (was 6.1.0)
      - Pillow>=10.2.0 (was 10.0.0)
      - safetensors>=0.4.2 (was 0.4.0)
      - huggingface-hub>=0.23.0 (was 0.19.0)
      - peft>=0.10.0 (was 0.7.0)
      - numpy>=1.26.0 (added for Python 3.12+ compatibility)
      
      Audio Dependencies:
      - scipy>=1.12.0 (was 1.11.0)
      - librosa>=0.10.2 (was 0.10.0)
      - edge-tts>=6.1.10 (was 6.1.0)
      
      Web Interface:
      - flask>=3.0.2 (was 3.0.0)
      - flask-socketio>=5.3.6 (was 5.3.0)
      - eventlet>=0.36.0 (was 0.33.0)
      - python-socketio>=5.11.0 (was 5.10.0)
      - werkzeug>=3.0.1 (was 3.0.0)
      
      Added detailed installation notes for Python 3.12+ including:
      - PyTorch nightly installation for CUDA
      - xformers --pre flag for Python 3.13
      - Git installation for diffusers/transformers
      - Quick install commands
    • Fix loading transformer-only fine-tuned models (like Muinez/ltxvideo-2b-nsfw) · 03b62189
      Stefy Lanza (nextime / spora ) authored
      Some models on HuggingFace are not full pipelines but just fine-tuned components
      (e.g., just the transformer weights). These have a config.json at root level with
      _class_name pointing to a component class like 'LTXVideoTransformer3DModel'.
      
      This fix adds:
      
      1. Detection of component-only models:
         - Check for config.json at root level
         - Read _class_name to determine component type
         - Detect if it's a transformer, VAE, or other component
      
      2. Proper loading strategy:
         - Load the base pipeline first (e.g., Lightricks/LTX-Video)
         - Then load the fine-tuned component from the model repo
         - Replace the base component with the fine-tuned one
      
      3. Supported component classes:
         - LTXVideoTransformer3DModel → Lightricks/LTX-Video
         - AutoencoderKLLTXVideo → Lightricks/LTX-Video
         - UNet2DConditionModel, UNet3DConditionModel, AutoencoderKL
      
      This allows loading models like Muinez/ltxvideo-2b-nsfw which are
      fine-tuned transformer weights without a full pipeline structure.
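A minimal sketch of the detection step, assuming the root-level config.json has already been downloaded and read as text. The class-to-base-model mapping follows the commit message; the slot names ("transformer", "vae") and the function name `detect_component_model` are assumptions for illustration.

```python
import json
from typing import Optional

# Component class -> (assumed pipeline slot, base pipeline repo from the commit message).
COMPONENT_BASE_MODELS = {
    "LTXVideoTransformer3DModel": ("transformer", "Lightricks/LTX-Video"),
    "AutoencoderKLLTXVideo": ("vae", "Lightricks/LTX-Video"),
}


def detect_component_model(config_text: str) -> Optional[dict]:
    """Return loading hints if config.json describes a single known component."""
    config = json.loads(config_text)
    class_name = config.get("_class_name")
    if class_name not in COMPONENT_BASE_MODELS:
        return None
    slot, base_model = COMPONENT_BASE_MODELS[class_name]
    return {"class_name": class_name, "component": slot, "base_model": base_model}
```

With such a hint in hand, the loading strategy above amounts to loading `base_model` as a full pipeline, loading the fine-tuned component separately, and assigning it to the pipeline attribute named by `component`.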
    • Fix loading models without model_index.json (I2V models) · c5cdb9fd
      Stefy Lanza (nextime / spora ) authored
      When a model has component folders (transformer, vae, etc.) but no model_index.json
      at the root level, the loading would fail. This fix adds:
      
      1. Base model fallback strategy:
         - Detect model type from model ID (ltx, wan, svd, cogvideo, mochi)
         - Load the known base model first
         - Then attempt to load fine-tuned components from the target model
      
      2. Component detection and loading:
         - List files in the repo to find component folders
         - Load transformer, VAE components from the fine-tuned model
         - Fall back to base model if component loading fails
      
      3. Better error messages:
         - Clear indication of what went wrong
         - Suggestions for alternative models
      
      This fixes loading of models like Muinez/ltxvideo-2b-nsfw which have
      all component folders but are missing the model_index.json file.
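Detecting the model family from the model ID might be as simple as the substring check sketched below. The family keys come from the commit message; the function name `detect_model_family` is hypothetical, and the mapping from family to a concrete base repository is intentionally left out because the commit does not list it.

```python
from typing import Optional

# Family keys listed in the commit message.
MODEL_FAMILIES = ("ltx", "wan", "svd", "cogvideo", "mochi")


def detect_model_family(model_id: str) -> Optional[str]:
    """Return the first known family whose key appears in the model ID."""
    lowered = model_id.lower()
    for family in MODEL_FAMILIES:
        if family in lowered:
            return family
    return None
```

A plain substring test can misfire on short keys such as "wan" (it would also match an ID containing "swan"), so a production version would probably split the ID into tokens first.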
    • Add system load detection and more conservative time estimates · ebf80ab6
      Stefy Lanza (nextime / spora ) authored
      System Load Detection:
      - Added get_system_load() method to detect CPU, memory, and GPU utilization
      - CPU load >80% adds 50% slowdown, >50% adds 20% slowdown
      - Memory >90% adds 80% slowdown, >75% adds 40% slowdown
      - GPU utilization >80% adds 60% slowdown, >50% adds 30% slowdown
      - Warning displayed when system is under heavy load
      
      More Conservative Base Estimates:
      - WanPipeline: 3.0s → 5.0s/frame
      - MochiPipeline: 5.0s → 8.0s/frame
      - SVD: 1.5s → 2.5s/frame
      - CogVideoX: 4.0s → 6.0s/frame
      - LTXVideo: 4.0s → 6.0s/frame
      - Flux: 8.0s → 12.0s/frame
      - Allegro: 8.0s → 12.0s/frame
      - Hunyuan: 10.0s → 15.0s/frame
      - OpenSora: 6.0s → 10.0s/frame
      
      More Conservative GPU Tier Multipliers:
      - extreme: 1.0 → 1.2x
      - high: 1.5 → 2.0x
      - medium: 2.5 → 3.5x
      - low: 4.0 → 5.0x
      - very_low: 8.0 → 10.0x
      
      More Conservative Model Loading Times:
      - Huge (>50GB): 10min → 15min
      - Large (30-50GB): 5min → 8min
      - Medium (16-30GB): 3min → 5min
      - Small (<16GB): 1.5min → 3min
      - Download estimate: 15s/GB → 30s/GB
      
      Additional Safety Margins:
      - Overhead increased from 30% to 50%
      - I2V processing overhead increased from 20% to 30%
      - Added 20% safety margin for unpredictable factors
      - Load factor applied to model loading time as well
    • Fix time estimation to be more realistic · 8c48cea3
      Stefy Lanza (nextime / spora ) authored
      - Increased base time per frame for all models (2-4x more realistic)
      - Added LTXVideoPipeline specific estimate (4.0s/frame)
      - Increased model loading times (90s-10min based on model size)
      - Added realistic image model loading times for I2V mode
      - Added image generation time based on model type (Flux, SDXL, SD3)
      - Added 30% overhead for I/O and memory operations
      - Added 20% extra time for I2V processing
      - Increased resolution scaling factor to 1.3 (quadratic relationship)
      - Increased download time estimate to 15s/GB with 2min cap
      
      The previous estimates were too optimistic and didn't account for:
      - Full diffusion process (multiple denoising steps)
      - Model loading from disk/download
      - Memory management overhead
      - I2V-specific processing time
      - Image model loading for I2V mode
    • Add web interface for VideoGen · 5291deb2
      Stefy Lanza (nextime / spora ) authored
      Features:
      - Modern web UI with all generation modes (T2V, I2V, T2I, I2I, V2V, Dub, Subtitles, Upscale)
      - Real-time progress updates via WebSocket
      - File upload for input images/videos/audio
      - File download for generated content
      - Background job processing with progress tracking
      - Job management (cancel, retry, delete)
      - Gallery for browsing generated files
      - REST API for programmatic access
      - Responsive design for desktop and mobile
      
      Backend (webapp.py):
      - Flask + Flask-SocketIO for real-time updates
      - Background job processing with threading
      - File upload/download handling
      - Job state persistence
      - REST API endpoints
      
      Frontend:
      - Modern dark theme UI
      - Mode selection with visual cards
      - Form with all options and settings
      - Real-time progress modal with log streaming
      - Toast notifications
      - Keyboard shortcuts (Ctrl+Enter to submit, Escape to close)
      
      Documentation:
      - Updated README.md with web interface section
      - Updated EXAMPLES.md with web interface usage
      - Updated requirements.txt with web dependencies
    • Add 404 fallback to deferred I2V model loading · 344cd12a
      Stefy Lanza (nextime / spora ) authored
      - Apply same 404 fallback strategy to deferred I2V model loading
      - Try DiffusionPipeline as fallback when model_index.json not found
      - Ensures all model loading paths have consistent error handling
    • Fix model loading 404 errors and improve time estimation · c2c62b60
      Stefy Lanza (nextime / spora ) authored
      Model Loading Fixes:
      - Add fallback loading when model_index.json returns 404
      - Try alternative paths (diffusers/, diffusion_model/, pipeline/)
      - Try generic DiffusionPipeline as fallback
      - Check HuggingFace API for actual file structure
      - Load from subdirectories if model_index.json found there
      - Apply same fallback to I2V image model loading
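The search for model_index.json in alternative locations can be sketched as below, assuming the repository's file listing has already been fetched as a list of relative paths. The subfolder names come from the commit message; `find_model_index_subfolder` is a hypothetical helper.

```python
from typing import Iterable, Optional

# Alternative locations named in the commit message.
ALTERNATIVE_SUBFOLDERS = ("diffusers", "diffusion_model", "pipeline")


def find_model_index_subfolder(repo_files: Iterable[str]) -> Optional[str]:
    """Return "" for a root-level model_index.json, a subfolder name, or None if absent."""
    files = set(repo_files)
    if "model_index.json" in files:
        return ""
    for subfolder in ALTERNATIVE_SUBFOLDERS:
        if f"{subfolder}/model_index.json" in files:
            return subfolder
    return None
```

A `None` result would be the point at which the loader falls through to the generic DiffusionPipeline attempt mentioned above.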
      
      Time Estimation Improvements:
      - Add hardware detection (GPU model, VRAM, RAM, CPU cores)
      - Detect GPU tier (extreme/high/medium/low/very_low)
      - Calculate realistic time estimates based on GPU performance
      - Account for VRAM constraints and offloading penalty
      - Consider distributed/multi-GPU setups
      - More accurate model loading times (minutes, not seconds)
      - Account for resolution impact (quadratic relationship)
      - Add 20% overhead for memory management
      - Print hardware info for transparency
      
      GPU Tier Performance Multipliers:
      - Extreme (RTX 4090, A100, H100): 1.0x
      - High (RTX 4080, RTX 3090, V100): 1.5x
      - Medium (RTX 4070, RTX 3080, T4): 2.5x
      - Low (RTX 3060, RTX 2070): 4.0x
      - Very Low (GTX 1060, etc.): 8.0x
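The tier table above can be expressed as a lookup on the reported GPU name. The tiers and multipliers are taken from the commit message as of this commit; matching by lowercase substring, defaulting unknown GPUs to the slowest tier, and the function name `gpu_tier` are all assumptions made for this sketch.

```python
from typing import Tuple

# (tier, multiplier, name markers), ordered from fastest to slowest.
GPU_TIERS = [
    ("extreme", 1.0, ("rtx 4090", "a100", "h100")),
    ("high", 1.5, ("rtx 4080", "rtx 3090", "v100")),
    ("medium", 2.5, ("rtx 4070", "rtx 3080", "t4")),
    ("low", 4.0, ("rtx 3060", "rtx 2070")),
]
DEFAULT_TIER = ("very_low", 8.0)  # assumption: unknown GPUs get the slowest tier


def gpu_tier(gpu_name: str) -> Tuple[str, float]:
    """Map a GPU name such as "NVIDIA GeForce RTX 4090" to (tier, multiplier)."""
    lowered = gpu_name.lower()
    for tier, multiplier, markers in GPU_TIERS:
        if any(marker in lowered for marker in markers):
            return tier, multiplier
    return DEFAULT_TIER
```

Short markers like "t4" can match unrelated product names, so a real classifier would likely combine the name with detected VRAM, which the commit says is also collected.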
    • Add video dubbing, translation, and subtitle features · 6505a00a
      Stefy Lanza (nextime / spora ) authored
      Features Added:
      - Video dubbing with voice preservation (--dub-video)
      - Automatic subtitle generation (--create-subtitles)
      - Subtitle translation (--translate-subtitles)
      - Burn subtitles into video (--burn-subtitles)
      - Audio transcription using Whisper (--transcribe)
      - Text translation using MarianMT models
      
      New Command-Line Arguments:
      - --transcribe: Transcribe audio from video
      - --whisper-model: Select Whisper model size (tiny/base/small/medium/large)
      - --source-lang: Source language code
      - --target-lang: Target language code for translation
      - --create-subtitles: Create SRT subtitles from video
      - --translate-subtitles: Translate subtitles to target language
      - --burn-subtitles: Burn subtitles into video
      - --subtitle-style: Customize subtitle appearance
      - --dub-video: Translate and dub video with voice preservation
      - --voice-clone/--no-voice-clone: Enable/disable voice cloning
      
      MCP Server Updates:
      - Added videogen_transcribe_video tool
      - Added videogen_create_subtitles tool
      - Added videogen_dub_video tool
      - Added videogen_translate_text tool
      
      Documentation Updates:
      - Updated SKILL.md with dubbing/translation section
      - Updated EXAMPLES.md with comprehensive examples
      - Updated requirements.txt with openai-whisper dependency
      
      Supported Languages:
      English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Vietnamese, Thai, Indonesian, Swedish, Ukrainian
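For the subtitle step, turning transcription segments into SRT text is mostly timestamp formatting. The sketch below assumes each segment is a dict with `start` and `end` in seconds plus a `text` field (the shape Whisper-style transcribers commonly return); `format_timestamp` and `segments_to_srt` are hypothetical names, not the project's functions.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Translating subtitles then only needs to replace each cue's text while leaving the index and timing lines untouched.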
    • Add model type filters and update MCP server · 1c01f5b7
      Stefy Lanza (nextime / spora ) authored
      Features Added:
      - Model type filters: --t2i-only, --v2v-only, --v2i-only, --3d-only, --tts-only, --audio-only
      - Enhanced model list table with new capability columns (V2V, V2I, 3D, TTS)
      - Updated detect_model_type() to detect all model capabilities
      
      MCP Server Updates:
      - Added videogen_video_to_video tool for V2V style transfer
      - Added videogen_apply_video_filter tool for video filters
      - Added videogen_extract_frames tool for frame extraction
      - Added videogen_create_collage tool for thumbnail grids
      - Added videogen_upscale_video tool for AI upscaling
      - Added videogen_convert_3d tool for 2D-to-3D conversion
      - Added videogen_concat_videos tool for video concatenation
      - Updated model list filter to support all new types
      
      SKILL.md Updates:
      - Added V2V, V2I, 3D to generation types table
      - Added model filter examples
      - Added 8 new use cases for V2V, filters, frames, collage, upscale, 3D, concat
    • Add V2V, V2I, 2D-to-3D conversion, and cluster documentation · e69c2d81
      Stefy Lanza (nextime / spora ) authored
      Features Added:
      - Video-to-Video (V2V): Style transfer, filters, concatenation
      - Video-to-Image (V2I): Frame extraction, keyframes, collages
      - 2D-to-3D Conversion: SBS, anaglyph, VR 360 formats
      - Video upscaling with AI (ESRGAN, Real-ESRGAN, SwinIR)
      - Video filters (grayscale, sepia, blur, speed, slow-mo, etc.)
      
      Command-line Arguments:
      - --video: Input video file for V2V/V2I operations
      - --video-to-video: Enable V2V style transfer
      - --video-filter: Apply video filters
      - --extract-frame, --extract-keyframes, --extract-frames
      - --convert-3d-sbs, --convert-3d-anaglyph, --convert-vr
      - --upscale-video, --upscale-method
      
      Model Discovery:
      - Added depth estimation models to --update-models
      - Added 2D-to-3D model searches
      - Added V2V style transfer models
      
      Documentation:
      - Updated README.md with new features
      - Added comprehensive V2V/V2I/2D-to-3D examples
      - Added multi-node cluster setup guide
      - Added NFS shared storage configuration
    • Add V2V (Video-to-Video), V2I (Video-to-Image), and video processing features · 6f862e60
      Stefy Lanza (nextime / spora ) authored
      - Add video frame extraction (extract_video_frames, extract_keyframes)
      - Add video info retrieval (get_video_info)
      - Add frames to video conversion (frames_to_video)
      - Add video upscaling with AI support (upscale_video)
      - Add video-to-video style transfer (video_to_video_style_transfer)
      - Add video-to-image extraction (video_to_image)
      - Add video collage creation (create_video_collage)
      - Add video filters (apply_video_filter - grayscale, sepia, blur, etc.)
      - Add video concatenation (concat_videos)
      - Add image upscaling (upscale_image)
      
      Features:
      - Extract frames at specific FPS or timestamps
      - AI upscaling with ESRGAN/SwinIR support
      - Scene detection for keyframe extraction
      - Multiple video filters and effects
      - Video concatenation with re-encoding or stream copy
    • Add character consistency features: IP-Adapter, InstantID, Character Profiles, LoRA Training · b0d20d0b
      Stefy Lanza (nextime / spora ) authored
      - Add IP-Adapter integration for character consistency using reference images
      - Add InstantID support for superior face identity preservation
      - Add Character Profile System to store reference images and face embeddings
      - Add LoRA Training Workflow for perfect character consistency
      - Add command-line arguments for all character consistency features
      - Update EXAMPLES.md with comprehensive character consistency documentation
      - Update requirements.txt with optional dependencies (insightface, onnxruntime)
      
      New commands:
      - --character: Use saved character profile
      - --create-character: Create new character profile from reference images
      - --list-characters: List all saved profiles
      - --show-character: Show profile details
      - --ipadapter: Enable IP-Adapter for consistency
      - --instantid: Enable InstantID for face identity
      - --train-lora: Train custom LoRA for character
    • Validate base model exists before adding LoRA to model list · 84d460f6
      Stefy Lanza (nextime / spora ) authored
      - When --update-models detects a LoRA adapter, validate that the base
        model exists on HuggingFace before adding it to the model list
      - Skip LoRAs whose base models are not found on HuggingFace
      - Added support for flux and sdxl base model detection
      - Print informative messages when skipping LoRAs with missing base models
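The validation step can be sketched with the existence check injected as a callable, so the example runs without network access. The `id` and `base_model` fields and the name `filter_loras_with_known_base` are assumptions; in practice `model_exists` would query the HuggingFace Hub.

```python
from typing import Callable, Iterable, List


def filter_loras_with_known_base(
    loras: Iterable[dict],
    model_exists: Callable[[str], bool],
) -> List[dict]:
    """Keep only LoRA entries whose base model is reported to exist."""
    kept = []
    for lora in loras:
        base_model = lora.get("base_model")
        if base_model and model_exists(base_model):
            kept.append(lora)
        else:
            # Mirrors the informative skip message described in the commit.
            print(f"Skipping LoRA {lora.get('id')}: base model {base_model!r} not found")
    return kept
```

Keeping the lookup behind a callable also makes it easy to cache results, since many LoRAs tend to share the same base model.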
    • Fix: Add peft to requirements.txt for LoRA adapter support · 2e8b5bc7
      Stefy Lanza (nextime / spora ) authored
      PEFT (Parameter-Efficient Fine-Tuning) is required for loading LoRA
      adapters with pipe.load_lora_weights(). Without it, LoRA loading fails
      with: 'PEFT backend is required for this method.'
    • Feat: Update models.json when pipeline mismatch is detected and corrected · 2b570a0a
      Stefy Lanza (nextime / spora ) authored
      - Add update_model_pipeline_class() function to update model config
      - Call function when main model pipeline mismatch is corrected
      - Call function when image model pipeline mismatch is corrected
      - Ensures future runs use the correct pipeline class automatically
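One possible shape for `update_model_pipeline_class()` is sketched below. The commit names the function but not its signature or the layout of models.json, so the parameters and the assumption that the file maps model names to dicts with a `pipeline_class` key are purely illustrative.

```python
import json
from pathlib import Path


def update_model_pipeline_class(models_path: str, model_name: str, pipeline_class: str) -> bool:
    """Persist a corrected pipeline class; return True if the file was changed."""
    path = Path(models_path)
    models = json.loads(path.read_text(encoding="utf-8"))

    entry = models.get(model_name)
    # Nothing to do for unknown models or when the stored class is already correct.
    if entry is None or entry.get("pipeline_class") == pipeline_class:
        return False

    entry["pipeline_class"] = pipeline_class  # assumed key name
    path.write_text(json.dumps(models, indent=2), encoding="utf-8")
    return True
```

Returning a boolean lets the caller log the correction only when something was actually rewritten.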
    • Stefy Lanza (nextime / spora )'s avatar
  2. 24 Feb, 2026 10 commits