    LoRA training: async kickoff, restart recovery, keyframe regen UI · 80f48d72
    Stefy Lanza (nextime / spora ) authored
    Server (codai/api/loras.py):
    - /v1/loras/train gains wait (default True) and session parameters; with
      wait=false the job is detached and the endpoint returns a job_id at once,
      avoiding HTTP read timeouts on multi-hour video training runs.
    - Disk-persisted job registry keyed by job_id; each entry also records the
      submitting session. The progress endpoint accepts ?job=<id> or
      ?session=<tok>, so a client only ever sees its own jobs, with no
      cross-user spillover. Jobs still in flight when the server went down are
      marked interrupted at startup.
    - Mid-training PEFT checkpoints (SD1.5/SDXL/Wan) plus a train_state.json;
      resubmitting a job resumes from the last checkpointed step when
      base/target/rank (and session) match, so a reboot no longer throws away
      hours of Wan training.
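    The wait/detach split on /v1/loras/train can be sketched as below. This is a
    minimal illustration, not the code in codai/api/loras.py: start_training,
    JOBS and the "running"/"done"/"failed" status strings are hypothetical, and a
    plain thread stands in for whatever worker the server actually uses.

```python
import threading
import time
import uuid
from typing import Callable, Dict, Optional

# Hypothetical in-memory status table; the real server keeps a disk-persisted
# registry instead (this sketch only shows the wait/detach split).
JOBS: Dict[str, dict] = {}
_LOCK = threading.Lock()


def _run(job_id: str, train_fn: Callable[[], None]) -> None:
    """Run the training callable and record how it ended."""
    try:
        train_fn()
        status = "done"
    except Exception as exc:  # keep the error instead of losing it in the thread
        status = f"failed: {exc}"
    with _LOCK:
        JOBS[job_id]["status"] = status


def start_training(train_fn: Callable[[], None], session: Optional[str],
                   wait: bool = True) -> dict:
    """Hypothetical handler body for /v1/loras/train.

    wait=True blocks until training finishes (the previous behaviour);
    wait=False starts a background thread and returns the job_id at once,
    so the HTTP request completes long before a multi-hour run does.
    """
    job_id = uuid.uuid4().hex
    with _LOCK:
        JOBS[job_id] = {"session": session, "status": "running",
                        "started": time.time()}
    if not wait:
        threading.Thread(target=_run, args=(job_id, train_fn), daemon=True).start()
        return {"job_id": job_id, "status": "running"}
    _run(job_id, train_fn)
    with _LOCK:
        return {"job_id": job_id, "status": JOBS[job_id]["status"]}
```

    Returning the job_id before training starts is what lets the client drop the
    connection and poll instead of holding one request open for hours.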
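    A disk-persisted, session-scoped registry along these lines could back the
    progress endpoint. JobRegistry, its JSON file layout and its method names are
    assumptions for illustration; only the job_id/session keying, the
    ?job=/?session= filters and the startup "interrupted" marking come from this
    commit.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


class JobRegistry:
    """Hypothetical JSON-file job registry keyed by job_id.

    Each entry carries the submitting session so progress queries can be
    scoped to the caller's own jobs.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.jobs: Dict[str, dict] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.jobs = json.load(fh)

    def _save(self) -> None:
        # Write a temp file, then atomically replace, so a crash mid-write
        # never leaves a truncated registry behind.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.jobs, fh)
        os.replace(tmp, self.path)

    def add(self, job_id: str, session: Optional[str]) -> None:
        self.jobs[job_id] = {"session": session, "status": "running", "step": 0}
        self._save()

    def update(self, job_id: str, **fields) -> None:
        self.jobs[job_id].update(fields)
        self._save()

    def progress(self, job: Optional[str] = None,
                 session: Optional[str] = None) -> List[dict]:
        """Serve ?job=<id> or ?session=<tok>; never return unscoped results.

        Job ids are assumed to be unguessable tokens, so knowing one is
        treated as being entitled to see that job.
        """
        if job is not None:
            entry = self.jobs.get(job)
            return [dict(entry, job_id=job)] if entry else []
        if session is not None:
            return [dict(e, job_id=j) for j, e in self.jobs.items()
                    if e["session"] == session]
        return []  # no filter given: return nothing rather than every user's jobs

    def mark_interrupted(self) -> int:
        """Call at startup: any job still 'running' was cut off by the restart."""
        stale = [j for j, e in self.jobs.items() if e["status"] == "running"]
        for j in stale:
            self.jobs[j]["status"] = "interrupted"
        if stale:
            self._save()
        return len(stale)
```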
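    The resume decision can be sketched as a comparison against
    train_state.json. write_state, resume_step and the key names in MATCH_KEYS
    are hypothetical; saving and loading the PEFT adapter weights themselves is
    left out.

```python
import json
import os
import tempfile
from typing import Optional

STATE_FILE = "train_state.json"
# Fields that must match before a checkpoint is reused: the commit names
# base/target/rank and session; these exact key names are assumptions.
MATCH_KEYS = ("base", "target", "rank", "session")


def write_state(out_dir: str, params: dict, step: int) -> None:
    """Record which run the latest checkpoint belongs to and its step."""
    state = {k: params.get(k) for k in MATCH_KEYS}
    state["step"] = step
    # Atomic write: a crash mid-save leaves the previous state file intact.
    fd, tmp = tempfile.mkstemp(dir=out_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, os.path.join(out_dir, STATE_FILE))


def resume_step(out_dir: str, params: dict) -> Optional[int]:
    """Return the step to resume from, or None to train from scratch."""
    path = os.path.join(out_dir, STATE_FILE)
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    if any(state.get(k) != params.get(k) for k in MATCH_KEYS):
        return None  # a different run: its checkpoint must not be reused
    return state.get("step")
```

    Requiring every key to match keeps a resubmission with, say, a different
    rank from silently loading an incompatible adapter.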
    
    Township (tools/gen_township_fighters.py):
    - Async training: a per-run session token plus a persisted job_id per LoRA;
      the tool polls by job_id, re-attaches to a still-running server job after
      a client restart, and resubmits an interrupted one (the server resumes
      from its checkpoint).
    - Dedicated training timeouts: 24h for video, 4h for image LoRAs.
    - Match page: regenerate or clear keyframes, both match-level and per
      clip/outcome, via the new keyframes/keyframe render and delete scopes.
    
    tools/videogen.py: mirror the session-token + job-id recovery helpers.

    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>