Stefy Lanza (nextime / spora ) authored
Server (codai/api/loras.py):
- /v1/loras/train gains wait (default True) + session; wait=false detaches the job and returns a job_id, avoiding HTTP read-timeouts on multi-hour video trainings.
- Disk-persisted job registry keyed by job_id (carries session). Progress endpoint serves ?job=<id> / ?session=<tok> so a client only ever sees its own job, with no cross-user spillover. Jobs left mid-flight at startup are marked interrupted.
- Mid-training PEFT checkpoints (SD1.5/SDXL/Wan) + train_state.json; a resubmit resumes from the last step when base/target/rank (and session) match, so a reboot no longer throws away hours of Wan training.

Township (tools/gen_township_fighters.py):
- Async training: per-run session token + persisted per-LoRA job_id; polls by job_id, re-attaches to a running server job after a restart, resubmits an interrupted one (server resumes from checkpoint).
- Dedicated train timeouts (24h video / 4h image).
- Match page: regenerate/clear keyframes (match-level + per-clip/outcome) via new keyframes/keyframe render + delete scopes.

tools/videogen.py: mirror the session-token + job-id recovery helpers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
80f48d72
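
For readers without the diff at hand, the server-side behaviour described in the message can be sketched roughly as follows. This is an illustrative sketch, not the code in codai/api/loras.py: the `JobRegistry` class, the `can_resume` helper, the JSON file layout, and the status strings are all assumptions made for the example. Only the ideas come from the commit message: a registry keyed by job_id, filtering by job or session, marking mid-flight jobs as interrupted at startup, and resuming only when base/target/rank and session match.

```python
import json
import tempfile
from pathlib import Path


class JobRegistry:
    """Hypothetical disk-persisted job registry keyed by job_id."""

    def __init__(self, path):
        self.path = Path(path)
        self.jobs = {}
        if self.path.exists():
            self.jobs = json.loads(self.path.read_text())
            # A job still marked "running" when the file is loaded was
            # cut off by a restart, so flag it as interrupted.
            for job in self.jobs.values():
                if job["status"] == "running":
                    job["status"] = "interrupted"
            self._save()

    def _save(self):
        self.path.write_text(json.dumps(self.jobs))

    def add(self, job_id, session, params):
        # Each record carries the session token of the client that owns it.
        self.jobs[job_id] = {"session": session, "status": "running", "params": params}
        self._save()

    def progress(self, job=None, session=None):
        """Return only the jobs matching the given job id or session token."""
        if job is not None:
            found = self.jobs.get(job)
            return {job: found} if found else {}
        if session is not None:
            return {j: v for j, v in self.jobs.items() if v["session"] == session}
        return {}  # no selector given: expose nothing


def can_resume(state, request):
    """Resume only when base, target, rank and session all match the saved state."""
    keys = ("base", "target", "rank", "session")
    return all(k in state and state[k] == request.get(k) for k in keys)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = JobRegistry(Path(tmp) / "jobs.json")
        registry.add("job-1", "session-a", {"base": "wan", "target": "hero", "rank": 16})
        # Simulate a server restart by reloading the registry from disk.
        reloaded = JobRegistry(Path(tmp) / "jobs.json")
        print(reloaded.progress(job="job-1"))
```

Returning an empty result when neither selector is supplied is one way to get the "client only ever sees its own job" property; the real endpoint may handle that case differently.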