LoRA training: async kickoff, restart recovery, keyframe regen UI

Server (codai/api/loras.py):
- /v1/loras/train gains wait (default True) + session; wait=false detaches
the job and returns a job_id, avoiding HTTP read-timeouts on multi-hour
video trainings.
- Disk-persisted job registry keyed by job_id (carries session). Progress
endpoint serves ?job=<id> / ?session=<tok> so a client only ever sees its
own job — no cross-user spillover. Jobs left mid-flight at startup are
marked interrupted.
- Mid-training PEFT checkpoints (SD1.5/SDXL/Wan) + train_state.json; a
resubmit resumes from the last step when base/target/rank (and session)
match, so a reboot no longer throws away hours of Wan training.
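
A minimal sketch of the registry behaviour described above, not the code in codai/api/loras.py. The file layout, the field names (status, session, base, target, rank) and the helper names load_registry, find_job and can_resume are all assumptions made for illustration; only the "interrupted" status and the base/target/rank/session match rule come from this message.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical status names; only "interrupted" appears in the commit message.
IN_FLIGHT = ("queued", "running")
# Fields that must match for a resubmit to resume (per the commit message).
RESUME_KEYS = ("base", "target", "rank", "session")


def load_registry(path: Path) -> dict:
    """Load the on-disk job registry and mark mid-flight jobs as interrupted."""
    try:
        jobs = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    for job in jobs.values():
        if job.get("status") in IN_FLIGHT:
            job["status"] = "interrupted"
    return jobs


def find_job(jobs: dict, job_id: Optional[str] = None,
             session: Optional[str] = None) -> Optional[dict]:
    """Resolve ?job=<id> or ?session=<tok> to at most one job record."""
    if job_id is not None:
        return jobs.get(job_id)
    if session is not None:
        # Assumption: the most recently registered job for a session wins.
        matches = [j for j in jobs.values() if j.get("session") == session]
        return matches[-1] if matches else None
    return None


def can_resume(train_state: dict, request: dict) -> bool:
    """True when the saved train_state matches the resubmitted request."""
    return all(
        k in train_state and train_state[k] == request.get(k)
        for k in RESUME_KEYS
    )
```

With no job or session given, find_job returns nothing rather than falling back to a global "latest job", which is one way to keep a client from seeing another user's progress.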

Township (tools/gen_township_fighters.py):
- Async training: per-run session token + persisted per-LoRA job_id; polls
by job_id, re-attaches to a running server job after a restart, resubmits
an interrupted one (server resumes from checkpoint).
- Dedicated train timeouts (24h video / 4h image).
- Match page: regenerate/clear keyframes (match-level + per-clip/outcome)
via new keyframes/keyframe render + delete scopes.
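
The client-side recovery flow above can be sketched as a small decision function. This is an illustration under assumptions, not the code in tools/gen_township_fighters.py: the status strings other than "interrupted", the action names and the TRAIN_TIMEOUT_S table are hypothetical; only the 24h/4h values and the re-attach/resubmit behaviour come from this message.

```python
from typing import Optional

# Dedicated train timeouts from the commit message, in seconds.
TRAIN_TIMEOUT_S = {"video": 24 * 3600, "image": 4 * 3600}


def next_action(saved_job_id: Optional[str],
                server_status: Optional[str]) -> str:
    """Decide what the client does for one LoRA after a (re)start.

    saved_job_id  -- the persisted per-LoRA job_id, or None if never submitted
    server_status -- status the server reports for that job, or None if the
                     server does not know the job (hypothetical status names)
    """
    if saved_job_id is None or server_status is None:
        return "submit"      # nothing to recover: start a new job
    if server_status in ("queued", "running"):
        return "poll"        # re-attach to the running server job
    if server_status == "interrupted":
        return "resubmit"    # server resumes from its last checkpoint
    if server_status == "done":
        return "finish"
    return "submit"          # unknown or failed state: start over
```

Keeping the decision separate from the HTTP calls makes the restart cases easy to check without a running server.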

tools/videogen.py: mirror the session-token + job-id recovery helpers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>