LoRA training: async kickoff, restart recovery, keyframe regen UI (80f48d72) · Commits · nexlab / coderai

Commit 80f48d72 authored Jun 10, 2026 by Stefy Lanza (nextime / spora)

LoRA training: async kickoff, restart recovery, keyframe regen UI

Server (codai/api/loras.py):
- /v1/loras/train gains wait (default True) + session; wait=false detaches
  the job and returns a job_id, avoiding HTTP read-timeouts on multi-hour
  video trainings.
- Disk-persisted job registry keyed by job_id (carries session). Progress
  endpoint serves ?job=<id> / ?session=<tok> so a client only ever sees its
  own job — no cross-user spillover. Jobs left mid-flight at startup are
  marked interrupted.
- Mid-training PEFT checkpoints (SD1.5/SDXL/Wan) + train_state.json; a
  resubmit resumes from the last step when base/target/rank (and session)
  match, so a reboot no longer throws away hours of Wan training.
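
The registry and resume behaviour described above can be pictured roughly as follows. This is a minimal sketch, not the code in codai/api/loras.py: the `JobRegistry` class, the `can_resume` helper, the JSON file layout, and every status string other than "interrupted" are assumptions made for illustration.

```python
import json
import os
import tempfile
import uuid


class JobRegistry:
    """Hypothetical disk-backed job registry keyed by job_id."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.jobs = json.load(fh)
        # A job still marked "running" at startup was cut off by a restart.
        changed = False
        for job in self.jobs.values():
            if job["status"] == "running":
                job["status"] = "interrupted"
                changed = True
        if changed:
            self._save()

    def _save(self):
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written registry on disk.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.jobs, fh)
        os.replace(tmp, self.path)

    def create(self, session, params):
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"session": session, "params": params,
                             "status": "running", "step": 0}
        self._save()
        return job_id

    def lookup(self, job_id=None, session=None):
        """Return only the jobs matching the given job id or session token."""
        if job_id is not None:
            job = self.jobs.get(job_id)
            return {job_id: job} if job else {}
        if session is None:
            return {}  # no selector means no jobs, never "all jobs"
        return {jid: j for jid, j in self.jobs.items()
                if j["session"] == session}


def can_resume(train_state, request):
    """Resume only when base/target/rank and session all match."""
    keys = ("base", "target", "rank", "session")
    return all(train_state.get(k) == request.get(k) for k in keys)
```

Returning an empty result when neither selector is given is one way to get the "only ever sees its own job" property: a request without a job id or session token cannot enumerate other clients' jobs.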

Township (tools/gen_township_fighters.py):
- Async training: per-run session token + persisted per-LoRA job_id; polls
  by job_id, re-attaches to a running server job after a restart, resubmits
  an interrupted one (server resumes from checkpoint).
- Dedicated train timeouts (24h video / 4h image).
- Match page: regenerate/clear keyframes (match-level + per-clip/outcome)
  via new keyframes/keyframe render + delete scopes.
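
The client-side recovery flow above can be read as a small decision table. The sketch below is illustrative only and is not taken from tools/gen_township_fighters.py: `next_action`, `train_timeout`, the action names, and all status strings except "interrupted" are assumed. Only the 24h/4h timeout values come from this commit message.

```python
def train_timeout(kind):
    """Dedicated training timeout in seconds: 24h for video, 4h for image."""
    return 24 * 3600 if kind == "video" else 4 * 3600


def next_action(saved_job_id, server_status):
    """Decide what the client does for one LoRA after it (re)starts.

    saved_job_id:  job id the client persisted earlier, or None.
    server_status: status the server reports for that id, or None if
                   the server does not know the job.
    """
    if saved_job_id is None or server_status is None:
        return "submit"      # nothing to recover, start a fresh job
    if server_status == "running":
        return "poll"        # re-attach to the job still running server-side
    if server_status == "interrupted":
        return "resubmit"    # server resumes from its last checkpoint
    if server_status == "done":
        return "finish"
    return "submit"          # failed or unrecognised status: start over
```

For example, a client that restarts while the server kept training would find its saved job id, see a "running" status, and go straight back to polling instead of submitting a duplicate job.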

tools/videogen.py: mirror the session-token + job-id recovery helpers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

parent f21c6185
