1. 16 Jun, 2026 1 commit
  2. 15 Jun, 2026 8 commits
    • coderai: global host-RAM cap with leak watch + disk-offload eviction · 99f8ba85
      Stefy Lanza (nextime / spora ) authored
      Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
      existing VRAM budgeting:
      
      - hf_loading clamps the accelerate CPU-offload budget to the headroom under
        the cap, so overflow spills to the disk offload folder instead of growing RSS.
      - manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
        _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
        evicted before a new load when RSS nears the cap.
      - ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
        climbs while the scheduler is idle, and runs a mitigation ladder
        (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
      - admin /status returns a ram block; Settings page exposes max RAM + evict/
        leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
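
The headroom clamp in the first bullet can be sketched as follows. This is a minimal illustration, not the code in hf_loading: the function name, the `reserve_gb` safety margin and the "non-positive cap means unlimited" rule are all assumptions.

```python
def clamp_cpu_offload_budget_gb(requested_gb, max_ram_gb, current_rss_gb, reserve_gb=1.0):
    """Return the CPU-offload budget (GB) that still fits under the host-RAM cap."""
    # Assumption: a missing or non-positive cap means "no cap configured".
    if max_ram_gb is None or max_ram_gb <= 0:
        return requested_gb
    # Headroom is what is left under the cap after current RSS and a safety reserve.
    headroom_gb = max(0.0, max_ram_gb - current_rss_gb - reserve_gb)
    # Never hand out more than the headroom; the remainder spills to disk offload.
    return min(requested_gb, headroom_gb)
```

With a 64 GB cap, 40 GB already resident and a 1 GB reserve, a 32 GB request is cut to 23 GB, and the rest would go to the disk offload folder.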
      
      Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
      count so an active upscale no longer reports '0 models loaded'.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: normalize draw to one canonical both-fighters entry · a9b6d35e
      Stefy Lanza (nextime / spora ) authored
      Auto-collapse a match's draw outcome(s) to a single canonical draw owned by
      f1 with f2 as opponent (representing both fighters), preferring an existing
      f1-owned draw so its rendered files survive. Fixes legacy per-fighter draws
      and the lone-f2 self-opponent case; lets the draw be regenerated on its own.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: auto-fix legacy per-fighter draws to one draw per match · 85317252
      Stefy Lanza (nextime / spora ) authored
      Old matches stored a DRAW per fighter, but a draw concerns both fighters so
      there must be exactly one per match. _run_match_job now dedupes the match's
      draws (keeping the first) and persists prompts.json on ANY operation, so a
      legacy match self-heals the moment it's touched — regen no longer rewrites a
      draw per fighter, and the per-outcome prompt regen targets the single draw.
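
A minimal sketch of the "keep the first draw" dedupe described above. The outcome shape (a dict with a `type` key equal to `"draw"`) is an assumption made for illustration; the real prompts.json schema is not shown in this log.

```python
def dedupe_draws(outcomes):
    """Keep every non-draw outcome and only the first draw, preserving order."""
    kept = []
    seen_draw = False
    for outcome in outcomes:
        # Assumption: draws are marked with type == "draw".
        if outcome.get("type") == "draw":
            if seen_draw:
                continue  # drop legacy per-fighter duplicates
            seen_draw = True
        kept.append(outcome)
    return kept
```

Running it on an already-clean list returns the list unchanged, which is what lets the fix run on any operation without side effects.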
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: camera-motion clips, cancel-whole-queue, queue UI, per-outcome prompt regen, 3-person draw · 6c22ea86
      Stefy Lanza (nextime / spora ) authored
      
      - Camera motion: add CAMERA_MOVES and flag every other fight clip as a
        camera-motion shot whose prompt LEADS with a bold moving-camera directive
        (front position = strongest weight) so the I2V model moves the camera through
        the environment instead of locking off. Legacy clips get a camera decision
        assigned + persisted on prompt regen. The directive is stripped from the
        keyframe prompt so the still anchor stays sharp.
      - Cancel whole queue: new /job/cancel-all endpoint flags the running job AND
        every queued job (worker skips them); the progress cancel button now reads
        "Cancel all (N)" and empties the queue instead of just the active job.
      - Queue visibility: detail monitor renders a "Queued (N)" list of the waiting
        jobs (by friendly scope label), not just a count; matches-list page uses ONE
        monitor per card (no more blinking between running job and its queued ones).
      - Per-outcome prompt regen: "prompt↻" on every outcome tile + new
        outcome-prompt scope rewrites a single outcome's finish+victory shots only.
      - Draw outcome: strengthen prompts so the victory shot shows all THREE in frame
        (both fighters + referee) with the referee thrusting BOTH fighters' fists high.
      - Entrance clips: more explosive/threatening, galvanized, shadow-boxing the air.
      - "win" outcome is now a POINTS decision by the referee, not a KO.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: infer intro role on prompt regen + make prompt drive keyframe · 300d5669
      Stefy Lanza (nextime / spora ) authored
      Legacy matches (created before the intro-clips feature) have no role on their
      clips, so per-clip prompt regen wrote fight prompts for clips 0-2 instead of
      the entrance/entrance/face-off intro. Add _clip_role_fighters() which honours an
      explicit role/fighters or infers from position (clip 0 = f1 entrance, clip 1 =
      f2 entrance, clip 2 = referee face-off, rest = fight). _fill_clip_prompt() now
      uses it and PERSISTS the resolved role/fighters onto the clip so the subsequent
      keyframe regen and render apply the correct profiles + LoRAs.
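
The positional inference can be sketched like this. It is a hypothetical stand-in for _clip_role_fighters(): the role strings, the argument list and the clip dict keys are assumptions; only the position rules (clip 0, 1, 2, rest) come from the description above.

```python
def clip_role_fighters(clip, index, f1, f2, referee=None):
    """Return (role, fighters) for a clip, preferring explicit values."""
    # Infer from position first: clip 0 = f1 entrance, clip 1 = f2 entrance,
    # clip 2 = referee face-off, everything after that = fight.
    if index == 0:
        role, fighters = "entrance", [f1]
    elif index == 1:
        role, fighters = "entrance", [f2]
    elif index == 2:
        role, fighters = "face-off", [f1, f2] + ([referee] if referee else [])
    else:
        role, fighters = "fight", [f1, f2]
    # Explicit values stored on the clip always win over the inference.
    role = clip.get("role") or role
    fighters = list(clip.get("fighters") or fighters)
    return role, fighters
```

Writing the returned pair back onto the clip is what makes the later keyframe regen and render see the same role.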
      
      Also make a regenerated prompt authoritative for keyframe generation: clear any
      stale kf_prompt override when (re)writing a clip prompt (keyframes compose from
      the clip prompt unless an override exists, which would silently win). Same for
      outcomes — _plan_outcome_shots now drops o['kf_prompt'] so regenerated outcome
      prompts feed the outcome keyframes.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: apply correct per-segment LoRAs/profiles at outcome render time · ba4dedac
      Stefy Lanza (nextime / spora ) authored
      The outcome video render applied one fighter list (both match fighters) to the
      whole clip, so the referee LoRA and winner-only identity only existed in the
      keyframe, not at video-generation time. Thread per-segment fighters through
      _render: segments are now (prompt, frames, seg_stem, seg_fighters) and each
      segment (and each chained part) applies exactly its own character_profiles +
      LoRAs, overriding the clip-level list.
      
      Outcome segments now load: FINISH = both fighters; VICTORY = winner + referee
      (decisive) or both fighters + referee (draw). Referee resolution matches the
      keyframe path. Backward compatible — shorter segment tuples and legacy clips
      without role/fighters fall back to the clip-level fighters.
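
The backward-compatible fallback for shorter segment tuples might look like the sketch below. The helper name is hypothetical, and treating an empty `seg_fighters` as "not set" is an assumption.

```python
def segment_fighters(segment, clip_fighters):
    """Return the fighters to apply to one segment of a render."""
    # New-style segments are (prompt, frames, seg_stem, seg_fighters);
    # legacy ones stop after seg_stem.
    if len(segment) >= 4 and segment[3]:
        return list(segment[3])
    # Legacy tuple or empty per-segment list: use the clip-level fighters.
    return list(clip_fighters)
```

For a decisive outcome this would let the FINISH segment carry both fighters while the VICTORY segment carries only the winner plus the referee.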
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: pre-fight intro clips, per-clip prompt regen, whole-match 2x enhance · e28f11e9
      Stefy Lanza (nextime / spora ) authored
      - Every match now opens with 3 intro clips before the fight: a bold solo
        entrance for each fighter, then a referee-officiated face-off stare-down
        with the start signal. The real fight begins at clip 4. New intro prompt
        templates + LLM system prompt + PromptGenerator.intro_shot().
      - Factor out _build_match_clip_specs() and _fill_clip_prompt() so all four
        clip-building paths (stage_videos Phase A, new-match, replan, full regen)
        build intros consistently; intro clips attach only their own participants
        (solo entrance = one fighter; face-off = both + match referee).
      - New "clip-prompt" job scope + per-clip "prompt↻" link: rewrite ONLY one
        clip's prompt in place (role-aware), steering fight clips away from the
        match's other shots; renders nothing.
      - "Create whole match" and "Regenerate whole match" (scope "full") now finish
        with a 2x AI upscale + 2x frame interpolation pass over the final short/long
        assemblies and outcome videos, reusing the existing enhance machinery.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: 2-keyframe outcomes, referees, autogen, generation queue; favicons · 3bfefed0
      Stefy Lanza (nextime / spora ) authored
      Township tool (tools/gen_township_fighters.py):
      - Outcome videos now generate TWO keyframes per outcome (finish + victory),
        each anchoring its own clip; victory clip uses a dedicated referee shot.
      - Referee characters: new role on create form, kept out of fighter pools,
        dressed as officials, attachable per-match and used in victory keyframes.
      - Per-match referee selection (new-match form + match editor, persisted).
      - Autogenerate buttons on character/referee, environment and new-match forms
        (LLM-filled, editable before create) via /profile/autogen + /matches/autogen.
      - Single-worker generation queue: all coderai-bound jobs (create/regen/train/
        match/process) are serialised and surfaced as "queued", with one persistent
        match-detail monitor replacing the competing per-job pollers (fixes the
        blinking progress when two jobs were launched at once).
      
      coderai: favicon.ico served at /favicon.ico + linked in admin/login templates;
      bundled township favicon served at /favicon.ico.
      
      Also gitignore large packaging/runtime artifact dirs (.packaging-cache/, tmp/).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  3. 14 Jun, 2026 2 commits
    • video: cap CPU cores + thermal-manage RIFE interpolation · 80f8fe22
      Stefy Lanza (nextime / spora ) authored
      rife-ncnn-vulkan and the ffmpeg frame extract/encode were grabbing all cores
      and running with no ongoing thermal control. Now:
      
      - _cpu_thread_limit() mirrors coderai's half-the-cores cap (honours the
        OMP_NUM_THREADS set at import). All ffmpeg calls in the upscale + interpolate
        paths pass -threads N and are CPU-pinned via a sched_setaffinity preexec_fn;
        rife gets -j capped and the same affinity pin — so neither can saturate 24
        cores.
      - RIFE is one opaque subprocess, so it now runs under a watcher thread that
        SIGSTOPs it when the GPU/CPU exceeds the configured thermal-high threshold and
        SIGCONTs it once cooled (the subprocess analogue of the upscaler's per-frame
        thermal gate), and terminates it on task cancel. Per-frame progress preserved.
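
The pause/resume decision made by the watcher thread can be sketched as a small hysteresis function. The function name and the separate `resume_c` threshold are assumptions; the log only says the process is stopped above the thermal-high threshold and continued "once cooled". `signal.SIGSTOP` and `signal.SIGCONT` are POSIX-only.

```python
import signal


def next_signal(paused, temp_c, high_c, resume_c):
    """Return (signal_to_send_or_None, new_paused_state) for one temperature sample."""
    # Running and too hot: freeze the subprocess.
    if not paused and temp_c > high_c:
        return signal.SIGSTOP, True
    # Frozen and cooled down to the resume threshold: let it continue.
    if paused and temp_c <= resume_c:
        return signal.SIGCONT, False
    # Otherwise keep the current state (the gap between thresholds avoids flapping).
    return None, paused
```

A watcher loop would call this once per sample and, when a signal is returned, deliver it with `os.kill(proc.pid, sig)` to the RIFE subprocess.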
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • video: AI upscale/interpolate, township outcomes, configurable tmp dir · cc2436ed
      Stefy Lanza (nextime / spora ) authored
      Make video enhancement fully AI-on-CoderAI and rework township outcomes.
      
      Upscaling (Real-ESRGAN / SD upscalers):
      - Support diffusers-style .safetensors weights + config.json (e.g.
        hlky/RealESRGAN_*), not just classic .pth; infer RRDBNet arch/scale from
        config. fp16 + tiling for performance.
      - AI-or-fail: no ffmpeg fallback. Auto-select a configured upscaler when the
        request omits a model (find_capable_model).
      - Fix a registry-pollution bug: cache upscalers in a private dict, never under
        a synthetic 'upscale:<id>' key in multi_model_manager.models (which made a
        later request_model() resolve/reload the bogus key -> 400).
      - Per-frame progress + a first-class "upscale" task (pause/cancel/thermal),
        with a periodic thermal re-check through the frame loop.
      
      Interpolation (RIFE):
      - AI-or-fail: removed the ffmpeg minterpolate fallback. Resolve the
        rife-ncnn-vulkan binary + bundled model robustly, pass exact -n frame count,
        and pin -g to the SAME GPU CoderAI uses (matched by CUDA device name, not a
        hardcoded index). Progress + "interpolate" task + thermal guard.
      
      Township generator:
      - One draw per match (not per fighter); longer, configurable outcome videos
        built as a finish -> victory two-shot sequence; richer, more brutal,
        camera-aware prompts (finish/victory templates editable on the Prompts page).
      - Stream large results via response_format=url instead of base64-in-JSON;
        per-frame progress for both upscale and interpolate.
      
      Configurable temp dir:
      - New --tmp CLI flag and config.tmp_dir (+ admin Settings field, applied live).
        Sets tempfile.tempdir and TMPDIR/TMP/TEMP so all scratch (frame extraction,
        upscaling, interpolation) and child processes use it — fixes
        "[Errno 28] No space left on device" when /tmp is small.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  4. 13 Jun, 2026 7 commits
    • township: whole-match regen — assemble last, clean slate first · 06e61257
      Stefy Lanza (nextime / spora ) authored
      The 'full' match-regen scope now (1) removes this match's existing
      keyframes, clip videos, outcome videos and finals up front, so a re-plan
      that changes the clip count can't leave orphaned files that would get
      globbed into the reassembled finals; and (2) runs strictly in order —
      prompts -> keyframes -> clips + outcomes (assemble_finals=False) ->
      assemble finals as the explicit last phase (4/4) via _reassemble_finals.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • township: "Regenerate whole match" button (end-to-end match redo) · 94a5f1ac
      Stefy Lanza (nextime / spora ) authored
      Adds a `full` scope to the per-match action handler that rebuilds one
      match in order: re-plan all fight-clip + outcome prompts (text model) →
      regenerate every keyframe (image model) → re-render all clips + outcomes
      and reassemble finals (video model), with live per-phase progress. Other
      matches are untouched. Wires the confirm dialog, the match-detail button,
      and the /matches/render scope allowlist.
      
      Fix: the `full` confirm label used an apostrophe (match's) inside the
      single-quoted JS string of the plain triple-quoted _match_js block, which
      collapsed to a real quote and broke the whole script (reMatch undefined).
      Reworded to avoid it; verified the rendered JS parses with node --check.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • video: VACE frame-tail extend, cancellable downloads, MMA fight variety · 0b355364
      Stefy Lanza (nextime / spora ) authored
      Downloads: run each model download in a clean `python -m
      codai.admin.download_worker` subprocess streaming JSON progress, so the
      Stop button reliably cancels by terminating the process (HF parallel/Xet
      chunk transfers ignore in-thread flags). Adds download-cancel-all. Avoids
      multiprocessing spawn, which re-imports the server launcher as __main__.
      
      VACE extension: detect WanVACEPipeline; new 'extend' mode + cond_frames
      request field condition on the previous chained part's frame tail (real
      motion -> forward continuation, fixing the single-frame boomerang).
      _build_vace_conditioning builds the (video, mask) pair; _snap_wan_frames
      enforces a 4k+1 frame count; only the freshly generated frames are returned. VACE also
      serves keyframe i2v / t2v via masking; i2v/t2v fallbacks skipped for it.
      Township auto-uses extend for chained parts when the model is VACE.
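
The 4k+1 constraint on Wan frame counts can be illustrated with a short helper. This is a sketch, not the body of _snap_wan_frames: rounding up (rather than down or to nearest) is an assumption.

```python
def snap_wan_frames(num_frames):
    """Round a frame count up to the next value of the form 4k + 1 (minimum 1)."""
    if num_frames <= 1:
        return 1
    # Ceiling division of (num_frames - 1) by 4 gives the smallest k that fits.
    k = -(-(num_frames - 1) // 4)
    return 4 * k + 1
```

Counts that already satisfy the constraint, such as 49 or 81, pass through unchanged.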
      
      Fight prompts: full-MMA system prompt + rotating per-clip action focus
      (kicks/knees/elbows/takedowns/ground/submissions) and occasional blood,
      rebalanced fallback templates, keyframe wardrobe enforcement.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Township configurable playback fps; quiet progress-poll access logs · 07b3be5c
      Stefy Lanza (nextime / spora ) authored
      - township: new playback_fps (0 = same as generation fps). coderai uses fps only
        for the mp4 encode (Wan generates a fixed frame count), so a higher playback
        fps plays the same frames faster (less slow-motion). The planner counts clip
        duration as nf/playback_fps so the finals reach their target length at the real
        play speed. Wired through config/CLI (--playback-fps)/web form/all call sites.
      - main.py: suppress /v1/{video,images,audio}/progress access-log lines unless
        --debug-web is set (matching the existing /v1/loras/progress filter).
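
The playback_fps arithmetic from the first bullet, as a small worked sketch. Function names and the example numbers are illustrative assumptions; the `0 = same as generation fps` rule and the `nf/playback_fps` duration come from the commit text.

```python
import math


def clip_duration_s(num_frames, gen_fps, playback_fps=0):
    """Seconds a clip lasts on screen; playback_fps == 0 means 'use gen_fps'."""
    fps = playback_fps if playback_fps > 0 else gen_fps
    return num_frames / fps


def clips_needed(target_s, num_frames, gen_fps, playback_fps=0):
    """Number of equal-length clips required to fill a target duration."""
    return math.ceil(target_s / clip_duration_s(num_frames, gen_fps, playback_fps))
```

An 81-frame clip lasts 5.0625 s at 16 fps but only 3.375 s at 24 fps, so a 60 s cut needs 12 clips in the first case and 18 in the second.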
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Exempt progress polls from rate limit; retry 429s on clip render · 0bdd9466
      Stefy Lanza (nextime / spora ) authored
      - ratelimit.py: exempt /v1/video, /v1/audio and /v1/loras progress polls from
        BOTH auth and rate limiting (shared _PROGRESS_PATHS), matching /v1/images.
        The township script polls /v1/video/progress ~1/s during a clip; because those
        polls counted against the rate limit, they ate the budget, so the generation
        POST got 429'd (clip failed) and the polls themselves 429'd (stuck step bar).
      - township _render_once: a 429 now backs off and retries the same render (up to
        40 attempts, capped 60s) instead of abandoning the clip; covers clips,
        chained parts and outcomes. Genuine errors still fail fast.
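
A sketch of the 429 retry loop described in the second bullet. The callable interface (`render_once` returning a `(status, result)` pair), the exponential doubling and the 1 s starting delay are assumptions; the 40-attempt limit and the 60 s cap come from the commit text, where "capped 60s" is read here as a cap on each wait.

```python
import time


def render_with_retry(render_once, max_attempts=40, max_delay_s=60.0,
                      base_delay_s=1.0, sleep=time.sleep):
    """Call render_once() until it stops returning HTTP 429 or attempts run out."""
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        status, result = render_once()
        if status == 429:
            # Rate-limited: wait, then retry the same render.
            if attempt < max_attempts:
                sleep(delay)
                delay = min(max_delay_s, delay * 2)
            continue
        if status >= 400:
            # Genuine errors fail fast instead of being retried.
            raise RuntimeError(f"render failed with HTTP {status}")
        return result
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.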
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Thermal soft-throttle, accel hot-swap + UI, township progress & concat fixes · 2ec9c384
      Stefy Lanza (nextime / spora ) authored
      coderai:
      - Thermal: configurable proactive CPU soft-throttle (engage temp + max
        per-step sleep) that gently slows generation in a warm band so it rarely hits
        the hard pause; CPU-only, hard pause always takes precedence. Tasks page shows
        a soft-throttle banner + per-task badge (live, gated on a running task).
      - Acceleration hot-swap: toggling/changing a model's acceleration now evicts the
        loaded model (manager.unload_model) so the next request reloads with the new
        setting — no restart. (acceleration is fused at load time.)
      - Models UI: cascading distill-LoRA pickers — new /admin/api/accel-loras scans
        the cache for distill repos; pick the distill model, then its high/low (or
        single) LoRA from dropdowns. Presets now also fill the high/low fields.
      - Tasks queue summary now reflects ALL model activity (derived from the unified
        task list), not just queue-manager requests — fixes the stuck "0 active".
      - images.py: proactive eviction no longer skipped by a NameError (model_key).
      
      township (tools/gen_township_fighters.py):
      - Per-clip/outcome/keyframe progress now shows real diffusion-step progress
        (polls /v1/{images,video}/progress) on the CLI spinner and the web step bars,
        including "shot part N/total" for chained single shots.
      - Chained-shot concat re-encodes (CFR) instead of stream-copy, fixing the
        "first half is a static image" freeze at the part boundary.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Township gen overhaul + coderai thermal/offload/eviction fixes · c8ace9d8
      Stefy Lanza (nextime / spora ) authored
      Township fight-video generator (tools/gen_township_fighters.py):
      - 16:9 native resolution: default 832x480 video + matching keyframes
        (configurable video_size); square 512 was off-distribution for Wan2.2.
      - Split-and-chain rendering: single-render cap (default 50f); clips/outcomes
        longer than the cap render as chained sub-renders (last frame seeds the next)
        concatenated into one continuous shot, parts discarded — Matches page unchanged.
        Planned-clip ceiling raised to 480f.
      - Separate outcome min/max frames (default 40/70), same split-chain path.
      - Configurable short/long final-assembly intervals; clip count derives from the
        long target + fps so the long cut always fills.
      - Prompt continuity: deterministic wardrobe+environment clause on every clip,
        replan clip and outcome; stronger LLM system prompts; updated default suffix.
      - Run page: configurable fighter/environment counts + reference-image counts;
        moved "Include female fighters" into the Characters card; suggested
        steps/rank/weight guide table; per-profile LoRA train defaults now mirror the
        run-page config (lora_* for characters, env_lora_* for environments).
      - Matches: "Remove match completely" (files + keyframes + prompts.json entry).
      - Renamed the prompts step to "Generate matches prompts"; removed the gallery page.
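
The split-and-chain bullet above amounts to cutting a long shot into sub-renders no longer than the single-render cap. A minimal sketch, assuming the parts simply sum to the requested length (whether the seeding frame is counted twice is not stated in the log):

```python
def split_frames(total_frames, cap=50):
    """Split a frame count into consecutive parts of at most `cap` frames."""
    parts = []
    remaining = total_frames
    while remaining > 0:
        part = min(cap, remaining)
        parts.append(part)
        remaining -= part
    return parts
```

A 120-frame clip with the default cap renders as three chained parts of 50, 50 and 20 frames, each seeded by the last frame of the previous part.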
      
      coderai:
      - images.py: fix NameError ('model_key' undefined) that silently skipped
        proactive VRAM eviction before every image load.
      - thermal.py: cross-worker cooldown — when one generation pauses for heat, all
        parallel generations now back off until the resume threshold; add process-tree
        CPU% reader (100%/core).
      - video.py/manager.py/main.py: offload ref-leak fix, offloaded-load VRAM guard,
        wire --pipeline-cache flags.
      - Tasks page CPU tile shows process-tree CPU% scaled to cores.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  5. 12 Jun, 2026 3 commits
    • video: param-weighted VRAM estimate, smarter auto offload, runtime accel LoRA · eeb3bba1
      Stefy Lanza (nextime / spora ) authored
      VRAM estimation (manager.py):
      - Weight the effective quant multiplier by REAL per-component parameter
        shares (new _component_param_shares scans safetensors by component folder)
        instead of a blind 70/30 split. Wan2.2 is 99.6% quantizable (two 14B
        experts + text encoder 4-bit, only the 0.13B VAE dense), so the old 0.475x
        multiplier inflated ~25.8 GB -> 42.7 GB and forced needless offload. Now
        ~0.28x -> ~25.8 GB. VAE forced dense (conv-only, bnb can't quantize).
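
The share-weighted multiplier can be sketched as below. The function name and the 0.25 factor for 4-bit weights (4 bits out of 16) are assumptions. That factor reproduces the old 0.475x figure from a blind 70/30 split exactly; with a 99.6% quantizable share it gives about 0.25x, so the ~0.28x quoted above presumably includes overhead this sketch does not model.

```python
def effective_quant_multiplier(shares, quantizable, quant_factor=0.25):
    """Weight each component's size multiplier by its share of the parameters."""
    total = sum(shares.values())
    weighted = 0.0
    for name, share in shares.items():
        # Quantizable components shrink by quant_factor; dense ones stay at 1.0.
        factor = quant_factor if name in quantizable else 1.0
        weighted += factor * share
    return weighted / total


# Old behaviour: a blind 70/30 split regardless of the real model layout.
old = effective_quant_multiplier({"quantized": 70, "dense": 30}, {"quantized"})
# New behaviour: real shares, where only the VAE (about 0.4%) stays dense.
new = effective_quant_multiplier({"experts_and_text": 99.6, "vae": 0.4},
                                 {"experts_and_text"})
```

Here `old` comes out at 0.475 and `new` at roughly 0.253.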
      
      Auto offload decision (video.py):
      - 'auto': when peak footprint exceeds free VRAM, go straight to `model` CPU
        offload (active component on GPU, near full-GPU speed) — no full-GPU gamble,
        no slow balanced+disk path.
      - 'auto-borderline' (new mode): same, except a marginal overshoot (<=3 GB)
        tries full-GPU first to keep both experts resident and use free VRAM,
        falling back to model offload on OOM.
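
The two modes differ only in how they treat a small overshoot. A sketch of that decision, returning the placements to try in order; the function name and the placement labels are assumptions, while the 3 GB margin comes from the text above.

```python
def choose_placements(strategy, peak_gb, free_vram_gb, borderline_gb=3.0):
    """Return the ordered list of placements to attempt for a video model load."""
    overshoot_gb = peak_gb - free_vram_gb
    if overshoot_gb <= 0:
        return ["full-gpu"]  # the peak footprint fits, nothing to decide
    if strategy == "auto-borderline" and overshoot_gb <= borderline_gb:
        # Marginal overshoot: try full-GPU first, fall back to model offload on OOM.
        return ["full-gpu", "model"]
    # Plain 'auto', or a large overshoot: go straight to model CPU offload.
    return ["model"]
```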
      
      Acceleration LoRA (acceleration.py + video.py):
      - Keep the distill/Lightning LoRA as an ACTIVE RUNTIME ADAPTER instead of
        fusing. Fusing into CPU-offloaded bitsandbytes 4-bit weights triggers a
        dequant->merge->requant per Linear on the CPU — minutes/hours per expert,
        appearing to hang (high CPU, empty VRAM). Runtime adapters apply at forward
        time on the GPU at negligible cost and natively cover transformer_2.
      - _sync_video_loras preserves the accel adapters across per-request LoRA swaps
        and re-includes them in every set_adapters; _unload_video_loras deletes only
        per-request adapters, keeping accel.
      
      UI (models.html):
      - Add "Auto borderline-aware" offload strategy option + updated hint.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Merge feature/tasks-quant-thermal: task mgmt, quantization, Wan2.2 video fixes, pipeline cache, smarter offload · f55f6578
      Stefy Lanza (nextime / spora ) authored
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Wan2.2 video fixes, pipeline cache, smarter offload, model-load tasks · dbc09b75
      Stefy Lanza (nextime / spora ) authored
      Wan2.2 A14B (dual-expert) generation fixes:
      - Fuse the Lightning distill LoRA into BOTH experts (transformer +
        transformer_2); diffusers' fuse_lora defaults to ["transformer"] only, which
        left the low-noise expert undistilled → 4-step clips collapsed to a solid
        colour. Also load per-request fighter/env LoRAs into both experts.
      - Pre-configure the wan22_lightning_4step preset with the local high/low-noise
        LoRAs (lora_high/lora_low), used when acceleration is enabled, ignored when
        not; surfaced in the Acceleration UI.
      - Safety net: only apply the preset's low step count when the distill LoRA
        actually fused, else fall back to safe steps.
      - Skip bitsandbytes/quanto quant for the VAE (conv-only → "no linear modules").
      
      VRAM / offload:
      - Strategy auto-selection actually fires now ('auto' is normalised, not passed
        through as a no-op) and no longer double-counts the runtime/accel reserve.
      - Graceful OOM degrade ladder: full-GPU → balanced @ configured% → 80 → 60 →
        40 → sequential → disk, respecting the model's balanced_gpu_percent as the
        starting cap. Expose 'balanced' as a selectable offload strategy.
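
One way to read the degrade ladder is as an ordered list built from the model's balanced_gpu_percent. The sketch below is an interpretation, not the server's code: the step labels are made up, and skipping fixed rungs at or above the configured percentage is an assumption about what "starting cap" means.

```python
def oom_ladder(balanced_gpu_percent=80):
    """Return the placements to try in order after each out-of-memory failure."""
    # `or 80` also covers an explicit None stored in the config.
    cap = int(balanced_gpu_percent or 80)
    steps = ["full-gpu", f"balanced@{cap}"]
    # Only descend through the fixed rungs that sit below the configured cap.
    steps += [f"balanced@{p}" for p in (80, 60, 40) if p < cap]
    steps += ["sequential", "disk"]
    return steps
```

With the default of 80 this yields full-gpu, balanced@80, balanced@60, balanced@40, sequential, disk.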
      
      Pipeline disk cache (--pipeline-cache / --rebuild-pipeline-cache):
      - Cache the quantized base pipeline to disk and reload it on later starts,
        skipping re-download/re-quantization; accel LoRA re-fused per load. Fail-safe
        with self-healing invalidate-and-rebuild.
      
      Tasks / misc:
      - Show model loading as a (non-cancellable, non-pausable) Tasks entry.
      - Filter the Tasks-page pollers from the access log unless --debug-web.
      - Township gen script: per-image keyframe progress (no longer all-or-nothing).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  6. 11 Jun, 2026 5 commits
    • Add task management, quantization, and hardware telemetry · 8ad15128
      Stefy Lanza (nextime / spora ) authored
      Tasks / queue management:
      - Central in-memory task registry with cooperative cancel, pause/resume,
        and step progress across image/video/audio/text generation + LoRA training
      - Tasks admin page (live 2s poll): cancel, interrupt, pause/resume, restart,
        remove; done jobs auto-drop from the list; bounded persisted job history
      - Disable interrupted-training recovery via --no-resume-jobs + settings toggle
      
      Quantization / acceleration:
      - TurboQuant embedding vector quantization (data-free, inner-product
        preserving): built-in NumPy backend + optional turboquant-py library,
        selectable per embedding model; /v1/embeddings `quantization` param
      - llama.cpp KV-cache quantization (cache_type_k/v) for GGUF text models,
        configurable in the Models UI
      
      Hardware telemetry:
      - Thermal cooldown state surfaced on the Tasks page (banner + per-task badge)
      - Live CPU/GPU/RAM/VRAM usage + temperature panel via /admin/api/system-stats
      
      Docs: API documentation gaps/accuracy pass + Swagger overhaul; DISTRIBUTION.md
      implementation spec. Plus I2V LoRA training channel-mismatch fix.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Graceful Wan t2v/i2v fallback on mode-vs-model mismatch · 9494d1bd
      Stefy Lanza (nextime / spora ) authored
      The pipeline class is selected from the request mode, which can disagree with
      the model's real capability (transformer input channels), causing a hard
      channel-mismatch crash. Detect and degrade gracefully for Wan:
      
      - ti2v/i2v request on a t2v model (transformer in_channels=16): rebuild as a
        plain WanPipeline and run t2v with the keyframe dropped.
      - t2v request on an i2v model (in_channels=36): rebuild as
        WanImageToVideoPipeline (image_encoder/processor are optional) and seed a
        neutral gray frame so the prompt still drives the clip.
      
      Both rebuild a sibling pipeline reusing the SAME components, so fused
      acceleration and per-request LoRAs on the shared transformer carry over with no
      reload; the view is cached on the pipe so repeated clips reuse it and
      _sync_video_loras' adapter dedup stays intact. Helpers: _wan_in_channels(),
      _maybe_t2v_fallback(), _maybe_i2v_fallback().
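
The mismatch detection reduces to comparing the request mode with the transformer's input channels. A sketch of just that decision; the function name and return labels are assumptions, and the pipeline rebuild itself is not shown.

```python
def wan_fallback(mode, in_channels):
    """Return 't2v', 'i2v' or None depending on whether a fallback is needed."""
    if mode in ("i2v", "ti2v") and in_channels == 16:
        # Image-conditioned request on a t2v model: run t2v and drop the keyframe.
        return "t2v"
    if mode == "t2v" and in_channels == 36:
        # Text-only request on an i2v model: run i2v seeded with a neutral gray frame.
        return "i2v"
    # Mode and model agree (or the channel count is one this sketch does not know).
    return None
```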
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • LoRA transport: upload / id / url / inline file (no shared filesystem) · 42e45456
      Stefy Lanza (nextime / spora ) authored
      Previously a per-request LoRA could only be a local path or HF id, which
      assumed the client shared the server's filesystem. Add a content-addressed
      store so remote clients can supply LoRAs by value or handle.
      
      Request `loras` spec now accepts (resolved server-side, in priority):
        id "name:<registered>"  -> a LoRA trained on this server (path-independent)
        id "sha256:<hex>"       -> a previously uploaded blob
        file/data (base64)      -> inline weights, cached in the blob store
        url                     -> server downloads (cached by content hash)
        model/path              -> legacy local path / HF id (unchanged)
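
The priority order above can be sketched as a classifier over one `loras` entry. This is illustrative only: the function name and the returned kind labels are assumptions, and the real resolve_lora_ref() also has to fetch, hash and store the weights.

```python
def classify_lora_ref(spec):
    """Return (kind, value) for a LoRA spec, checking forms in priority order."""
    ref_id = spec.get("id") or ""
    if ref_id.startswith("name:"):
        return "name", ref_id[len("name:"):]      # LoRA trained on this server
    if ref_id.startswith("sha256:"):
        return "blob", ref_id[len("sha256:"):]    # previously uploaded blob
    inline = spec.get("file") or spec.get("data")
    if inline:
        return "inline", inline                   # base64 weights sent by value
    if spec.get("url"):
        return "url", spec["url"]                 # server downloads and caches
    legacy = spec.get("model") or spec.get("path")
    if legacy:
        return "legacy", legacy                   # local path or HF id
    raise ValueError("LoRA spec has no id, file, data, url, model or path")
```

Because the checks run top to bottom, a spec that carries both an `id` and a raw `path` resolves by `id`.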
      
      - loras.py: blob store (save_lora_blob / lora_blob_exists / _lora_blob_path),
        resolve_lora_ref(), resolve_request_loras() (in-place -> clean 400 on a
        missing blob / unknown name). New POST /v1/loras/upload (multipart / JSON
        base64 / raw, dedup) and GET /v1/loras/blob/{hash} existence check.
      - LoraConfig / VideoLoraConfig: model now optional; add id/url/file/data/path.
      - image + video handlers resolve_request_loras() before model work, so
        signature dedup / VRAM reserve / load_lora_weights read lora.model as before.
      - gen_township_fighters.py: reference trained LoRAs by id "name:<registered>"
        (derived from the server path) with the raw path kept as a co-located
        fallback, so the script works client/server-split.
      
      Also harden video load: float(cfg.get('balanced_gpu_percent', 80)) crashed on
      an explicit null (admin UI writes null for blank fields); use `or 80`.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Truncate base64 blobs in request debug log; add ftfy dependency · 2b28cae2
      Stefy Lanza (nextime / spora ) authored
      - log.py: _redact_blobs() recursively truncates data-URI / base64 fields
        (init_image, image, mask, character_references, …) to their first 48 chars
        in the FULL REQUEST DEBUG dump, so a clip request no longer prints tens of KB
        of base64. Prompts and normal fields are left intact (base64-charset check
        excludes anything with spaces/punctuation).
      - requirements.txt: add ftfy (required by the diffusers Wan/T5 prompt_clean
        path; its absence surfaced as "name 'ftfy' is not defined" at generation).
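
A minimal sketch of the recursive truncation described in the log.py bullet. It is not the actual _redact_blobs(): the `min_len` threshold, the suffix format and the exact character set are assumptions; only the 48-character prefix and the "no spaces or punctuation" idea come from the text.

```python
import re

# Strings made only of these characters are treated as base64 payloads.
_BASE64_ONLY = re.compile(r"[A-Za-z0-9+/=]+")


def redact_blobs(value, keep=48, min_len=200):
    """Return a copy of `value` with long base64 / data-URI strings truncated."""
    if isinstance(value, dict):
        return {k: redact_blobs(v, keep, min_len) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_blobs(v, keep, min_len) for v in value]
    if isinstance(value, str) and len(value) >= min_len:
        # For data URIs, test only the payload after the first comma.
        is_data_uri = value.startswith("data:") and "," in value
        payload = value.split(",", 1)[1] if is_data_uri else value
        if _BASE64_ONLY.fullmatch(payload):
            return value[:keep] + f"...[{len(value)} chars]"
    return value
```

Prompts survive untouched because any space or punctuation makes the full-match fail.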
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    • Add acceleration/distillation support (Lightning/Turbo/LCM/Hyper-SD) · d27ebf43
      Stefy Lanza (nextime / spora ) authored
      Per-model `acceleration` config block fuses a distillation LoRA into the
      pipeline at load and supplies low step-count / guidance defaults at
      generation time, for a 5-10x speedup. Covers video (Wan), image diffusers
      (SD/SDXL), and sd.cpp (step/cfg defaults + <lora:> prompt injection).
      
      - New codai/models/acceleration.py: preset catalog (ACCEL_PRESETS),
        resolve_acceleration(), apply_accel_to_pipeline() (load->fuse->unload so
        it stays orthogonal to per-request character/env LoRAs), accel_call_defaults().
      - video.py: fuse accel LoRA after load; _generate_video / _generate_sdcpp_video
        use preset steps/guidance (request always wins).
      - images.py: _apply_image_acceleration on both diffusers load paths;
        _generate_image and _generate_with_sdcpp honour preset steps/guidance.
      - main.py: surface `acceleration` as a first-class runtime kwarg.
      - admin: persist `acceleration`; new GET /admin/api/accel-presets; models.html
        Acceleration/Distillation card (preset dropdown + manual override).
      
      Also fix a latent null-trap: float(cfg.get('balanced_gpu_percent', 80))
      crashed when the config stored an explicit null (written by the admin UI for
      blank fields) since .get(key, default) returns the stored None. Use `or 80`.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      d27ebf43
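The "request always wins" precedence from commit d27ebf43 could look roughly like the sketch below. The preset names and numbers are illustrative placeholders, not the contents of ACCEL_PRESETS, and `effective_sampling` is a hypothetical stand-in for accel_call_defaults().

```python
# Illustrative values only; the real preset catalog is not shown in this log.
EXAMPLE_PRESETS = {
    "lightning-4step": {"steps": 4, "guidance": 1.0},
    "lcm": {"steps": 6, "guidance": 1.5},
}


def effective_sampling(request: dict, preset_name, model_defaults: dict) -> dict:
    """Resolve steps/guidance with precedence: request > preset > model default."""
    preset = EXAMPLE_PRESETS.get(preset_name, {}) if preset_name else {}
    resolved = {}
    for key in ("steps", "guidance"):
        if request.get(key) is not None:   # explicit request value always wins
            resolved[key] = request[key]
        elif key in preset:                # otherwise the distillation preset
            resolved[key] = preset[key]
        else:                              # otherwise the model's own default
            resolved[key] = model_defaults[key]
    return resolved
```

Checking `is not None` instead of truthiness keeps an explicit `guidance: 0` from the client intact, which matters for distilled models that run without classifier-free guidance.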
  7. 10 Jun, 2026 11 commits
    • Township: attach env + both fighters' LoRAs to outcome clips & keyframes · bf50d8a1
      Stefy Lanza (nextime / spora ) authored
      Outcome scenes belong to a match, so their keyframe (image model) and video
      clip (video model) now attach the environment + BOTH match fighters' LoRAs,
      matching the fight clips — previously they sent only the single named
      fighter. Resolves the match pair from the in-memory fight_plan, falling back
      to the saved prompts.json so a single-outcome regen (fight_plan == []) still
      gets both. Legacy outcomes with no resolvable match keep the single fighter.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      bf50d8a1
    • Township: "Generate missing" keyframes button · d3878ff3
      Stefy Lanza (nextime / spora ) authored
      New keyframes-missing render scope fills in only the keyframes that don't
      exist yet for a whole match (clips + outcomes) — existing ones are kept and
      nothing is re-rendered. Buttons added on the keyframes page and the match
      detail action row; finishes immediately when none are missing.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      d3878ff3
    • Township: dedicated keyframes page for a match + outcome fixes · 71fe0c19
      Stefy Lanza (nextime / spora ) authored
      - New /match/keyframes page (🖼 Keyframes ▸ from the match page): thumbnail
        grid of every clip + planned-outcome keyframe, each with per-tile
        regenerate (image model) and delete, plus match-level Regenerate all /
        Clear all. Live progress bars; reloads on completion with mtime
        cache-bust.
      - Regenerate all (whole match) now also covers this match's OUTCOME
        keyframes, not just clips.
      - Clear all now removes planned outcome keyframes too (not only ones with a
        rendered video), keeping it symmetric with Regenerate all.
      - Fix outcome keyframe stem on the page: use the plan entry's actual
        match_name (None → legacy "<fighter>_<outcome>") so it matches the file
        the generator writes, instead of a "<match>_<fighter>_<outcome>" that is
        never created (outcome keyframes were showing "no keyframe" after a
        successful regen).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      71fe0c19
    • Township: allow keyframes/keyframe scopes in /matches/render · 9a6550d8
      Stefy Lanza (nextime / spora ) authored
      The render endpoint's scope allowlist rejected the new keyframe-regen
      scopes with a 400 before reaching the handler. Add them so the
      "Regenerate keyframes" / per-clip kf↻ buttons work.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      9a6550d8
    • LoRA training: async kickoff, restart recovery, keyframe regen UI · 80f48d72
      Stefy Lanza (nextime / spora ) authored
      Server (codai/api/loras.py):
      - /v1/loras/train gains `wait` (default true) and `session` parameters;
        wait=false detaches the job and returns a job_id, avoiding HTTP read
        timeouts on multi-hour video trainings.
      - Disk-persisted job registry keyed by job_id (carries session). Progress
        endpoint serves ?job=<id> / ?session=<tok> so a client only ever sees its
        own job — no cross-user spillover. Jobs left mid-flight at startup are
        marked interrupted.
      - Mid-training PEFT checkpoints (SD1.5/SDXL/Wan) + train_state.json; a
        resubmit resumes from the last step when base/target/rank (and session)
        match, so a reboot no longer throws away hours of Wan training.
      
      Township (tools/gen_township_fighters.py):
      - Async training: per-run session token + persisted per-LoRA job_id; polls
        by job_id, re-attaches to a running server job after a restart, resubmits
        an interrupted one (server resumes from checkpoint).
      - Dedicated train timeouts (24h video / 4h image).
      - Match page: regenerate/clear keyframes (match-level + per-clip/outcome)
        via new keyframes/keyframe render + delete scopes.
      
      tools/videogen.py: mirror the session-token + job-id recovery helpers.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      80f48d72
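The resume rule from commit 80f48d72 (continue from the last step only when base/target/rank and session match) might be checked along these lines. The layout of train_state.json is an assumption here, as are the key names and the `resume_step` helper.

```python
import json
from pathlib import Path

# Assumed field names: the commit only says base/target/rank (and session) must match.
RESUME_KEYS = ("base", "target", "rank", "session")


def resume_step(state_path: Path, request: dict) -> int:
    """Return the step to resume from, or 0 to start a fresh training run."""
    try:
        state = json.loads(state_path.read_text())
    except (OSError, ValueError):
        return 0  # no checkpoint state, or an unreadable one
    if not isinstance(state, dict):
        return 0
    # Any mismatch means the checkpoint belongs to a different training setup.
    if any(state.get(k) != request.get(k) for k in RESUME_KEYS):
        return 0
    return int(state.get("step", 0))
```

Falling back to step 0 on any doubt trades some lost work for never resuming into a checkpoint trained with a different rank or base model.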
    • Backends, API, and tooling updates; gitignore township_output · f21c6185
      Stefy Lanza (nextime / spora ) authored
      - cuda/vulkan backend improvements and config plumbing
      - API updates across characters, text, environments, audio, embeddings, tts
      - admin chat/settings template updates
      - add hf_loading helper, video request fields, platform paths
      - new docs (CODERAI_API_DOCUMENTATION.md) and tools (review_outputs, video_dubber)
      - ignore generated township_output/
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      f21c6185
    • Stefy Lanza (nextime / spora )
      7dc60f66
    • Township: per-card LoRA training progress for concurrent jobs · f4bf08b2
      Stefy Lanza (nextime / spora ) authored
      The server exposes one global training progress (jobs run one at a time via the
      queue), so every queued card was mirroring the active job's progress. Restore the
      name match: a card shows real progress only when the global progress reports ITS
      LoRA; otherwise it shows "queued — '<other>' training first… (elapsed)". Keeps the
      elapsed-timer handling so a long model load still looks alive rather than frozen.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      f4bf08b2
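The name match restored in commit f4bf08b2 can be sketched as a small pure function. The progress field names (`name`, `step`, `total`) and the returned dict shape are assumptions made for illustration; the real UI code is not shown in this log.

```python
def card_status(card_lora: str, progress: dict) -> dict:
    """Decide what one profile card shows, given the single global progress record."""
    active = progress.get("name")
    if active == card_lora:
        # The global record is about THIS card's LoRA: show real progress.
        total = progress.get("total") or 0
        step = progress.get("step") or 0
        percent = 100 * step // total if total > 0 else None  # None while loading
        return {"state": "training", "percent": percent}
    # Another LoRA (or nothing yet) is active: this card is only queued.
    return {"state": "queued", "waiting_on": active}
```

A `percent` of None corresponds to the model-load phase, where the commit keeps an elapsed timer ticking instead of a percentage.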
    • Wan LoRA: cache the whole stack (VAE + UMT5 + transformer) across jobs · 5d547a33
      Stefy Lanza (nextime / spora ) authored
      Extend the cross-job cache from just the transformer expert(s) to the full Wan
      stack: VAE, tokenizer and text encoder are kept on CPU between jobs (moved to GPU
      only while encoding), experts stay on GPU. A back-to-back training against the
      same base now reloads nothing from disk — previously the small VAE/text-encoder
      still reloaded each job. The releaser and error path clear all cached components.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      5d547a33
    • Township: match fighter/env dropdowns, video LoRA status, train progress · fc78dc98
      Stefy Lanza (nextime / spora ) authored
      - Match detail page: Fighter 1/2 and Environment are now dropdowns of saved
        profiles + the built-in pools, so a match's fighters/location can be switched
        before re-rendering (Save match persists; env_desc updated with env).
      - Video LoRA status on profile cards: resolve the active video model the same
        way training/generation do (configured, else auto-picked, cached), and show
        "trained for: <slug>" even when the active model can't be resolved so a trained
        LoRA never looks untrained.
      - Video LoRA training progress: show a ticking elapsed timer during the long
        model-load/preparing phase (the progress endpoint is starved while a large
        model loads), drop the strict name filter that could freeze the bar at 6%, and
        use step-based % once training begins.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      fc78dc98
    • Wan LoRA: cache transformer across jobs + smoother progress · d1fc17e0
      Stefy Lanza (nextime / spora ) authored
      Cache the Wan transformer expert(s) between consecutive trainings against the
      same base (keyed by base_path+quantize) so a back-to-back job skips the very slow
      reload (tens of minutes for A14B). Only this job's adapter + gradient-checkpoint
      hooks are removed at teardown; the base transformer(s) stay resident. Since 4-bit
      weights can't move to CPU, they hold GPU VRAM between jobs — so the external VRAM
      releaser now drops the Wan cache too when a generation needs the GPU, and the
      error path clears both caches.
      
      Also report training progress every step (cheap dict update) instead of every
      10 steps, so the web UI bar advances smoothly once steps begin.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      d1fc17e0
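The cross-job cache from commit d1fc17e0, keyed by base_path+quantize, could be modelled as a single-slot cache like the one below. The class name, the single-slot design, and the loader callback are assumptions; the real cache holds GPU-resident transformer experts rather than arbitrary objects.

```python
class ExpertCache:
    """Single-slot cache: keeps the last loaded base resident between jobs."""

    def __init__(self, loader):
        self._loader = loader  # callable(base_path, quantize) -> loaded model
        self._key = None
        self._value = None

    def get(self, base_path: str, quantize: str):
        key = (base_path, quantize)
        if key != self._key:
            # Different base or quantization: drop the old entry before loading.
            self.clear()
            self._value = self._loader(base_path, quantize)
            self._key = key
        return self._value

    def clear(self):
        """Called by the VRAM releaser and the error path to free the slot."""
        self._key = None
        self._value = None
```

Clearing before loading means a failed load leaves the cache empty rather than pointing at a stale model.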
  8. 09 Jun, 2026 3 commits
    • Wan LoRA trainer: fix fp32/bf16 dtype mismatch in the train step · 8e8c0a45
      Stefy Lanza (nextime / spora ) authored
      torch.rand defaults to fp32, so the rectified-flow interpolation promoted x_t to
      fp32 while the patch-embedding Conv3d stays bf16 (bitsandbytes 4-bit quantizes
      only Linear layers), raising "Input type (float) and bias type (BFloat16) should
      be the same". Compute the interpolation in fp32 then cast x_t/target back to the
      model compute dtype, and pass timestep as fp32 (Wan casts it internally).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      8e8c0a45
    • Logging: filter only the noisy progress poll from access log · ad891a34
      Stefy Lanza (nextime / spora ) authored
      Suppressing the whole uvicorn.access logger hid every request. Instead add a
      logging.Filter that drops only /v1/loras/progress (polled every ~1.5s by the web UI);
      all other API request lines keep logging. A logger filter also survives uvicorn's
      configure_logging(), which resets the access logger LEVEL at startup (that reset
      is what defeated the earlier setLevel(WARNING)). --debug-web shows everything.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      ad891a34
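A minimal sketch of the filter approach from commit ad891a34, using the standard `logging.Filter` API. The class name is hypothetical, and the substring match on the formatted message is a simplification of whatever the real filter checks.

```python
import logging


class DropProgressPolls(logging.Filter):
    """Suppress access-log lines for the progress endpoint polled by the web UI."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; every other request still logs.
        return "/v1/loras/progress" not in record.getMessage()


# Attach to the logger itself rather than changing its level: per the commit
# message, uvicorn resets the access logger's level at startup.
logging.getLogger("uvicorn.access").addFilter(DropProgressPolls())
```

Matching on `record.getMessage()` covers query-string variants such as `?job=<id>` without parsing the request line.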
    • Township: Wan video LoRAs, enhance feature, lightbox, outcome fixes · d03385f9
      Stefy Lanza (nextime / spora ) authored
      Several related changes accumulated in this session:
      
      Wan video LoRAs (additive — image LoRAs kept for keyframes):
      - New per-model maps video_loras.json/env_video_loras.json keyed
        name -> {model_slug: path}; on-disk names tagged with the video model slug.
      - Video requests attach the video LoRA matching the current video model's slug;
        image LoRAs stay on the keyframe path only.
      - Per-profile "Train video LoRA" button + step button + full-run checkbox +
        --video-loras/--only-video-loras; batch + CLI wiring; client target="video".
      
      Final/outcome enhance (upscale 2x/4x + raise FPS):
      - _enhance_video_file + Phase C stage; --upscale-factor/--fps-multiplier and
        Run-page selects; match-page Enhance card with live progress bars.
      
      Match page UX:
      - Video previews enlarge + center on play (video lightbox).
      - Match render shows global + per-clip progress bars, surviving reload.
      
      Outcome fixes:
      - Re-rendering a match's outcomes now resolves legacy per-fighter outcomes
        (no match_name) by fighter membership, and forces them into the match's
        environment so a match stays in one location.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      d03385f9
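The per-model map introduced in commit d03385f9 (name -> {model_slug: path}) implies a two-level lookup when attaching a video LoRA. The sketch below assumes that shape. The example names, slugs, and paths are made up, and `video_lora_for` is a hypothetical helper.

```python
import json

# Made-up example in the shape described for video_loras.json.
EXAMPLE_MAP = json.loads("""
{
  "fighter_a": {"wan-a14b": "loras/fighter_a__wan-a14b.safetensors"},
  "fighter_b": {"wan-1.3b": "loras/fighter_b__wan-1.3b.safetensors"}
}
""")


def video_lora_for(lora_map: dict, name: str, model_slug: str):
    """Return the LoRA path trained for this video model, or None if there is none."""
    # A LoRA trained against a different video model is deliberately not returned.
    return lora_map.get(name, {}).get(model_slug)
```

Returning None for a slug mismatch lets the caller skip the attachment instead of loading weights trained for another base model.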