  1. 19 Jun, 2026 11 commits
    • ds4: auto-downloaded weights land in coderai GGUF cache + show on models page · ef106ba1
      Stefy Lanza (nextime / spora ) authored
      When ds4.auto_download is enabled and a deepseek4 request resolves no local
      GGUF, the downloaded weight variant is now relocated into coderai's GGUF cache
      (get_model_cache_dir; move on same FS, symlink across devices) and registered
      in models.json as a text_models entry that mimics the requested ("failed")
      model's config — backend auto, on-request, enabled and visible (removed from
      unloaded/to_download). model_name is threaded ds4 backend → ensure_service →
      ensure_model so the registration mirrors the right entry.
      
      Also: settings "Extra ds4-server args" hint/placeholder updated to reflect the
      auto --kv-disk-dir and SSD-streaming expert-cache sizing
      (--ssd-streaming-cache-experts), noting Q2_K can fail ds4's CUDA prefill.
      
      Diagnosis (no code change): ds4-server's "cuda prefill failed" on the 93GB
      Q2_K variant is a quant-specific ds4 CUDA bug — the 154GB Q4_K completes
      prefill fine (verified: "prompt done 434s" vs Q2_K instant failure), with
      15.8GB VRAM free either way (not OOM, not cache budget, not coderai).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      ef106ba1
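The "move on same FS, symlink across devices" relocation described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: `relocate_into_cache` is a hypothetical name, and the real cache directory would come from `get_model_cache_dir`.

```python
import errno
import os
from pathlib import Path


def relocate_into_cache(src: Path, cache_dir: Path) -> Path:
    """Place src under cache_dir: rename on one filesystem, symlink across devices."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / src.name
    if dest.exists() or dest.is_symlink():
        return dest  # already relocated; keep the operation idempotent
    try:
        os.rename(src, dest)  # cheap, but only works within a single filesystem
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Cross-device: leave the multi-GB file where it is and link to it.
        os.symlink(src.resolve(), dest)
    return dest
```

Renaming avoids copying a 100GB-class file; the symlink fallback keeps that property when the download directory and the cache live on different devices.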
    • fix(ds4): give ds4 models exclusive VRAM (evict others) to stop expert-cache starvation · 00e21ea5
      Stefy Lanza (nextime / spora ) authored
      ds4-server streams MoE experts and wants the whole GPU for its expert cache, but
      coderai's modest VRAM estimate for a ds4 model let it co-reside another model —
      starving the cache so ds4's layer-0 FFN expert encode failed ("gpu layer 0 ffn
      batch encode failed"). When loading a ds4 model on demand, unload all other models
      first so ds4-server gets the full GPU.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      00e21ea5
    • feat(auto-compact): guarantee last message, chunked summarize, signal-if-too-big · 8bfd0855
      Stefy Lanza (nextime / spora ) authored
      - Always keep the CURRENT request (last message) intact and as the very last
        message after compaction (the compacted history/summary precedes it).
      - summarize strategy now CHUNKs the older history and summarizes map-reduce
        (per-chunk then a combined pass) so the summarization prompt can't itself
        overflow.
      - If compaction still can't fit the window (e.g. a single huge final message),
        return HTTP 400 "request too big for context" instead of failing mid-generation.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      8bfd0855
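The chunked map-reduce summarization described in this commit might look roughly like the sketch below. The function names and the character-based chunk limit are assumptions for illustration; `summarize` stands in for whatever call asks the loaded model for a summary.

```python
from typing import Callable, List


def chunk_messages(messages: List[str], max_chars: int) -> List[List[str]]:
    """Greedily group messages so each chunk stays within max_chars when possible."""
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for msg in messages:
        if current and size + len(msg) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        chunks.append(current)
    return chunks


def summarize_map_reduce(
    history: List[str], summarize: Callable[[str], str], max_chars: int
) -> str:
    """Summarize each chunk (map), then summarize the partial summaries (reduce)."""
    if not history:
        return ""
    partials = [summarize("\n".join(c)) for c in chunk_messages(history, max_chars)]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))
```

Each summarization prompt is bounded by the chunk size, which is the point of the change. A single message longer than `max_chars` still forms its own oversized chunk in this sketch.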
    • feat: per-model auto-compact of the conversation context (off by default) · a019905f
      Stefy Lanza (nextime / spora ) authored
      When enabled for a model, if the prompt would exceed auto_compact_pct% of the
      model's context window, the conversation is shrunk to ~65% before generation
      instead of erroring on overflow. Per-model config (auto_compact / auto_compact_pct
      / auto_compact_strategy) with three strategies:
        - drop_oldest    : keep system messages + the most recent turns that fit.
        - keep_head_tail : also keep the first user turn as an anchor + a count note.
        - summarize      : replace the dropped middle with a best-effort LLM summary
                           (generated by the loaded model; falls back to a count note).
      
      Token size is a cheap chars/4 estimate; membership uses object identity so
      value-equal turns don't collide. Wired into the chat path (codai/api/text.py),
      the model-configure whitelist, and the model config modal UI.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      a019905f
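A minimal sketch of the drop_oldest strategy, using the chars/4 token estimate and identity-based membership mentioned in this commit. The function names and the message shape (dicts with `role` and `content`) are assumptions, not the code in codai/api/text.py.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    """Cheap size estimate: roughly four characters per token."""
    return max(1, len(message.get("content", "")) // 4)


def drop_oldest(messages: List[Message], budget_tokens: int) -> List[Message]:
    """Keep system messages plus the most recent turns that fit the budget."""
    # Track kept messages by id() so two value-equal turns stay distinct.
    keep_ids = {id(m) for m in messages if m.get("role") == "system"}
    used = sum(estimate_tokens(m) for m in messages if id(m) in keep_ids)
    for m in reversed(messages):
        if id(m) in keep_ids:
            continue
        cost = estimate_tokens(m)
        if used + cost > budget_tokens:
            break  # everything older than this turn is dropped
        keep_ids.add(id(m))
        used += cost
    return [m for m in messages if id(m) in keep_ids]
```

With a model window of N tokens, the budget would be about 0.65 * N to match the "~65%" target above.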
    • fix(ds4+config): resolve bare model ids, don't over-estimate VRAM, robust config · 8c85e16a
      Stefy Lanza (nextime / spora ) authored
      - ds4: resolve a bare/aliased model id (e.g. "Foo-ds4-Q2_K", no path/extension) to
        its configured .gguf via a config/cache-aware resolver — fixes the 503 ("no local
        deepseek4 GGUF resolved") on chat requests (only "Load now" with a full path
        worked before). Ds4Backend reuses the same resolver.
      - ds4: report a modest VRAM footprint for ds4 models (measured or ~12GB) instead of
        the 100GB+ GGUF size — ds4-server streams experts from SSD and manages its own
        memory, so the old estimate forced needless ~128GB eviction churn every request.
      - ds4: route on-disk KV checkpoints into coderai's offload directory by default
        (--kv-disk-dir <offload>/ds4-kv) unless overridden in extra_args.
      - config: tolerant load (_dc drops unknown keys) so a stale/newer config.json no
        longer crashes the whole load and silently resets ALL settings to defaults (the
        "had to reconfigure everything" bug). save_config + GET/POST settings carry the
        new ds4 fields (model_path, auto_download, ssd_streaming).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      8c85e16a
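The tolerant config load can be illustrated with a dataclass constructor that ignores keys it does not know. This is a sketch under assumptions: `load_tolerant` and `Ds4Settings` are hypothetical stand-ins, and only the three field names are taken from the commit message.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def load_tolerant(cls: Type[T], raw: Dict[str, Any]) -> T:
    """Build a dataclass from raw, silently dropping keys it does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


@dataclass
class Ds4Settings:  # hypothetical stand-in for the real ds4 config section
    model_path: str = ""
    auto_download: bool = False
    ssd_streaming: bool = False
```

Without the filter, a key written by a newer version would raise `TypeError` in the constructor, which is the kind of failure that could send the whole load back to defaults.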
    • feat(ds4): auto-route deepseek4 GGUFs by architecture; serve the requested file · 6a153c58
      Stefy Lanza (nextime / spora ) authored
      - Route to ds4 by GGUF ARCHITECTURE (general.architecture == "deepseek4"), read
        from the file header (cached) — not by filename. Mainline deepseek/2/3/32 GGUFs
        stay on llama.cpp; the model_id alias still routes for the download case.
      - ds4-server now serves the REQUESTED GGUF: Ds4Backend resolves the model to a
        local .gguf and launches `ds4-server -m <file>` (resolve_service_key keys the
        managed service per file). No fixed-variant assumption.
      - Honour the model's per-entry n_ctx for ds4-server --ctx (over the global ctx).
      - New config.ds4 options + settings UI: ssd_streaming (--ssd-streaming, stream
        MoE experts from SSD/disk), model_path (explicit -m override), and
        auto_download (OFF by default — only serve GGUFs already present; error clearly
        instead of silently pulling tens of GB; opt in to fetch model_variant).
      - AI.PROMPT: document DeepSeek-V4 = pending upstream llama.cpp PRs (needs new ggml
        ops) → ds4 for now; and ds4 routing/offload/text-only specifics.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      6a153c58
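Reading general.architecture from a GGUF header needs only the metadata key/value section. The sketch below follows the GGUF v2/v3 layout as I understand it (little-endian; magic, version, tensor count, key/value count, then typed pairs). It is an illustration, not the project's reader, and all function names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Optional

# Byte sizes of fixed-width GGUF value types (type id -> size).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> int:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    return f.read(_read(f, "<Q")).decode("utf-8")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    if vtype == _STRING:
        f.seek(_read(f, "<Q"), io.SEEK_CUR)
    elif vtype == _ARRAY:
        elem_type, count = _read(f, "<I"), _read(f, "<Q")
        if elem_type in _SCALAR_SIZES:
            f.seek(count * _SCALAR_SIZES[elem_type], io.SEEK_CUR)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    elif vtype in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[vtype], io.SEEK_CUR)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return general.architecture, or None if absent or not a supported GGUF."""
    if f.read(4) != b"GGUF":
        return None
    if _read(f, "<I") < 2:  # v1 used 32-bit counts; not handled in this sketch
        return None
    _read(f, "<Q")  # tensor count, unused here
    for _ in range(_read(f, "<Q")):
        key, vtype = _read_string(f), _read(f, "<I")
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Routing would then compare the result with "deepseek4". Caching the answer per file path avoids re-reading the header on every request.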
    • fix: tool-call streaming/format robustness + clear over-context error · 3834ecf5
      Stefy Lanza (nextime / spora ) authored
      - Streaming tool gate now withholds the gemma/qwen native `<|tool_call>` marker
        (and partials) too, not just `<tool_call>`/`call:NAME{` — so the raw marker no
        longer leaks to the client mid-stream (Kilo was executing partial calls).
      - Normalize tool-call function.arguments from JSON string → dict before applying
        the chat template, so templates that render `arguments|items` (Qwen) don't
        raise "Can only get item pairs from a mapping".
      - Context-window overflow now returns a meaningful error: a structured SSE error
        event (code context_length_exceeded) when streaming, or HTTP 400 with a clear
        message for non-streaming — instead of injecting "[Generation error: …]" as
        assistant content (which polluted chat history).
      - Models page: unconfigured GGUF files now expose the "Free disk" button (records
        them as "to download" before deleting), matching HF models.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      3834ecf5
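The argument normalization in this commit (JSON string to dict before templating) can be sketched as below. The function name is hypothetical and the message shape follows the common OpenAI-style `tool_calls` layout, which is an assumption about the project's internal format.

```python
import copy
import json
from typing import Any, Dict, List


def normalize_tool_arguments(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of messages whose tool-call arguments are dicts, not JSON strings."""
    out = copy.deepcopy(messages)
    for message in out:
        for call in message.get("tool_calls") or []:
            fn = call.get("function") or {}
            args = fn.get("arguments")
            if not isinstance(args, str):
                continue
            try:
                parsed = json.loads(args) if args.strip() else {}
            except json.JSONDecodeError:
                continue  # leave malformed arguments untouched
            if isinstance(parsed, dict):
                fn["arguments"] = parsed
    return out
```

A template that iterates `arguments|items` then receives a mapping, so it no longer hits the "Can only get item pairs from a mapping" error.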
    • fix: GGUF vision/mmproj routing + VRAM estimate; Tasks page it/s + history · ade800f9
      Stefy Lanza (nextime / spora ) authored
      - api_model_load: load a GGUF/text model via llama.cpp even when it's also
        bucketed under image/vision (respect the entry's primary model_type), so a
        gemma+mmproj LLM never hits the diffusers from_pretrained() path.
      - model config save: a GGUF LLM with an mmproj auto-gets the image_to_text
        capability and is kept out of the diffusers vision_models/image_models buckets.
      - VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
        quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context aren't
        over-estimated into needless CPU offload.
      - Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make the
        delete idempotent (repo already gone = success).
      - Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
        tok/s. Throughput computed centrally in the task registry (live EMA + run
        average on finish). New "Recent tasks (last 10)" history section.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      ade800f9
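The Tasks-page throughput display (live EMA, it/s or s/it when slow) could be computed along these lines. The class and function names and the smoothing factor are assumptions; the task registry's real fields are not shown in the commit.

```python
from typing import Optional


class Throughput:
    """Exponential moving average of a rate in iterations per second."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha
        self.ema: Optional[float] = None

    def update(self, rate: float) -> float:
        if self.ema is None:
            self.ema = rate  # first sample seeds the average
        else:
            self.ema = self.alpha * rate + (1.0 - self.alpha) * self.ema
        return self.ema


def format_rate(its_per_sec: float) -> str:
    """Show it/s for fast runs and the inverse (s/it) when below one it/s."""
    if its_per_sec <= 0:
        return "n/a"
    if its_per_sec >= 1.0:
        return f"{its_per_sec:.2f} it/s"
    return f"{1.0 / its_per_sec:.2f} s/it"
```

Switching to s/it below one iteration per second keeps slow diffusion steps readable ("4.00 s/it" rather than "0.25 it/s").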
    • packaging: build a self-contained distribution bundle by default · 7d3d8e5b
      Stefy Lanza (nextime / spora ) authored
      - make_dist_bundle.sh: assemble dist/coderai-docker-dist.tar containing the image
        tarball (docker save | pigz), the coderai-docker runner (run_oci.sh, image tag
        pinned), install.sh and README. Stages under dist/ (not tiny /tmp) and hardlinks
        the multi-GB image tarball instead of copying it.
      - dist-bundle/install.sh: docker-load the image (sudo-fallback for daemon access)
        then install the coderai-docker runner to /usr/local/bin (root) or
        ~/.local/usr/bin (user, added to ~/.bashrc PATH if missing).
      - build_oci_image.sh: after a successful build, export + bundle for distribution
        by default (--no-dist to skip).
      - run_oci.sh: default image tag -> coderai:dist (matches what's shipped/loaded).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      7d3d8e5b
    • Merge feat/township-match-upload: to-download list, mmproj vision, styled... · 766fef3c
      Stefy Lanza (nextime / spora ) authored
      Merge feat/township-match-upload: to-download list, mmproj vision, styled modals, broker + packaging
      766fef3c
    • feat: model "to-download" list, mmproj vision, styled modals, broker + packaging · cbf7f147
      Stefy Lanza (nextime / spora ) authored
      Web UI / models:
      - "To download" wishlist: models known but not on disk and not configured show
        as non-configured to-download rows. Free-disk on an unconfigured model, Remove
        on a model with no files left, and a new "Add to list" button in the download
        window all record into models.json `to_download`; pruned on enable/download.
        New endpoints model-mark-download / model-unmark-download.
      - mmproj multimodal components: mmproj GGUFs are classified as components (not
        models), selectable per-GGUF in the model config (auto-selected, enables vision
        capability). VulkanBackend loads them via llama.cpp's MTMDChatHandler (--mmproj
        equivalent), and the chat path now forwards image_url content end-to-end.
      - All window.alert() replaced by a shared styled showAlert()/showConfirm() modal
        in base.html (used across every admin template).
      
      Front proxy / broker:
      - Fix engine model-assignment NameError (keep -> _keep).
      - Brokered GET /coderai/capabilities now answers from the front (whole node) so
        multi-GPU hosts report every card, not a single engine's CUDA-visible one.
      - Log a clear reason when the broker is disabled.
      
      Packaging (distributable OCI image):
      - Multi-stage venv image + smoke test; bundle ds4/wav2lip/sadtalker + parler;
        whisper-server etc. dereferenced (cp -aL) so no dangling symlinks.
      - Dockerfile.update + update_oci_image.sh: ~30s incremental code-only rebuild on
        an immutable coderai:base (no 20GB bundle recopy).
      - run_oci.sh: --local/--config-dir + --map to run against existing local config
        and data dirs without a rebuild; --debug[=flags] + --log-file for selectable
        debug flags and a host-tailable file log (launcher tees; supervisord kills the
        process group). tmp_janitor age-prunes the dedicated temp dir.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      cbf7f147
  2. 18 Jun, 2026 11 commits
    • Almost all ready · 9d023ec2
      Stefy Lanza (nextime / spora ) authored
      9d023ec2
    • quant: reject checkpoints whose weights weren't actually quantized · c741ff5b
      Stefy Lanza (nextime / spora ) authored
      GPTQModel silently leaves layers it can't map (e.g. gemma-4's fused batched MoE
      experts) in bf16, producing a near-full-size "checkpoint" that the loader would
      redirect to and then offload. The worker now scans the saved safetensors and, if
      <50% of large weight bytes are int-packed, deletes the output and marks the job
      failed (so it falls back to bitsandbytes) instead of reporting "done".
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      c741ff5b
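The "<50% of large weight bytes are int-packed" check can be sketched without touching safetensors itself, given a list of (dtype, size) pairs for the saved tensors. The dtype set, the 1 MiB "large" cutoff and the function names are assumptions for illustration; only the 50% threshold comes from the commit.

```python
from typing import Iterable, Tuple

# Assumption: dtype strings that packed GPTQ-style weights would be stored under.
INT_PACKED_DTYPES = {"I32", "I16", "I8", "U8"}


def packed_fraction(tensors: Iterable[Tuple[str, int]], min_bytes: int = 1 << 20) -> float:
    """Fraction of large-tensor bytes stored in an integer-packed dtype."""
    large = [(dtype, n) for dtype, n in tensors if n >= min_bytes]
    total = sum(n for _, n in large)
    if total == 0:
        return 0.0
    packed = sum(n for dtype, n in large if dtype in INT_PACKED_DTYPES)
    return packed / total


def looks_quantized(tensors: Iterable[Tuple[str, int]], threshold: float = 0.5) -> bool:
    """True when at least `threshold` of the large weight bytes are int-packed."""
    return packed_fraction(tensors) >= threshold
```

Ignoring small tensors keeps norms and biases, which normally stay in floating point, from skewing the ratio.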
    • quant: surface jobs on Tasks page + model list, persist across restart · 6d053dc1
      Stefy Lanza (nextime / spora ) authored
      - quant jobs now appear on the Tasks page (api_tasks emits kind=quantize) and as
        a live badge on the HF model-list row (polled; re-renders only on change).
      - persist job state to <cache>/quantized/jobs.json; on startup a job left
        "running" is marked "interrupted" only if its owning PID is dead (merge-safe
        save so multiple processes don't clobber each other).
      - gitignore the runtime model cache (models/), logs/, and the third-party
        GPTQModel/ source clone (installed into the venv, not part of this repo).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      6d053dc1
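The startup rule "a job left running is marked interrupted only if its owning PID is dead" can be sketched as follows. This is POSIX-only (signal 0 probes a process without affecting it), and the job dict shape and function names are assumptions rather than the real jobs.json schema.

```python
import os
from typing import Any, Dict


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without sending anything."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def reconcile_jobs(jobs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Mark 'running' jobs as 'interrupted' when their owning process is gone."""
    out = {}
    for job_id, job in jobs.items():
        job = dict(job)
        if job.get("status") == "running" and not pid_alive(int(job.get("pid", 0))):
            job["status"] = "interrupted"
        out[job_id] = job
    return out
```

Checking the PID rather than blindly resetting every running job is what lets several processes share one state file without cancelling each other's work.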
    • front: timestamped logs + AMD GPU marketing-name detection · 48be0d91
      Stefy Lanza (nextime / spora ) authored
      - Prefix front/uvicorn and re-emitted engine log lines with [HH:MM:SS] so the
        front log format matches the engine ([HH:MM:SS][nvidia] …); preserve tqdm
        in-place progress and avoid double-timestamping already-tagged lines.
      - gpu_detect: _amd_gpu_name() resolves a card's marketing name via amdgpu
        product_name sysfs, then lspci board/chip name, then vulkaninfo.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      48be0d91
    • feat: smart context caching, VRAM offload fix, GPTQ/AWQ quant backend · 990f9471
      Stefy Lanza (nextime / spora ) authored
      Smart context caching (both text backends):
      - Per-instance generation lock so pooled concurrent requests can't corrupt a
        shared KV cache (GGUF + HF, incl. streaming worker thread).
      - GGUF: enable multi-slot LlamaRAMCache, budget via kv_cache_budget_mb (512MB).
      - HF: replace single exact-text KV slot with an LRU of token-prefix slots +
        token-level longest-common-prefix + DynamicCache clone/crop (handles
        mid-history edits); kv_cache_slots (default 3).
      - Session-affinity routing in ModelInstancePool.acquire(session_key); key from
        user/X-Session-Id else a stable prefix hash.
      - RAM-pressure ladder drops reclaimable prefix caches before evicting models.
      
      VRAM fix:
      - Auto-fit check no longer double-counts the KV/activation reserve when
        expected_vram_gb is already a peak estimate — borderline models (e.g.
        gemma-4-26B-A4B) stay GPU-resident instead of forced into MoE-thrashing
        device_map offload.
      
      GPTQ/AWQ fast-kernel quant backend (HF path):
      - New codai/models/quant.py: GPTQModel capability detection, quantized-checkpoint
        cache, on-demand background quantize job (falls back to bnb if unsupported).
      - quant_backend config (auto|bnb|gptq|awq); loader auto-uses a quantized
        checkpoint with Marlin/ExLlama when present, else bitsandbytes.
      - Admin endpoints + "Quantize to 4-bit" button with live status on the model page.
      - requirements-nvidia.txt documents the from-source install + numpy caveat.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      990f9471
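The HF-side idea of an LRU of token-prefix slots with longest-common-prefix matching can be sketched as below. It models only the bookkeeping over token ids; the actual KV tensors and the DynamicCache clone/crop step are out of scope, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional, Sequence, Tuple


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading token ids shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixSlots:
    """LRU of cached token sequences, looked up by longest common prefix."""

    def __init__(self, max_slots: int = 3) -> None:
        self.max_slots = max_slots
        self._slots: "OrderedDict[int, List[int]]" = OrderedDict()  # oldest first
        self._next_id = 0

    def best_match(self, tokens: Sequence[int]) -> Tuple[Optional[int], int]:
        best_id, best_len = None, 0
        for slot_id, cached in self._slots.items():
            n = common_prefix_len(cached, tokens)
            if n > best_len:
                best_id, best_len = slot_id, n
        if best_id is not None:
            self._slots.move_to_end(best_id)  # mark as most recently used
        return best_id, best_len

    def store(self, tokens: Sequence[int]) -> int:
        slot_id = self._next_id
        self._next_id += 1
        self._slots[slot_id] = list(tokens)
        while len(self._slots) > self.max_slots:
            self._slots.popitem(last=False)  # evict the least recently used slot
        return slot_id
```

A request whose history was edited mid-conversation still matches up to the edit point, so only the tokens after it need to be recomputed. An exact-text cache would miss entirely.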
    • tasks: show model downloads on the Tasks page · e4c040e2
      Stefy Lanza (nextime / spora ) authored
      Surface out-of-process download workers (tracked in _download_status) as
      first-class tasks in /admin/api/tasks, alongside generations, training and
      queued requests. They render with a percentage progress bar plus a
      filename / rate / ETA readout, and can be cancelled from the Tasks page
      (routed through a shared _cancel_download_session helper) or removed once
      finished/failed.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      e4c040e2
    • front: reap orphaned download workers on shutdown · a2460385
      Stefy Lanza (nextime / spora ) authored
      stop_all() now sweeps /proc for any codai.admin.download_worker processes
      and SIGKILLs them after the engines are stopped — including legacy ppid=1
      orphans left by an earlier instance that this front never spawned. Orphaned
      workers keep holding huggingface_hub's per-blob file lock, which makes the
      next re-download deadlock at 0%, so Ctrl-C now guarantees they're cleaned up.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      a2460385
    • downloads: dedup re-downloads + kill orphaned workers; single-line load progress · 28a2eecb
      Stefy Lanza (nextime / spora ) authored
      Re-downloading a model that was already in progress spawned a second
      download_worker. Both contend for huggingface_hub's per-blob file lock —
      the first downloads, the second blocks on the lock and reports 0% forever
      ("Downloading full repository…"). Two causes, both fixed:
      
      - Same-process re-download click: api_download_model now dedups via
        _active_download_session(model_id, file_pattern) and attaches the client
        to the live session instead of spawning a rival worker.
      - Restart case: workers were plain Popen children with no parent-death
        signal, so a server/engine restart orphaned them (still holding the lock)
        while the new instance lost its in-memory dedup state. Workers now spawn
        with PR_SET_PDEATHSIG=SIGKILL so they die with the server; the re-download
        then resumes cleanly from the .incomplete blob.
      
      Also render engine "Loading weights" tqdm progress as a single updating
      line on a TTY (in-place \r) and throttle to whole-percent changes when
      piped, instead of one line per update.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      28a2eecb
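The same-process dedup (attach to the live session instead of spawning a rival worker) can be sketched as a small registry keyed by (model_id, file_pattern). The class and its methods are hypothetical; the commit only names `_active_download_session(model_id, file_pattern)`.

```python
import threading
from typing import Dict, Optional, Tuple


class DownloadRegistry:
    """Tracks one live download session per (model_id, file_pattern)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[Tuple[str, Optional[str]], str] = {}
        self._counter = 0

    def start_or_attach(
        self, model_id: str, file_pattern: Optional[str] = None
    ) -> Tuple[str, bool]:
        """Return (session_id, created); created is False when attaching."""
        key = (model_id, file_pattern)
        with self._lock:
            if key in self._active:
                return self._active[key], False
            self._counter += 1
            session_id = f"dl-{self._counter}"
            self._active[key] = session_id
            return session_id, True

    def finish(self, model_id: str, file_pattern: Optional[str] = None) -> None:
        with self._lock:
            self._active.pop((model_id, file_pattern), None)
```

Only a caller that gets `created == True` should spawn a worker. This in-memory state is lost on restart, which is why the commit also ties worker lifetime to the parent with PR_SET_PDEATHSIG.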
    • front: show model-loading progress on the Tasks page · 3020f828
      Stefy Lanza (nextime / spora ) authored
      During a GIL-heavy from_pretrained the engine's event loop is blocked, so its
      /internal/engine-state poll times out and the engine looked "down" with an empty
      task list — the real loading task never reached the front. Parse load progress
      from the engine's log stream (which the front already pumps) into Engine.loading
      and surface it as a synthetic 'loading' task (with live step/total) in
      _merge_engine_tasks, even when the primary engine is the blocked one. Cleared on
      "Model loaded successfully" or the next successful poll.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      3020f828
    • tasks: report live tokens/s for text generation · bc9a8352
      Stefy Lanza (nextime / spora ) authored
      Add a `rate` field to the Task registry and publish step (tokens so far) +
      tokens/s from the text streaming loop every few tokens; the Tasks page shows
      "N tok · X.X tok/s" while a generation is running. Flows through the engine→
      front task aggregation unchanged (asdict serialization).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      bc9a8352
    • front/engine split, ds4 + media tooling, gemma-4 native tools; ignore runtime artifacts · b297b25f
      Stefy Lanza (nextime / spora ) authored
      - frontproxy: torch-free front proxy + per-vendor engine supervisor with auth,
        localhost binding, model routing; Ctrl-C now force-kills engines (own session +
        PDEATHSIG, SIGKILL of engine process groups, watchdog on hung drain)
      - gemma-4 tool calling: prompt via native tools= template, parse call:NAME{...}
        into tool_calls, honour generation_config EOS so it stops instead of looping
      - ds4 external worker, parler/expressive TTS backends, video editor tooling
      - --debug-requests: full client<->API request/response logging + live snapshots
      - stop tracking runtime artifacts (video_editor/sessions/, tools/coderai_media/)
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      b297b25f
  3. 17 Jun, 2026 2 commits
    • township: inline the upload helpers into the generator · 2fb085f4
      Stefy Lanza (nextime / spora ) authored
      Fold tools/township_upload.py back into gen_township_fighters.py to match the
      project's single-file convention. Odds generation, anti-arbitrage checks, ZIP
      packing and the chunked upload now live alongside the other township helpers;
      _best_variant reuses the existing _video_variants. Behaviour is unchanged.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      2fb085f4
    • township: pack + price + upload matches to townshipcombatleague.com · c84c6208
      Stefy Lanza (nextime / spora ) authored
      Add ZIP packing, anti-arbitrage odds generation, and chunked upload of a
      rendered match to the Township Combat League server (mbetterd 3-step API).
      
      - New tools/township_upload.py: generate_odds (constraint-aware, retries up
        to 10x, verified with the server's exact sure-bet checks), check_arbitrage,
        build_match_zip (OVER/UNDER/WIN1-2/KO1-2/RET1-2/DRAW, best enhanced variant),
        upload_match (create -> chunked zip -> finalize, proxy-safe, progress_cb),
        and a content signature for upload-state invalidation.
      - Run page: server endpoint/token/fixture-id, "upload after render" checkbox,
        and configurable odds ranges; persisted via /save-config + load_config.
      - Match page: generate/regenerate odds & ZIP, upload with a progress bar
        (polls /job/<id>), and an "Uploaded" badge that clears when the match is
        re-rendered, enhanced, edited or deleted.
      - Auto-upload after a full render when configured; skips (keeps local) any
        match whose odds fail the arbitrage check after 10 tries.
      
      KO/RET odds are coupled to wins by the product cap, so high maxima are not
      reachable in a no-arbitrage book; the generator samples wins first then bounds
      KO/RET accordingly.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      c84c6208
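The server's exact sure-bet checks are not shown in this commit, so the sketch below uses the textbook test for one market of mutually exclusive, exhaustive outcomes: a sure bet exists when the implied probabilities (1/odds) sum to less than 1. The retry loop mirrors the "up to 10x" behaviour; all names are hypothetical, and the coupling between KO/RET and win odds is not modelled.

```python
import random
from typing import List, Optional, Sequence, Tuple


def implied_total(odds: Sequence[float]) -> float:
    """Sum of implied probabilities for decimal odds on exclusive outcomes."""
    return sum(1.0 / o for o in odds)


def has_arbitrage(odds: Sequence[float]) -> bool:
    """True when backing every outcome would guarantee a profit."""
    if any(o <= 1.0 for o in odds):
        raise ValueError("decimal odds must exceed 1.0")
    return implied_total(odds) < 1.0


def generate_book(
    ranges: Sequence[Tuple[float, float]], rng: random.Random, max_tries: int = 10
) -> Optional[List[float]]:
    """Sample odds within ranges, retrying until the book has no sure bet."""
    for _ in range(max_tries):
        book = [round(rng.uniform(lo, hi), 2) for lo, hi in ranges]
        if not has_arbitrage(book):
            return book
    return None  # caller skips the upload and keeps the match local
```

Returning `None` after the last try corresponds to the "skips (keeps local)" behaviour described above.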
  4. 16 Jun, 2026 1 commit
  5. 15 Jun, 2026 8 commits
    • coderai: global host-RAM cap with leak watch + disk-offload eviction · 99f8ba85
      Stefy Lanza (nextime / spora ) authored
      Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
      existing VRAM budgeting:
      
      - hf_loading clamps the accelerate CPU-offload budget to the headroom under
        the cap, so overflow spills to the disk offload folder instead of growing RSS.
      - manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
        _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
        evicted before a new load when RSS nears the cap.
      - ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
        climbs while the scheduler is idle, and runs a mitigation ladder
        (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
      - admin /status returns a ram block; Settings page exposes max RAM + evict/
        leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
      
      Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
      count so an active upscale no longer reports '0 models loaded'.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      99f8ba85
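The mitigation ladder (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle) amounts to running progressively more expensive steps until memory is back under a target. A minimal sketch, with the RSS reader and the steps injected as callables; `run_ladder` and its signature are assumptions, not the code in ram_monitor.py.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_ladder(
    get_rss_gb: Callable[[], float], target_gb: float, steps: Sequence[Step]
) -> List[str]:
    """Apply steps in order, stopping as soon as RSS is at or below target_gb."""
    applied: List[str] = []
    for name, action in steps:
        if get_rss_gb() <= target_gb:
            break  # cheap steps were enough; skip the costlier ones
        action()
        applied.append(name)
    return applied
```

In a real watcher the first step might be `("gc", gc.collect)`, with eviction of idle models last, since it is the only step that costs a later reload.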
    • township: normalize draw to one canonical both-fighters entry · a9b6d35e
      Stefy Lanza (nextime / spora ) authored
      Auto-collapse a match's draw outcome(s) to a single canonical draw owned by
      f1 with f2 as opponent (representing both fighters), preferring an existing
      f1-owned draw so its rendered files survive. Fixes legacy per-fighter draws
      and the lone-f2 self-opponent case; lets the draw be regenerated on its own.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      a9b6d35e
    • township: auto-fix legacy per-fighter draws to one draw per match · 85317252
      Stefy Lanza (nextime / spora ) authored
      Old matches stored a DRAW per fighter, but a draw concerns both fighters so
      there must be exactly one per match. _run_match_job now dedupes the match's
      draws (keeping the first) and persists prompts.json on ANY operation, so a
      legacy match self-heals the moment it's touched — regen no longer rewrites a
      draw per fighter, and the per-outcome prompt regen targets the single draw.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      85317252
    • township: camera-motion clips, cancel-whole-queue, queue UI, per-outcome... · 6c22ea86
      Stefy Lanza (nextime / spora ) authored
      township: camera-motion clips, cancel-whole-queue, queue UI, per-outcome prompt regen, 3-person draw
      
      - Camera motion: add CAMERA_MOVES and flag every other fight clip as a
        camera-motion shot whose prompt LEADS with a bold moving-camera directive
        (front position = strongest weight) so the I2V model moves the camera through
        the environment instead of locking off. Legacy clips get a camera decision
        assigned + persisted on prompt regen. The directive is stripped from the
        keyframe prompt so the still anchor stays sharp.
      - Cancel whole queue: new /job/cancel-all endpoint flags the running job AND
        every queued job (worker skips them); the progress cancel button now reads
        "Cancel all (N)" and empties the queue instead of just the active job.
      - Queue visibility: detail monitor renders a "Queued (N)" list of the waiting
        jobs (by friendly scope label), not just a count; matches-list page uses ONE
        monitor per card (no more blinking between running job and its queued ones).
      - Per-outcome prompt regen: "prompt↻" on every outcome tile + new
        outcome-prompt scope rewrites a single outcome's finish+victory shots only.
      - Draw outcome: strengthen prompts so the victory shot shows all THREE in frame
        (both fighters + referee) with the referee thrusting BOTH fighters' fists high.
      - Entrance clips: more explosive/threatening, galvanized, shadow-boxing the air.
      - "win" outcome is now a POINTS decision by the referee, not a KO.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      6c22ea86
    • township: infer intro role on prompt regen + make prompt drive keyframe · 300d5669
      Stefy Lanza (nextime / spora ) authored
      Legacy matches (created before the intro-clips feature) have no role on their
      clips, so per-clip prompt regen wrote fight prompts for clips 0-2 instead of
      the entrance/entrance/face-off intro. Add _clip_role_fighters() which honours an
      explicit role/fighters or infers from position (clip 0 = f1 entrance, clip 1 =
      f2 entrance, clip 2 = referee face-off, rest = fight). _fill_clip_prompt() now
      uses it and PERSISTS the resolved role/fighters onto the clip so the subsequent
      keyframe regen and render apply the correct profiles + LoRAs.
      
      Also make a regenerated prompt authoritative for keyframe generation: clear any
      stale kf_prompt override when (re)writing a clip prompt (keyframes compose from
      the clip prompt unless an override exists, which would silently win). Same for
      outcomes — _plan_outcome_shots now drops o['kf_prompt'] so regenerated outcome
      prompts feed the outcome keyframes.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      300d5669
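The position-based inference described for _clip_role_fighters() can be sketched as follows. The role strings, the clip dict shape and the argument list are assumptions; only the mapping (clip 0 = f1 entrance, clip 1 = f2 entrance, clip 2 = referee face-off, rest = fight) is taken from the commit.

```python
from typing import Any, Dict, List, Tuple


def clip_role_fighters(
    clip: Dict[str, Any], index: int, f1: str, f2: str, referee: str
) -> Tuple[str, List[str]]:
    """Honour an explicit role/fighters on the clip, else infer from position."""
    if clip.get("role") and clip.get("fighters"):
        return clip["role"], list(clip["fighters"])
    if index == 0:
        return "entrance", [f1]
    if index == 1:
        return "entrance", [f2]
    if index == 2:
        return "faceoff", [f1, f2, referee]
    return "fight", [f1, f2]


def persist_role(clip: Dict[str, Any], index: int, f1: str, f2: str, referee: str) -> None:
    """Write the resolved role back so later keyframe/render steps reuse it."""
    clip["role"], clip["fighters"] = clip_role_fighters(clip, index, f1, f2, referee)
```

Persisting the resolved values means a legacy clip is inferred once and treated as explicit from then on.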
    • township: apply correct per-segment LoRAs/profiles at outcome render time · ba4dedac
      Stefy Lanza (nextime / spora ) authored
      The outcome video render applied one fighter list (both match fighters) to the
      whole clip, so the referee LoRA and winner-only identity only existed in the
      keyframe, not at video-generation time. Thread per-segment fighters through
      _render: segments are now (prompt, frames, seg_stem, seg_fighters) and each
      segment (and each chained part) applies exactly its own character_profiles +
      LoRAs, overriding the clip-level list.
      
      Outcome segments now load: FINISH = both fighters; VICTORY = winner + referee
      (decisive) or both fighters + referee (draw). Referee resolution matches the
      keyframe path. Backward compatible — shorter segment tuples and legacy clips
      without role/fighters fall back to the clip-level fighters.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      ba4dedac
    • township: pre-fight intro clips, per-clip prompt regen, whole-match 2x enhance · e28f11e9
      Stefy Lanza (nextime / spora ) authored
      - Every match now opens with 3 intro clips before the fight: a bold solo
        entrance for each fighter, then a referee-officiated face-off stare-down
        with the start signal. The real fight begins at clip 4. New intro prompt
        templates + LLM system prompt + PromptGenerator.intro_shot().
      - Factor out _build_match_clip_specs() and _fill_clip_prompt() so all four
        clip-building paths (stage_videos Phase A, new-match, replan, full regen)
        build intros consistently; intro clips attach only their own participants
        (solo entrance = one fighter; face-off = both + match referee).
      - New "clip-prompt" job scope + per-clip "prompt↻" link: rewrite ONLY one
        clip's prompt in place (role-aware), steering fight clips away from the
        match's other shots; renders nothing.
      - "Create whole match" and "Regenerate whole match" (scope "full") now finish
        with a 2x AI upscale + 2x frame interpolation pass over the final short/long
        assemblies and outcome videos, reusing the existing enhance machinery.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      e28f11e9
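As a rough illustration of the clip layout this commit describes (three intros, then the fight from clip 4), here is a minimal sketch. The function name, the dict fields and the role strings are assumptions made for the example. The real _build_match_clip_specs() may look quite different.

```python
from typing import Dict, List, Optional


def sketch_match_clip_specs(fighter_a: str, fighter_b: str, n_fight_clips: int,
                            referee: Optional[str] = None) -> List[Dict]:
    """Build intro specs followed by fight specs, numbering clips from 1."""
    faceoff = [fighter_a, fighter_b] + ([referee] if referee else [])
    specs = [
        # Solo entrances attach only their own fighter.
        {"role": "intro_entrance", "participants": [fighter_a]},
        {"role": "intro_entrance", "participants": [fighter_b]},
        # The face-off attaches both fighters plus the match referee.
        {"role": "intro_faceoff", "participants": faceoff},
    ]
    specs += [{"role": "fight", "participants": [fighter_a, fighter_b]}
              for _ in range(n_fight_clips)]
    for number, spec in enumerate(specs, start=1):
        spec["clip"] = number
    return specs
```

Building the list in one place is what lets every clip-building path agree on where the intros end and the fight begins.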
    • township: 2-keyframe outcomes, referees, autogen, generation queue; favicons · 3bfefed0
      Stefy Lanza (nextime / spora ) authored
      Township tool (tools/gen_township_fighters.py):
      - Outcome videos now generate TWO keyframes per outcome (finish + victory),
        each anchoring its own clip; victory clip uses a dedicated referee shot.
      - Referee characters: new role on create form, kept out of fighter pools,
        dressed as officials, attachable per-match and used in victory keyframes.
      - Per-match referee selection (new-match form + match editor, persisted).
      - Autogenerate buttons on character/referee, environment and new-match forms
        (LLM-filled, editable before create) via /profile/autogen + /matches/autogen.
      - Single-worker generation queue: all coderai-bound jobs (create/regen/train/
        match/process) are serialised and surfaced as "queued", with one persistent
        match-detail monitor replacing the competing per-job pollers (fixes the
        blinking progress when two jobs were launched at once).
      
      coderai: favicon.ico served at /favicon.ico + linked in admin/login templates;
      bundled township favicon served at /favicon.ico.
      
      Also gitignore large packaging/runtime artifact dirs (.packaging-cache/, tmp/).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      3bfefed0
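The single-worker generation queue mentioned in this commit can be sketched with the standard library alone. This is an illustrative outline under the assumption that jobs are plain callables. `SingleWorkerQueue` and its status strings other than "queued" are hypothetical, not the township tool's actual API.

```python
import queue
import threading


class SingleWorkerQueue:
    """Run submitted jobs one at a time, exposing a status per job."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._lock = threading.Lock()
        self._next_id = 0
        self.status = {}
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, fn, *args):
        with self._lock:
            self._next_id += 1
            job_id = self._next_id
            self.status[job_id] = "queued"
        self._jobs.put((job_id, fn, args))
        return job_id

    def _set(self, job_id, state):
        with self._lock:
            self.status[job_id] = state

    def _run(self):
        # One worker thread means jobs can never overlap.
        while True:
            job_id, fn, args = self._jobs.get()
            self._set(job_id, "running")
            try:
                fn(*args)
                self._set(job_id, "done")
            except Exception:
                self._set(job_id, "failed")
            finally:
                self._jobs.task_done()

    def wait(self):
        """Block until every submitted job has finished."""
        self._jobs.join()
```

Because status lives in one place, a single monitor can poll it instead of each job running its own competing poller.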
  6. 14 Jun, 2026 2 commits
    • video: cap CPU cores + thermal-manage RIFE interpolation · 80f8fe22
      Stefy Lanza (nextime / spora ) authored
      rife-ncnn-vulkan and the ffmpeg frame extract/encode were grabbing all cores
      and running with no ongoing thermal control. Now:
      
      - _cpu_thread_limit() mirrors coderai's half-the-cores cap (honours the
        OMP_NUM_THREADS set at import). All ffmpeg calls in the upscale + interpolate
        paths pass -threads N and are CPU-pinned via a sched_setaffinity preexec_fn;
        rife gets -j capped and the same affinity pin — so neither can saturate 24
        cores.
      - RIFE is one opaque subprocess, so it now runs under a watcher thread that
        SIGSTOPs it when the GPU/CPU exceeds the configured thermal-high threshold and
        SIGCONTs it once cooled (the subprocess analogue of the upscaler's per-frame
        thermal gate), and terminates it on task cancel. Per-frame progress preserved.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      80f8fe22
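The two mechanisms in this commit (a half-the-cores cap with an affinity pin, and a SIGSTOP/SIGCONT watcher around an opaque subprocess) might look roughly like the sketch below. It is Linux-only and makes several assumptions: the helper names, the separate resume threshold and the `read_temp` callable are all hypothetical, and the commit itself only mentions a thermal-high threshold.

```python
import os
import signal
import subprocess
import threading
import time


def cpu_thread_limit(total_cores=None, env=None):
    """Half the cores, unless OMP_NUM_THREADS already names a positive limit."""
    env = os.environ if env is None else env
    omp = env.get("OMP_NUM_THREADS", "")
    if omp.isdigit() and int(omp) > 0:
        return int(omp)
    total = total_cores or os.cpu_count() or 1
    return max(1, total // 2)


def pin_to_cpus(n):
    """Return a preexec_fn that pins the child to n of its allowed CPUs."""
    def _pin():
        allowed = sorted(os.sched_getaffinity(0))
        os.sched_setaffinity(0, set(allowed[:n]))
    return _pin


def next_paused(paused, temp, high, resume):
    """Hysteresis: pause at or above `high`, resume at or below `resume`."""
    if not paused and temp >= high:
        return True
    if paused and temp <= resume:
        return False
    return paused


def run_with_thermal_watch(cmd, read_temp, high=85.0, resume=75.0,
                           poll=1.0, cancelled=None):
    """Run cmd pinned to a capped CPU set, pausing it while it runs hot."""
    proc = subprocess.Popen(cmd, preexec_fn=pin_to_cpus(cpu_thread_limit()))

    def watch():
        paused = False
        while proc.poll() is None:
            if cancelled is not None and cancelled.is_set():
                proc.send_signal(signal.SIGCONT)  # wake it so SIGTERM is handled
                proc.terminate()
                return
            want = next_paused(paused, read_temp(), high, resume)
            if want != paused:
                proc.send_signal(signal.SIGSTOP if want else signal.SIGCONT)
                paused = want
            time.sleep(poll)

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    returncode = proc.wait()
    watcher.join()
    return returncode
```

Keeping the pause decision in a pure function makes the hysteresis easy to check without a GPU or a running child process.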
    • video: AI upscale/interpolate, township outcomes, configurable tmp dir · cc2436ed
      Stefy Lanza (nextime / spora ) authored
      Make video enhancement fully AI-on-CoderAI and rework township outcomes.
      
      Upscaling (Real-ESRGAN / SD upscalers):
      - Support diffusers-style .safetensors weights + config.json (e.g.
        hlky/RealESRGAN_*), not just classic .pth; infer RRDBNet arch/scale from
        config. fp16 + tiling for performance.
      - AI-or-fail: no ffmpeg fallback. Auto-select a configured upscaler when the
        request omits a model (find_capable_model).
      - Fix a registry-pollution bug: cache upscalers in a private dict, never under
        a synthetic 'upscale:<id>' key in multi_model_manager.models (which made a
        later request_model() resolve/reload the bogus key -> 400).
      - Per-frame progress + a first-class "upscale" task (pause/cancel/thermal),
        with a periodic thermal re-check through the frame loop.
      
      Interpolation (RIFE):
      - AI-or-fail: removed the ffmpeg minterpolate fallback. Resolve the
        rife-ncnn-vulkan binary + bundled model robustly, pass exact -n frame count,
        and pin -g to the SAME GPU CoderAI uses (matched by CUDA device name, not a
        hardcoded index). Progress + "interpolate" task + thermal guard.
      
      Township generator:
      - One draw per match (not per fighter); longer, configurable outcome videos
        built as a finish -> victory two-shot sequence; richer, more brutal,
        camera-aware prompts (finish/victory templates editable on the Prompts page).
      - Stream large results via response_format=url instead of base64-in-JSON;
        per-frame progress for both upscale and interpolate.
      
      Configurable temp dir:
      - New --tmp CLI flag and config.tmp_dir (+ admin Settings field, applied live).
        Sets tempfile.tempdir and TMPDIR/TMP/TEMP so all scratch (frame extraction,
        upscaling, interpolation) and child processes use it — fixes
        "[Errno 28] No space left on device" when /tmp is small.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      cc2436ed
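The configurable temp dir boils down to pointing both Python's `tempfile` module and the environment at the same directory. A minimal sketch follows; `apply_tmp_dir` is a hypothetical name, and how the real code wires this to --tmp or config.tmp_dir is not shown here.

```python
import os
import tempfile


def apply_tmp_dir(path: str) -> str:
    """Route in-process and child-process scratch files to `path`."""
    path = os.path.abspath(path)
    os.makedirs(path, exist_ok=True)
    # tempfile.gettempdir() returns this value once it is set explicitly.
    tempfile.tempdir = path
    # Child processes (ffmpeg and similar) read these from the environment.
    for var in ("TMPDIR", "TMP", "TEMP"):
        os.environ[var] = path
    return path
```

Setting `tempfile.tempdir` directly matters because `tempfile` caches its choice after first use, so changing only the environment variables would not affect a process that has already created a temporary file.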
  7. 13 Jun, 2026 5 commits
    • township: whole-match regen — assemble last, clean slate first · 06e61257
      Stefy Lanza (nextime / spora ) authored
      The 'full' match-regen scope now (1) removes this match's existing
      keyframes, clip videos, outcome videos and finals up front, so a re-plan
      that changes the clip count can't leave orphaned files that would get
      globbed into the reassembled finals; and (2) runs strictly in order —
      prompts -> keyframes -> clips + outcomes (assemble_finals=False) ->
      assemble finals as the explicit last phase (4/4) via _reassemble_finals.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      06e61257
    • township: "Regenerate whole match" button (end-to-end match redo) · 94a5f1ac
      Stefy Lanza (nextime / spora ) authored
      Adds a `full` scope to the per-match action handler that rebuilds one
      match in order: re-plan all fight-clip + outcome prompts (text model) →
      regenerate every keyframe (image model) → re-render all clips + outcomes
      and reassemble finals (video model), with live per-phase progress. Other
      matches are untouched. Wires the confirm dialog, the match-detail button,
      and the /matches/render scope allowlist.
      
      Fix: the `full` confirm label used an apostrophe (match's) inside the
      single-quoted JS string of the plain triple-quoted _match_js block, which
      collapsed to a real quote and broke the whole script (reMatch undefined).
      Reworded to avoid it; verified the rendered JS parses with node --check.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      94a5f1ac
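The apostrophe bug in this commit is easy to reproduce in isolation: inside a plain (non-raw) Python string, `\'` is an escape sequence that collapses to a bare quote, so the JavaScript never sees the backslash. The snippet below is a small demonstration with a made-up confirm label. The commit fixed the problem by rewording; `json.dumps` is shown only as one alternative way to embed arbitrary text.

```python
import json

# Plain triple-quoted string: Python consumes the backslash.
plain = """confirm('Redo this match\'s clips?')"""

# Raw string: the backslash survives and reaches the JS source.
raw = r"""confirm('Redo this match\'s clips?')"""


def js_string(text: str) -> str:
    """Emit text as a double-quoted JS string literal via JSON encoding."""
    return json.dumps(text)


safe = "confirm(" + js_string("Redo this match's clips?") + ")"

print(plain)  # three bare quotes, so the JS string ends early
print(raw)    # the escaped apostrophe is preserved
print(safe)   # double-quoted literal, the apostrophe needs no escape
```

JSON encoding does not neutralise a literal `</script>` inside inline HTML, so text from untrusted sources would need further handling.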
    • video: VACE frame-tail extend, cancellable downloads, MMA fight variety · 0b355364
      Stefy Lanza (nextime / spora ) authored
      Downloads: run each model download in a clean `python -m
      codai.admin.download_worker` subprocess streaming JSON progress, so the
      Stop button reliably cancels by terminating the process (HF parallel/Xet
      chunk transfers ignore in-thread flags). Adds download-cancel-all. Avoids
      multiprocessing spawn, which re-imports the server launcher as __main__.
      
      VACE extension: detect WanVACEPipeline; new 'extend' mode + cond_frames
      request field condition on the previous chained part's frame tail (real
      motion -> forward continuation, fixing the single-frame boomerang).
      _build_vace_conditioning builds the (video, mask) pair; _snap_wan_frames
      enforces 4k+1; only the freshly generated frames are returned. VACE also
      serves keyframe i2v / t2v via masking; i2v/t2v fallbacks skipped for it.
      Township auto-uses extend for chained parts when the model is VACE.
      
      Fight prompts: full-MMA system prompt + rotating per-clip action focus
      (kicks/knees/elbows/takedowns/ground/submissions) and occasional blood,
      rebalanced fallback templates, keyframe wardrobe enforcement.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      0b355364
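The 4k+1 frame-count constraint and the "return only the fresh frames" rule can be illustrated with two small helpers. Both are sketches under stated assumptions. Rounding down is a guess, since the commit does not say which way _snap_wan_frames rounds, and the second helper assumes the conditioning tail sits at the start of the generated sequence.

```python
from typing import List, TypeVar

T = TypeVar("T")


def snap_to_4k_plus_1(n: int) -> int:
    """Round a frame count down to the nearest value of the form 4k+1."""
    if n <= 1:
        return 1
    return ((n - 1) // 4) * 4 + 1


def fresh_frames(generated: List[T], cond_frames: int) -> List[T]:
    """Drop the leading frames that merely repeat the conditioning tail."""
    return generated[cond_frames:]
```

For example, a request for 80 frames would be snapped to 77 under this rounding, while 81 is already valid and stays unchanged.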
    • Township configurable playback fps; quiet progress-poll access logs · 07b3be5c
      Stefy Lanza (nextime / spora ) authored
      - township: new playback_fps (0 = same as generation fps). coderai uses fps only
        for the mp4 encode (Wan generates a fixed frame count), so a higher playback
        fps plays the same frames faster (less slow-motion). The planner counts clip
        duration as nf/playback_fps so the finals reach their target length at the real
        play speed. Wired through config/CLI (--playback-fps)/web form/all call sites.
      - main.py: suppress /v1/{video,images,audio}/progress access-log lines unless
        --debug-web is set (matching the existing /v1/loras/progress filter).
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      07b3be5c
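The planner arithmetic in this commit (clip duration counted as nf/playback_fps, with 0 meaning "same as generation fps") can be written out as a short sketch. The function names are hypothetical and the frame counts in the usage note are illustrative only.

```python
import math


def effective_fps(generation_fps: float, playback_fps: float = 0) -> float:
    """playback_fps of 0 means: play at the generation fps."""
    return playback_fps if playback_fps > 0 else generation_fps


def clip_seconds(nf: int, generation_fps: float, playback_fps: float = 0) -> float:
    """Real play time of a clip with nf frames."""
    return nf / effective_fps(generation_fps, playback_fps)


def clips_needed(target_seconds: float, nf: int, generation_fps: float,
                 playback_fps: float = 0) -> int:
    """How many clips the planner must schedule to reach the target length."""
    return math.ceil(target_seconds / clip_seconds(nf, generation_fps, playback_fps))
```

With 81 frames generated at 16 fps, a clip lasts 5.0625 s and a 60 s final needs 12 clips. Played back at 24 fps the same frames last 3.375 s, so the planner has to schedule 18 clips to reach the same length.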
    • Exempt progress polls from rate limit; retry 429s on clip render · 0bdd9466
      Stefy Lanza (nextime / spora ) authored
      - ratelimit.py: exempt /v1/video, /v1/audio and /v1/loras progress polls from
        BOTH auth and rate limiting (shared _PROGRESS_PATHS), matching /v1/images.
        The township script polls /v1/video/progress ~1/s during a clip; being
        rate-limited, those polls ate the budget so the generation POST got 429'd
        (clip failed) and the polls themselves 429'd (stuck step bar).
      - township _render_once: a 429 now backs off and retries the same render (up to
        40 attempts, capped 60s) instead of abandoning the clip; covers clips,
        chained parts and outcomes. Genuine errors still fail fast.
      Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
      0bdd9466
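The 429 handling described for _render_once (back off and retry up to 40 times with the wait capped at 60s, while genuine errors fail fast) could be sketched as follows. The exponential schedule, the `send` callable returning a `(status, body)` pair and the function name are assumptions for illustration. The commit does not state how the delay grows.

```python
import time


def render_with_retry(send, max_attempts=40, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 429:
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            # Assumed schedule: doubling delay, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
            continue
        if status >= 400:
            # Genuine errors are not retried.
            raise RuntimeError(f"render failed with HTTP {status}")
        return body
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Injecting `sleep` keeps the helper testable without real waiting.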