- 18 Jun, 2026 11 commits
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
GPTQModel silently leaves layers it can't map (e.g. gemma-4's fused batched MoE experts) in bf16, producing a near-full-size "checkpoint" that the loader would redirect to and then offload. The worker now scans the saved safetensors and, if <50% of large weight bytes are int-packed, deletes the output and marks the job failed (so it falls back to bitsandbytes) instead of reporting "done".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
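A minimal sketch of that post-quantization sanity check, assuming the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header with `dtype` and `data_offsets` per tensor). The 1 MiB "large weight" cutoff, the set of integer dtypes and the function names are hypothetical; the commit only specifies the <50% rule.

```python
import json
import struct

# Integer dtypes that packed GPTQ/AWQ weights are typically stored as (assumption).
INT_DTYPES = {"I8", "U8", "I16", "I32", "U32", "I64"}

def read_header(path):
    """Return the JSON header of a .safetensors file without loading any tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))   # u64 little-endian header length
        header = json.loads(f.read(n))
    header.pop("__metadata__", None)            # optional free-form metadata entry
    return header

def int_packed_fraction(paths, min_bytes=1 << 20):
    """Fraction of bytes in large tensors (>= min_bytes) that use an integer dtype."""
    total = packed = 0
    for path in paths:
        for info in read_header(path).values():
            start, end = info["data_offsets"]
            size = end - start
            if size < min_bytes:                # ignore norms, biases, scales, ...
                continue
            total += size
            if info["dtype"] in INT_DTYPES:
                packed += size
    return packed / total if total else 0.0

def looks_quantized(paths, threshold=0.5):
    """True if at least `threshold` of the large-weight bytes are int-packed."""
    return int_packed_fraction(paths) >= threshold
```

A job that fails `looks_quantized` would then be marked failed and its output deleted, so the loader never redirects to a mostly-bf16 checkpoint.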
-
Stefy Lanza (nextime / spora ) authored
- quant jobs now appear on the Tasks page (api_tasks emits kind=quantize) and as a live badge on the HF model-list row (polled; re-renders only on change).
- persist job state to <cache>/quantized/jobs.json; on startup a job left "running" is marked "interrupted" only if its owning PID is dead (merge-safe save so multiple processes don't clobber each other).
- gitignore the runtime model cache (models/), logs/, and the third-party GPTQModel/ source clone (installed into the venv, not part of this repo).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
- Prefix front/uvicorn and re-emitted engine log lines with [HH:MM:SS] so the front log format matches the engine ([HH:MM:SS][nvidia] …); preserve tqdm in-place progress and avoid double-timestamping already-tagged lines.
- gpu_detect: _amd_gpu_name() resolves a card's marketing name via the amdgpu product_name sysfs attribute, then the lspci board/chip name, then vulkaninfo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
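The timestamping rule could look roughly like this. `stamp` is a hypothetical helper, and the single space between the tag and the message is an assumption; a leading `\r` (tqdm's in-place redraw) is kept in front of the tag so the bar still overwrites the same terminal line.

```python
import re
import time

# A line that already starts with an [HH:MM:SS] tag is left alone.
_TS_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]")

def stamp(line, now=None):
    """Prefix a log line with [HH:MM:SS] unless it is blank or already tagged."""
    ts = time.strftime("[%H:%M:%S]", time.localtime(now))  # now=None -> current time
    cr = "\r" if line.startswith("\r") else ""             # preserve tqdm redraws
    body = line[len(cr):]
    if not body.strip() or _TS_RE.match(body):
        return line
    return f"{cr}{ts} {body}"
```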
-
Stefy Lanza (nextime / spora ) authored
Smart context caching (both text backends):
- Per-instance generation lock so pooled concurrent requests can't corrupt a shared KV cache (GGUF + HF, incl. streaming worker thread).
- GGUF: enable multi-slot LlamaRAMCache, budget via kv_cache_budget_mb (512MB).
- HF: replace single exact-text KV slot with an LRU of token-prefix slots + token-level longest-common-prefix + DynamicCache clone/crop (handles mid-history edits); kv_cache_slots (default 3).
- Session-affinity routing in ModelInstancePool.acquire(session_key); key from user/X-Session-Id else a stable prefix hash.
- RAM-pressure ladder drops reclaimable prefix caches before evicting models.

VRAM fix:
- Auto-fit check no longer double-counts the KV/activation reserve when expected_vram_gb is already a peak estimate — borderline models (e.g. gemma-4-26B-A4B) stay GPU-resident instead of being forced into MoE-thrashing device_map offload.

GPTQ/AWQ fast-kernel quant backend (HF path):
- New codai/models/quant.py: GPTQModel capability detection, quantized-checkpoint cache, on-demand background quantize job (falls back to bnb if unsupported).
- quant_backend config (auto|bnb|gptq|awq); loader auto-uses a quantized checkpoint with Marlin/ExLlama when present, else bitsandbytes.
- Admin endpoints + "Quantize to 4-bit" button with live status on the model page.
- requirements-nvidia.txt documents the from-source install + numpy caveat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
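The HF-side prefix reuse can be sketched in plain Python, assuming the KV cache is an opaque object that the caller later clones and crops to the returned prefix length before prefilling only the remaining tokens. `PrefixKVCache` and its methods are illustrative names, not the project's API.

```python
from collections import OrderedDict

def common_prefix_len(a, b):
    """Number of leading tokens shared by two token-id sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PrefixKVCache:
    """LRU of token-prefix slots; each value stands in for a cached KV state."""

    def __init__(self, slots=3):
        self.slots = slots
        self._lru = OrderedDict()  # tuple(token_ids) -> cached KV object

    def lookup(self, tokens):
        """Return (kv, reuse_len) for the slot sharing the longest prefix, else (None, 0)."""
        best_key, best_len = None, 0
        for key in self._lru:
            n = common_prefix_len(key, tokens)
            if n > best_len:
                best_key, best_len = key, n
        if best_key is None:
            return None, 0
        self._lru.move_to_end(best_key)      # mark as most recently used
        return self._lru[best_key], best_len

    def store(self, tokens, kv):
        key = tuple(tokens)
        self._lru[key] = kv
        self._lru.move_to_end(key)
        while len(self._lru) > self.slots:
            self._lru.popitem(last=False)    # evict the least recently used slot
```

Matching on tokens rather than exact text is what lets a mid-history edit still reuse everything before the edited turn.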
-
Stefy Lanza (nextime / spora ) authored
Surface out-of-process download workers (tracked in _download_status) as first-class tasks in /admin/api/tasks, alongside generations, training and queued requests. They render with a percentage progress bar plus a filename / rate / ETA readout, and can be cancelled from the Tasks page (routed through a shared _cancel_download_session helper) or removed once finished/failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
stop_all() now sweeps /proc for any codai.admin.download_worker processes and SIGKILLs them after the engines are stopped — including legacy ppid=1 orphans left by an earlier instance that this front never spawned. Orphaned workers keep holding huggingface_hub's per-blob file lock, which makes the next re-download deadlock at 0%, so Ctrl-C now guarantees they're cleaned up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
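A sketch of the /proc sweep, assuming Linux's `/proc/<pid>/cmdline` layout (NUL-separated argv). `find_worker_pids` and `kill_workers` are hypothetical names; the `proc_root` parameter exists only so the scan can be exercised against a fake directory.

```python
import os
import signal
from pathlib import Path

def find_worker_pids(marker="codai.admin.download_worker", proc_root="/proc"):
    """Return sorted PIDs whose argv contains `marker` (Linux /proc layout)."""
    needle = marker.encode()
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():          # skip /proc/self, /proc/meminfo, ...
            continue
        try:
            argv = (entry / "cmdline").read_bytes().split(b"\0")
        except OSError:                       # process exited or is not readable
            continue
        if any(needle in arg for arg in argv):
            pids.append(int(entry.name))
    return sorted(pids)

def kill_workers(pids):
    """SIGKILL each PID, skipping ourselves and ignoring PIDs that are already gone."""
    for pid in pids:
        if pid == os.getpid():
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
```

Matching on argv rather than on parentage is what lets the sweep also catch ppid=1 orphans from an earlier instance.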
-
Stefy Lanza (nextime / spora ) authored
Re-downloading a model that was already in progress spawned a second download_worker. Both contend for huggingface_hub's per-blob file lock — the first downloads, the second blocks on the lock and reports 0% forever ("Downloading full repository…"). Two causes, both fixed:
- Same-process re-download click: api_download_model now dedups via _active_download_session(model_id, file_pattern) and attaches the client to the live session instead of spawning a rival worker.
- Restart case: workers were plain Popen children with no parent-death signal, so a server/engine restart orphaned them (still holding the lock) while the new instance lost its in-memory dedup state. Workers now spawn with PR_SET_PDEATHSIG=SIGKILL so they die with the server; the re-download then resumes cleanly from the .incomplete blob.

Also render engine "Loading weights" tqdm progress as a single updating line on a TTY (in-place \r) and throttle to whole-percent changes when piped, instead of one line per update.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
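One way to get die-with-parent behaviour is to call `prctl(PR_SET_PDEATHSIG, SIGKILL)` through `ctypes` in a `preexec_fn` (Linux only). This is a sketch, not the project's spawn code, and `spawn_worker` is a hypothetical name. Note that the kernel sends the signal when the *thread* that forked the child exits, so workers should be spawned from a long-lived thread.

```python
import ctypes
import ctypes.util
import signal
import subprocess
import sys

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>

# Load libc once in the parent; preexec_fn should do as little as possible.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)

def _die_with_parent():
    """preexec_fn: ask the kernel to SIGKILL this child when its parent thread exits."""
    if _libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGKILL)) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")

def spawn_worker(args):
    """Start `python <args>` as a child that cannot outlive this process (Linux only)."""
    return subprocess.Popen(
        [sys.executable, *args],
        preexec_fn=_die_with_parent,   # runs in the child between fork() and exec()
        stdout=subprocess.PIPE,
        text=True,
    )
```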
-
Stefy Lanza (nextime / spora ) authored
During a GIL-heavy from_pretrained the engine's event loop is blocked, so its /internal/engine-state poll times out and the engine looked "down" with an empty task list — the real loading task never reached the front. Parse load progress from the engine's log stream (which the front already pumps) into Engine.loading and surface it as a synthetic 'loading' task (with live step/total) in _merge_engine_tasks, even when the primary engine is the blocked one. Cleared on "Model loaded successfully" or the next successful poll.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
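A sketch of turning a tqdm-style log line into `(step, total)`. The regex assumes tqdm's default `N/M` counter after the bar and a label starting with "Loading"; both are assumptions about the engine's log format, and `parse_load_progress` is a hypothetical name.

```python
import re

# e.g. "Loading weights:  45%|████▌     | 123/272 [00:10<00:12, 11.9it/s]"
_PROGRESS_RE = re.compile(r"(?P<label>Loading[^:]*):.*?\|\s*(?P<step>\d+)/(?P<total>\d+)")

def parse_load_progress(line):
    """Return (step, total) from a tqdm-style loading line, or None if it isn't one."""
    latest = line.rsplit("\r", 1)[-1]          # keep only the most recent redraw
    m = _PROGRESS_RE.search(latest)
    if not m:
        return None
    return int(m["step"]), int(m["total"])
```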
-
Stefy Lanza (nextime / spora ) authored
Add a `rate` field to the Task registry and publish step (tokens so far) + tokens/s from the text streaming loop every few tokens; the Tasks page shows "N tok · X.X tok/s" while a generation is running. Flows through the engine → front task aggregation unchanged (asdict serialization).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
- frontproxy: torch-free front proxy + per-vendor engine supervisor with auth, localhost binding, model routing; Ctrl-C now force-kills engines (own session + PDEATHSIG, SIGKILL of engine process groups, watchdog on hung drain)
- gemma-4 tool calling: prompt via native tools= template, parse call:NAME{...} into tool_calls, honour generation_config EOS so it stops instead of looping
- ds4 external worker, parler/expressive TTS backends, video editor tooling
- --debug-requests: full client<->API request/response logging + live snapshots
- stop tracking runtime artifacts (video_editor/sessions/, tools/coderai_media/)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
- 17 Jun, 2026 2 commits
Stefy Lanza (nextime / spora ) authored
Fold tools/township_upload.py back into gen_township_fighters.py to match the project's single-file convention. Odds generation, anti-arbitrage checks, ZIP packing and the chunked upload now live alongside the other township helpers; _best_variant reuses the existing _video_variants. Behaviour is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Add ZIP packing, anti-arbitrage odds generation, and chunked upload of a rendered match to the Township Combat League server (mbetterd 3-step API).
- New tools/township_upload.py: generate_odds (constraint-aware, retries up to 10x, verified with the server's exact sure-bet checks), check_arbitrage, build_match_zip (OVER/UNDER/WIN1-2/KO1-2/RET1-2/DRAW, best enhanced variant), upload_match (create -> chunked zip -> finalize, proxy-safe, progress_cb), and a content signature for upload-state invalidation.
- Run page: server endpoint/token/fixture-id, "upload after render" checkbox, and configurable odds ranges; persisted via /save-config + load_config.
- Match page: generate/regenerate odds & ZIP, upload with a progress bar (polls /job/<id>), and an "Uploaded" badge that clears when the match is re-rendered, enhanced, edited or deleted.
- Auto-upload after a full render when configured; skips (keeps local) any match whose odds fail the arbitrage check after 10 tries.

KO/RET odds are coupled to wins by the product cap, so high maxima are not reachable in a no-arbitrage book; the generator samples wins first then bounds KO/RET accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
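For intuition, the textbook sure-bet test for a set of mutually exclusive, exhaustive outcomes is that the implied probabilities `1/odds` sum to less than 1. The sketch below implements that generic check, the equal-payout stakes that exploit it, and a margin-based generator with bounded retries. It does not reproduce the server's exact checks or the KO/RET product cap, and every name is hypothetical.

```python
import random

def implied_sum(odds):
    """Sum of implied probabilities 1/o for decimal odds."""
    return sum(1.0 / o for o in odds)

def is_sure_bet(odds):
    """Arbitrageable when the implied probabilities of exhaustive outcomes sum below 1."""
    return implied_sum(odds) < 1.0

def stakes_for_profit(odds, bankroll=100.0):
    """Stakes that return the same payout (bankroll / implied_sum) whichever outcome wins."""
    s = implied_sum(odds)
    return [bankroll / (o * s) for o in odds]

def generate_book(n, margin=0.05, lo=1.05, hi=15.0, tries=10, rng=None):
    """Sample n odds with a bookmaker margin; give up (None) after `tries` attempts."""
    rng = rng or random.Random()
    for _ in range(tries):
        w = [rng.random() + 0.05 for _ in range(n)]
        total = sum(w)
        odds = [round(1.0 / ((x / total) * (1.0 + margin)), 2) for x in w]
        if all(lo <= o <= hi for o in odds) and not is_sure_bet(odds):
            return odds
    return None
```

With two outcomes at 2.10, staking 50 on each returns 105 either way, which is exactly the situation the server rejects.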
-
- 16 Jun, 2026 1 commit
Stefy Lanza (nextime / spora ) authored
Entrances no longer roar/snarl/glare; they walk out composed and in-character. Fights now default to realistic human MMA exchanges with acrobatic/scenery moves demoted to an occasional minority accent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
- 15 Jun, 2026 8 commits
Stefy Lanza (nextime / spora ) authored
Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the existing VRAM budgeting:
- hf_loading clamps the accelerate CPU-offload budget to the headroom under the cap, so overflow spills to the disk offload folder instead of growing RSS.
- manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps _last_used), shared _evict_one, and _evict_models_for_ram; idle models are evicted before a new load when RSS nears the cap.
- ram_monitor.py: background watcher samples RSS, flags a suspected leak when it climbs while the scheduler is idle, and runs a mitigation ladder (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
- admin /status returns a ram block; Settings page exposes max RAM + evict/leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.

Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded count so an active upscale no longer reports '0 models loaded'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Auto-collapse a match's draw outcome(s) to a single canonical draw owned by f1 with f2 as opponent (representing both fighters), preferring an existing f1-owned draw so its rendered files survive. Fixes legacy per-fighter draws and the lone-f2 self-opponent case; lets the draw be regenerated on its own.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
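A sketch of the collapse rule, assuming outcomes are dicts with hypothetical `type`, `fighter` and `opponent` keys. The kept draw is updated in place, so any rendered-file references it already carries survive; all other outcomes keep their order.

```python
def collapse_draws(outcomes, f1, f2):
    """Keep exactly one DRAW outcome, owned by f1 with f2 as opponent."""
    draws = [o for o in outcomes if o.get("type") == "DRAW"]
    if not draws:
        return outcomes
    # Prefer an existing f1-owned draw; otherwise reuse the first one found.
    keep = next((d for d in draws if d.get("fighter") == f1), draws[0])
    keep["fighter"], keep["opponent"] = f1, f2   # fixes the lone-f2 self-opponent case
    return [o for o in outcomes if o.get("type") != "DRAW" or o is keep]
```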
-
Stefy Lanza (nextime / spora ) authored
Old matches stored a DRAW per fighter, but a draw concerns both fighters so there must be exactly one per match. _run_match_job now dedupes the match's draws (keeping the first) and persists prompts.json on ANY operation, so a legacy match self-heals the moment it's touched — regen no longer rewrites a draw per fighter, and the per-outcome prompt regen targets the single draw.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
township: camera-motion clips, cancel-whole-queue, queue UI, per-outcome prompt regen, 3-person draw

- Camera motion: add CAMERA_MOVES and flag every other fight clip as a camera-motion shot whose prompt LEADS with a bold moving-camera directive (front position = strongest weight) so the I2V model moves the camera through the environment instead of locking off. Legacy clips get a camera decision assigned + persisted on prompt regen. The directive is stripped from the keyframe prompt so the still anchor stays sharp.
- Cancel whole queue: new /job/cancel-all endpoint flags the running job AND every queued job (worker skips them); the progress cancel button now reads "Cancel all (N)" and empties the queue instead of just the active job.
- Queue visibility: detail monitor renders a "Queued (N)" list of the waiting jobs (by friendly scope label), not just a count; matches-list page uses ONE monitor per card (no more blinking between running job and its queued ones).
- Per-outcome prompt regen: "prompt↻" on every outcome tile + new outcome-prompt scope rewrites a single outcome's finish+victory shots only.
- Draw outcome: strengthen prompts so the victory shot shows all THREE in frame (both fighters + referee) with the referee thrusting BOTH fighters' fists high.
- Entrance clips: more explosive/threatening, galvanized, shadow-boxing the air.
- "win" outcome is now a POINTS decision by the referee, not a KO.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Legacy matches (created before the intro-clips feature) have no role on their clips, so per-clip prompt regen wrote fight prompts for clips 0-2 instead of the entrance/entrance/face-off intro. Add _clip_role_fighters() which honours an explicit role/fighters or infers from position (clip 0 = f1 entrance, clip 1 = f2 entrance, clip 2 = referee face-off, rest = fight). _fill_clip_prompt() now uses it and PERSISTS the resolved role/fighters onto the clip so the subsequent keyframe regen and render apply the correct profiles + LoRAs.

Also make a regenerated prompt authoritative for keyframe generation: clear any stale kf_prompt override when (re)writing a clip prompt (keyframes compose from the clip prompt unless an override exists, which would silently win). Same for outcomes — _plan_outcome_shots now drops o['kf_prompt'] so regenerated outcome prompts feed the outcome keyframes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
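The positional inference can be sketched as below. `clip_role_fighters`, the role strings and the dict keys are hypothetical stand-ins for whatever `_clip_role_fighters()` actually uses; the resolved values are written back onto the clip, mirroring the "persist" step described above.

```python
def clip_role_fighters(clip, index, f1, f2, referee=None):
    """Return (role, fighters) for a clip, inferring missing values from its position."""
    role, fighters = clip.get("role"), clip.get("fighters")
    if role and fighters:                     # explicit values always win
        return role, fighters
    if index == 0:
        inferred = ("entrance", [f1])         # solo entrance, fighter 1
    elif index == 1:
        inferred = ("entrance", [f2])         # solo entrance, fighter 2
    elif index == 2:
        inferred = ("faceoff", [f1, f2] + ([referee] if referee else []))
    else:
        inferred = ("fight", [f1, f2])
    role = role or inferred[0]
    fighters = fighters or inferred[1]
    clip["role"], clip["fighters"] = role, fighters   # persist for keyframe + render
    return role, fighters
```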
-
Stefy Lanza (nextime / spora ) authored
The outcome video render applied one fighter list (both match fighters) to the whole clip, so the referee LoRA and winner-only identity only existed in the keyframe, not at video-generation time. Thread per-segment fighters through _render: segments are now (prompt, frames, seg_stem, seg_fighters) and each segment (and each chained part) applies exactly its own character_profiles + LoRAs, overriding the clip-level list.

Outcome segments now load:
- FINISH = both fighters
- VICTORY = winner + referee (decisive) or both fighters + referee (draw)

Referee resolution matches the keyframe path. Backward compatible — shorter segment tuples and legacy clips without role/fighters fall back to the clip-level fighters.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
- Every match now opens with 3 intro clips before the fight: a bold solo entrance for each fighter, then a referee-officiated face-off stare-down with the start signal. The real fight begins at clip 4. New intro prompt templates + LLM system prompt + PromptGenerator.intro_shot().
- Factor out _build_match_clip_specs() and _fill_clip_prompt() so all four clip-building paths (stage_videos Phase A, new-match, replan, full regen) build intros consistently; intro clips attach only their own participants (solo entrance = one fighter; face-off = both + match referee).
- New "clip-prompt" job scope + per-clip "prompt↻" link: rewrite ONLY one clip's prompt in place (role-aware), steering fight clips away from the match's other shots; renders nothing.
- "Create whole match" and "Regenerate whole match" (scope "full") now finish with a 2x AI upscale + 2x frame interpolation pass over the final short/long assemblies and outcome videos, reusing the existing enhance machinery.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Township tool (tools/gen_township_fighters.py):
- Outcome videos now generate TWO keyframes per outcome (finish + victory), each anchoring its own clip; victory clip uses a dedicated referee shot.
- Referee characters: new role on create form, kept out of fighter pools, dressed as officials, attachable per-match and used in victory keyframes.
- Per-match referee selection (new-match form + match editor, persisted).
- Autogenerate buttons on character/referee, environment and new-match forms (LLM-filled, editable before create) via /profile/autogen + /matches/autogen.
- Single-worker generation queue: all coderai-bound jobs (create/regen/train/match/process) are serialised and surfaced as "queued", with one persistent match-detail monitor replacing the competing per-job pollers (fixes the blinking progress when two jobs were launched at once).

coderai: favicon.ico served at /favicon.ico + linked in admin/login templates; bundled township favicon served at /favicon.ico.

Also gitignore large packaging/runtime artifact dirs (.packaging-cache/, tmp/).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
- 14 Jun, 2026 2 commits
Stefy Lanza (nextime / spora ) authored
rife-ncnn-vulkan and the ffmpeg frame extract/encode were grabbing all cores and ran with no ongoing thermal control. Now:
- _cpu_thread_limit() mirrors coderai's half-the-cores cap (honours the OMP_NUM_THREADS set at import). All ffmpeg calls in the upscale + interpolate paths pass -threads N and are CPU-pinned via a sched_setaffinity preexec_fn; rife gets -j capped and the same affinity pin — so neither can saturate 24 cores.
- RIFE is one opaque subprocess, so it now runs under a watcher thread that SIGSTOPs it when the GPU/CPU exceeds the configured thermal-high threshold and SIGCONTs it once cooled (the subprocess analogue of the upscaler's per-frame thermal gate), and terminates it on task cancel. Per-frame progress preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Make video enhancement fully AI-on-CoderAI and rework township outcomes.

Upscaling (Real-ESRGAN / SD upscalers):
- Support diffusers-style .safetensors weights + config.json (e.g. hlky/RealESRGAN_*), not just classic .pth; infer RRDBNet arch/scale from config. fp16 + tiling for performance.
- AI-or-fail: no ffmpeg fallback. Auto-select a configured upscaler when the request omits a model (find_capable_model).
- Fix a registry-pollution bug: cache upscalers in a private dict, never under a synthetic 'upscale:<id>' key in multi_model_manager.models (which made a later request_model() resolve/reload the bogus key -> 400).
- Per-frame progress + a first-class "upscale" task (pause/cancel/thermal), with a periodic thermal re-check through the frame loop.

Interpolation (RIFE):
- AI-or-fail: removed the ffmpeg minterpolate fallback. Resolve the rife-ncnn-vulkan binary + bundled model robustly, pass exact -n frame count, and pin -g to the SAME GPU CoderAI uses (matched by CUDA device name, not a hardcoded index). Progress + "interpolate" task + thermal guard.

Township generator:
- One draw per match (not per fighter); longer, configurable outcome videos built as a finish -> victory two-shot sequence; richer, more brutal, camera-aware prompts (finish/victory templates editable on the Prompts page).
- Stream large results via response_format=url instead of base64-in-JSON; per-frame progress for both upscale and interpolate.

Configurable temp dir:
- New --tmp CLI flag and config.tmp_dir (+ admin Settings field, applied live). Sets tempfile.tempdir and TMPDIR/TMP/TEMP so all scratch (frame extraction, upscaling, interpolation) and child processes use it — fixes "[Errno 28] No space left on device" when /tmp is small.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
- 13 Jun, 2026 7 commits
Stefy Lanza (nextime / spora ) authored
The 'full' match-regen scope now (1) removes this match's existing keyframes, clip videos, outcome videos and finals up front, so a re-plan that changes the clip count can't leave orphaned files that would get globbed into the reassembled finals; and (2) runs strictly in order — prompts -> keyframes -> clips + outcomes (assemble_finals=False) -> assemble finals as the explicit last phase (4/4) via _reassemble_finals.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Adds a `full` scope to the per-match action handler that rebuilds one match in order: re-plan all fight-clip + outcome prompts (text model) → regenerate every keyframe (image model) → re-render all clips + outcomes and reassemble finals (video model), with live per-phase progress. Other matches are untouched. Wires the confirm dialog, the match-detail button, and the /matches/render scope allowlist.

Fix: the `full` confirm label used an apostrophe (match's) inside the single-quoted JS string of the plain triple-quoted _match_js block, which collapsed to a real quote and broke the whole script (reMatch undefined). Reworded to avoid it; verified the rendered JS parses with node --check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Downloads: run each model download in a clean `python -m codai.admin.download_worker` subprocess streaming JSON progress, so the Stop button reliably cancels by terminating the process (HF parallel/Xet chunk transfers ignore in-thread flags). Adds download-cancel-all. Avoids multiprocessing spawn, which re-imports the server launcher as __main__.

VACE extension: detect WanVACEPipeline; new 'extend' mode + cond_frames request field condition on the previous chained part's frame tail (real motion -> forward continuation, fixing the single-frame boomerang). _build_vace_conditioning builds the (video, mask) pair; _snap_wan_frames enforces 4k+1; only the freshly generated frames are returned. VACE also serves keyframe i2v / t2v via masking; i2v/t2v fallbacks skipped for it. Township auto-uses extend for chained parts when the model is VACE.

Fight prompts: full-MMA system prompt + rotating per-clip action focus (kicks/knees/elbows/takedowns/ground/submissions) and occasional blood, rebalanced fallback templates, keyframe wardrobe enforcement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
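The 4k+1 constraint and the extend-mode conditioning layout can be sketched as follows. Rounding *down* is an assumption (the commit only says the count is enforced to 4k+1), `None` stands in for placeholder frames, and the mask convention (1 = frame to generate) is an assumption about how the VACE pair is built; both function names are hypothetical.

```python
def snap_wan_frames(n: int) -> int:
    """Largest frame count of the form 4k+1 that does not exceed n (at least 1)."""
    k = max(0, (n - 1) // 4)
    return 4 * k + 1

def build_extend_plan(cond_frames, total):
    """Return (video, mask): the previous part's frame tail first, placeholders after."""
    total = snap_wan_frames(total)
    n_cond = min(len(cond_frames), total - 1)           # leave room for new frames
    tail = list(cond_frames[len(cond_frames) - n_cond:])
    video = tail + [None] * (total - n_cond)            # None = frame to be generated
    mask = [0] * n_cond + [1] * (total - n_cond)        # 0 = keep, 1 = generate
    return video, mask
```

Only the frames where `mask == 1` would then be returned, so the conditioning tail is not duplicated at the seam between chained parts.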
-
Stefy Lanza (nextime / spora ) authored
- township: new playback_fps (0 = same as generation fps). coderai uses fps only for the mp4 encode (Wan generates a fixed frame count), so a higher playback fps plays the same frames faster (less slow-motion). The planner counts clip duration as nf/playback_fps so the finals reach their target length at the real play speed. Wired through config/CLI (--playback-fps)/web form/all call sites.
- main.py: suppress /v1/{video,images,audio}/progress access-log lines unless --debug-web is set (matching the existing /v1/loras/progress filter).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
- ratelimit.py: exempt /v1/video, /v1/audio and /v1/loras progress polls from BOTH auth and rate limiting (shared _PROGRESS_PATHS), matching /v1/images. The township script polls /v1/video/progress ~1/s during a clip; being rate-limited, those polls ate the budget so the generation POST got 429'd (clip failed) and the polls themselves 429'd (stuck step bar).
- township _render_once: a 429 now backs off and retries the same render (up to 40 attempts, capped 60s) instead of abandoning the clip; covers clips, chained parts and outcomes. Genuine errors still fail fast.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
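A sketch of the 429-only retry, assuming a `render()` callable that returns an object with a `status_code` attribute. The exponential schedule starting at 2 s is hypothetical; the commit specifies only the 40-attempt limit and the 60 s cap. Any non-429 status is returned immediately so genuine errors still fail fast.

```python
import time

def render_with_retry(render, attempts=40, cap=60.0, base=2.0, sleep=time.sleep):
    """Call render(), retrying only on HTTP 429 with capped exponential backoff."""
    for attempt in range(attempts):
        resp = render()
        if resp.status_code != 429:
            return resp                       # success or a real error: no retry
        if attempt + 1 < attempts:            # don't sleep after the final attempt
            sleep(min(cap, base * 2 ** attempt))
    raise RuntimeError(f"still rate-limited after {attempts} attempts")
```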
-
Stefy Lanza (nextime / spora ) authored
coderai:
- Thermal: configurable proactive CPU soft-throttle (engage temp + max per-step sleep) that gently slows generation in a warm band so it rarely hits the hard pause; CPU-only, hard pause always takes precedence. Tasks page shows a soft-throttle banner + per-task badge (live, gated on a running task).
- Acceleration hot-swap: toggling/changing a model's acceleration now evicts the loaded model (manager.unload_model) so the next request reloads with the new setting — no restart. (Acceleration is fused at load time.)
- Models UI: cascading distill-LoRA pickers — new /admin/api/accel-loras scans the cache for distill repos; pick the distill model, then its high/low (or single) LoRA from dropdowns. Presets now also fill the high/low fields.
- Tasks queue summary now reflects ALL model activity (derived from the unified task list), not just queue-manager requests — fixes the stuck "0 active".
- images.py: proactive eviction no longer skipped by a NameError (model_key).

township (tools/gen_township_fighters.py):
- Per-clip/outcome/keyframe progress now shows real diffusion-step progress (polls /v1/{images,video}/progress) on the CLI spinner and the web step bars, including "shot part N/total" for chained single shots.
- Chained-shot concat re-encodes (CFR) instead of stream-copy, fixing the "first half is a static image" freeze at the part boundary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Township fight-video generator (tools/gen_township_fighters.py):
- 16:9 native resolution: default 832x480 video + matching keyframes (configurable video_size); square 512 was off-distribution for Wan2.2.
- Split-and-chain rendering: single-render cap (default 50f); clips/outcomes longer than the cap render as chained sub-renders (last frame seeds the next) concatenated into one continuous shot, parts discarded — Matches page unchanged. Planned-clip ceiling raised to 480f.
- Separate outcome min/max frames (default 40/70), same split-chain path.
- Configurable short/long final-assembly intervals; clip count derives from the long target + fps so the long cut always fills.
- Prompt continuity: deterministic wardrobe+environment clause on every clip, replan clip and outcome; stronger LLM system prompts; updated default suffix.
- Run page: configurable fighter/environment counts + reference-image counts; moved "Include female fighters" into the Characters card; suggested steps/rank/weight guide table; per-profile LoRA train defaults now mirror the run-page config (lora_* for characters, env_lora_* for environments).
- Matches: "Remove match completely" (files + keyframes + prompts.json entry).
- Renamed the prompts step to "Generate matches prompts"; removed the gallery page.

coderai:
- images.py: fix NameError ('model_key' undefined) that silently skipped proactive VRAM eviction before every image load.
- thermal.py: cross-worker cooldown — when one generation pauses for heat, all parallel generations now back off until the resume threshold; add process-tree CPU% reader (100%/core).
- video.py/manager.py/main.py: offload ref-leak fix, offloaded-load VRAM guard, wire --pipeline-cache flags.
- Tasks page CPU tile shows process-tree CPU% scaled to cores.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
- 12 Jun, 2026 3 commits
Stefy Lanza (nextime / spora ) authored
VRAM estimation (manager.py):
- Weight the effective quant multiplier by REAL per-component parameter shares (new _component_param_shares scans safetensors by component folder) instead of a blind 70/30 split. Wan2.2 is 99.6% quantizable (two 14B experts + text encoder 4-bit, only the 0.13B VAE dense), so the old 0.475x multiplier inflated the estimate from ~25.8 GB to 42.7 GB and forced needless offload. Now ~0.28x -> ~25.8 GB. VAE forced dense (conv-only, bnb can't quantize).

Auto offload decision (video.py):
- 'auto': when peak footprint exceeds free VRAM, go straight to `model` CPU offload (active component on GPU, near full-GPU speed) — no full-GPU gamble, no slow balanced+disk path.
- 'auto-borderline' (new mode): same, except a marginal overshoot (<=3 GB) tries full-GPU first to keep both experts resident and use free VRAM, falling back to model offload on OOM.

Acceleration LoRA (acceleration.py + video.py):
- Keep the distill/Lightning LoRA as an ACTIVE RUNTIME ADAPTER instead of fusing. Fusing into CPU-offloaded bitsandbytes 4-bit weights triggers a dequant->merge->requant per Linear on the CPU — minutes/hours per expert, appearing to hang (high CPU, empty VRAM). Runtime adapters apply at forward time on the GPU at negligible cost and natively cover transformer_2.
- _sync_video_loras preserves the accel adapters across per-request LoRA swaps and re-includes them in every set_adapters; _unload_video_loras deletes only per-request adapters, keeping accel.

UI (models.html):
- Add "Auto borderline-aware" offload strategy option + updated hint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
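The weighting amounts to `share * quant_factor + (1 - share) * 1`, where `share` is the fraction of parameters that will actually be quantized. This sketch assumes a bare 4-bit factor of 0.25 with no quantization overhead; with a 70/30 split it reproduces the old 0.475x figure. The component parameter counts in the usage line are illustrative assumptions, and because overhead is ignored the result (~0.25x) comes out below the ~0.28x quoted above.

```python
def effective_quant_multiplier(param_counts, quantizable, quant_factor=0.25):
    """Memory multiplier weighted by how many parameters are really quantized.

    param_counts: component name -> parameter count
    quantizable:  names of the components that will be quantized
    quant_factor: size of a quantized weight relative to a dense one
    """
    total = sum(param_counts.values())
    q = sum(n for name, n in param_counts.items() if name in quantizable)
    share = q / total
    return share * quant_factor + (1.0 - share) * 1.0
```

For example, with illustrative counts of 14B + 14B experts, a 5.7B text encoder and a 0.13B dense VAE, `effective_quant_multiplier({"transformer": 14e9, "transformer_2": 14e9, "text_encoder": 5.7e9, "vae": 0.13e9}, {"transformer", "transformer_2", "text_encoder"})` is about 0.253.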
-
Stefy Lanza (nextime / spora ) authored
Merge feature/tasks-quant-thermal: task mgmt, quantization, Wan2.2 video fixes, pipeline cache, smarter offload

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
Wan2.2 A14B (dual-expert) generation fixes:
- Fuse the Lightning distill LoRA into BOTH experts (transformer + transformer_2); diffusers' fuse_lora defaults to ["transformer"] only, which left the low-noise expert undistilled → 4-step clips collapsed to a solid colour. Also load per-request fighter/env LoRAs into both experts.
- Pre-configure the wan22_lightning_4step preset with the local high/low-noise LoRAs (lora_high/lora_low), used when acceleration is enabled, ignored when not; surfaced in the Acceleration UI.
- Safety net: only apply the preset's low step count when the distill LoRA actually fused, else fall back to safe steps.
- Skip bitsandbytes/quanto quant for the VAE (conv-only → "no linear modules").

VRAM / offload:
- Strategy auto-selection actually fires now ('auto' is normalised, not passed through as a no-op) and no longer double-counts the runtime/accel reserve.
- Graceful OOM degrade ladder: full-GPU → balanced @ configured% → 80 → 60 → 40 → sequential → disk, respecting the model's balanced_gpu_percent as the starting cap. Expose 'balanced' as a selectable offload strategy.

Pipeline disk cache (--pipeline-cache / --rebuild-pipeline-cache):
- Cache the quantized base pipeline to disk and reload it on later starts, skipping re-download/re-quantization; accel LoRA re-fused per load. Fail-safe with self-healing invalidate-and-rebuild.

Tasks / misc:
- Show model loading as a (non-cancellable, non-pausable) Tasks entry.
- Filter the Tasks-page pollers from the access log unless --debug-web.
- Township gen script: per-image keyframe progress (no longer all-or-nothing).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
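The degrade ladder can be sketched as an ordered list of strategy labels plus a loop that steps down on OOM. The labels, the `or 80` fallback for a null config and the use of `MemoryError` (standing in for a CUDA out-of-memory exception) are assumptions; only the ordering comes from the commit.

```python
def offload_ladder(balanced_gpu_percent=80, steps=(80, 60, 40)):
    """Strategies to try after an OOM, most GPU-resident first.

    The configured balanced percentage is the starting cap; fixed steps at or
    above it are skipped so no percentage is tried twice.
    """
    start = int(balanced_gpu_percent or 80)      # null/0 config -> default 80
    percents = [start] + [p for p in steps if p < start]
    return (["full-gpu"] + [f"balanced@{p}%" for p in percents]
            + ["sequential", "disk"])

def load_with_degrade(load, ladder):
    """Try load(strategy) down the ladder; return (strategy, result) of the first success."""
    last_error = None
    for strategy in ladder:
        try:
            return strategy, load(strategy)
        except MemoryError as exc:               # stand-in for a CUDA OOM error
            last_error = exc
    raise last_error
```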
-
- 11 Jun, 2026 5 commits
Stefy Lanza (nextime / spora ) authored
Tasks / queue management:
- Central in-memory task registry with cooperative cancel, pause/resume, and step progress across image/video/audio/text generation + LoRA training
- Tasks admin page (live 2s poll): cancel, interrupt, pause/resume, restart, remove; done jobs auto-drop from the list; bounded persisted job history
- Disable interrupted-training recovery via --no-resume-jobs + settings toggle

Quantization / acceleration:
- TurboQuant embedding vector quantization (data-free, inner-product preserving): built-in NumPy backend + optional turboquant-py library, selectable per embedding model; /v1/embeddings `quantization` param
- llama.cpp KV-cache quantization (cache_type_k/v) for GGUF text models, configurable in the Models UI

Hardware telemetry:
- Thermal cooldown state surfaced on the Tasks page (banner + per-task badge)
- Live CPU/GPU/RAM/VRAM usage + temperature panel via /admin/api/system-stats

Docs: API documentation gaps/accuracy pass + Swagger overhaul; DISTRIBUTION.md implementation spec.

Plus I2V LoRA training channel-mismatch fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-
Stefy Lanza (nextime / spora ) authored
The pipeline class is selected from the request mode, which can disagree with the model's real capability (transformer input channels), causing a hard channel-mismatch crash. Detect and degrade gracefully for Wan:
- ti2v/i2v request on a t2v model (transformer in_channels=16): rebuild as a plain WanPipeline and run t2v with the keyframe dropped.
- t2v request on an i2v model (in_channels=36): rebuild as WanImageToVideoPipeline (image_encoder/processor are optional) and seed a neutral gray frame so the prompt still drives the clip.

Both rebuild a sibling pipeline reusing the SAME components, so fused acceleration and per-request LoRAs on the shared transformer carry over with no reload; the view is cached on the pipe so repeated clips reuse it and _sync_video_loras' adapter dedup stays intact.

Helpers: _wan_in_channels(), _maybe_t2v_fallback(), _maybe_i2v_fallback().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stefy Lanza (nextime / spora ) authored
Previously a per-request LoRA could only be a local path or HF id, which assumed the client shared the server's filesystem. Add a content-addressed store so remote clients can supply LoRAs by value or by handle.

The request `loras` spec now accepts (resolved server-side, in priority order):
  id "name:<registered>"  -> a LoRA trained on this server (path-independent)
  id "sha256:<hex>"       -> a previously uploaded blob
  file/data (base64)      -> inline weights, cached in the blob store
  url                     -> server downloads (cached by content hash)
  model/path              -> legacy local path / HF id (unchanged)

- loras.py: blob store (save_lora_blob / lora_blob_exists / _lora_blob_path), resolve_lora_ref(), resolve_request_loras() (in-place -> clean 400 on a missing blob / unknown name). New POST /v1/loras/upload (multipart / JSON base64 / raw, dedup) and GET /v1/loras/blob/{hash} existence check.
- LoraConfig / VideoLoraConfig: model now optional; add id/url/file/data/path.
- image + video handlers call resolve_request_loras() before model work, so signature dedup / VRAM reserve / load_lora_weights read lora.model as before.
- gen_township_fighters.py: reference trained LoRAs by id "name:<registered>" (derived from the server path) with the raw path kept as a co-located fallback, so the script also works when client and server are split.

Also harden video load: float(cfg.get('balanced_gpu_percent', 80)) crashed on an explicit null (the admin UI writes null for blank fields); use `or 80`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
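The content-addressed store and the resolution order listed above can be sketched like this. It is an illustration under assumptions, not the code in loras.py: `LoraBlobStore`, `LoraRefError`, the `.safetensors` file suffix and the `registered` name-to-path mapping are hypothetical, and the URL branch is deliberately left unimplemented.

```python
import base64
import hashlib
import os


class LoraRefError(LookupError):
    """Unresolvable reference; a request handler would map this to HTTP 400."""


class LoraBlobStore:
    """Content-addressed store: each file is named by the sha256 of its bytes."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def path(self, digest):
        return os.path.join(self.root, f"{digest}.safetensors")

    def exists(self, digest):
        return os.path.isfile(self.path(digest))

    def save(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if not self.exists(digest):              # dedup: same bytes, same file
            tmp = self.path(digest) + ".tmp"
            with open(tmp, "wb") as fh:
                fh.write(data)
            os.replace(tmp, self.path(digest))    # publish atomically
        return digest


def resolve_lora_ref(spec, store, registered):
    """Return a local path (or legacy HF id) for one `loras` entry."""
    ref_id = spec.get("id")
    if ref_id:                                   # 1. id "name:..." / "sha256:..."
        kind, _, value = ref_id.partition(":")
        if kind == "name":
            if value not in registered:
                raise LoraRefError(f"unknown LoRA name: {value}")
            return registered[value]
        if kind == "sha256":
            if not store.exists(value):
                raise LoraRefError(f"missing LoRA blob: {value}")
            return store.path(value)
        raise LoraRefError(f"unsupported LoRA id: {ref_id}")
    inline = spec.get("file") or spec.get("data")
    if inline:                                   # 2. inline base64 weights
        return store.path(store.save(base64.b64decode(inline)))
    if spec.get("url"):                          # 3. download (not sketched)
        raise NotImplementedError("download + store.save() omitted in this sketch")
    legacy = spec.get("model") or spec.get("path")
    if legacy:                                   # 4. legacy path / HF id
        return legacy
    raise LoraRefError("empty LoRA spec")
```

Naming files by the hash of their bytes makes uploads idempotent: identical weights always land on the same path, so re-uploading is a no-op and a `sha256:<hex>` handle means the same thing to every client.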
Stefy Lanza (nextime / spora ) authored
- log.py: _redact_blobs() recursively truncates data-URI / base64 fields (init_image, image, mask, character_references, …) to their first 48 chars in the FULL REQUEST DEBUG dump, so a clip request no longer prints tens of KB of base64. Prompts and normal fields are left intact (a base64-charset check excludes anything with spaces/punctuation).
- requirements.txt: add ftfy (required by the diffusers Wan/T5 prompt_clean path; its absence surfaced as "name 'ftfy' is not defined" at generation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
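A sketch of the recursive redaction idea. `redact_blobs` is a hypothetical stand-in for `_redact_blobs()`; the 48-character prefix comes from the message above, while the 256-character minimum length, the exact charset regex and the `...<N chars>` marker are assumptions.

```python
import re

# Strings made only of base64 / base64url characters; anything containing
# spaces or other punctuation (i.e. prose such as prompts) will not match.
_B64_RE = re.compile(r"^[A-Za-z0-9+/=_-]+$")
KEEP = 48       # prefix length kept (from the commit message)
MIN_LEN = 256   # assumption: shorter strings are never redacted


def redact_blobs(obj):
    """Return a redacted copy; the original request object is not modified."""
    if isinstance(obj, dict):
        return {k: redact_blobs(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact_blobs(v) for v in obj]
    if isinstance(obj, str) and len(obj) > MIN_LEN:
        if obj.startswith("data:") or _B64_RE.match(obj):
            return f"{obj[:KEEP]}...<{len(obj)} chars>"
    return obj
```

Working on values rather than a fixed list of key names means nested payloads such as `character_references[i].image` are caught without listing every field.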
Stefy Lanza (nextime / spora ) authored
Per-model `acceleration` config block fuses a distillation LoRA into the pipeline at load and supplies low step-count / guidance defaults at generation time, for a 5-10x speedup. Covers video (Wan), image diffusers (SD/SDXL), and sd.cpp (step/cfg defaults + <lora:> prompt injection).
- New codai/models/acceleration.py: preset catalog (ACCEL_PRESETS), resolve_acceleration(), apply_accel_to_pipeline() (load->fuse->unload so it stays orthogonal to per-request character/env LoRAs), accel_call_defaults().
- video.py: fuse accel LoRA after load; _generate_video / _generate_sdcpp_video use preset steps/guidance (request always wins).
- images.py: _apply_image_acceleration on both diffusers load paths; _generate_image and _generate_with_sdcpp honour preset steps/guidance.
- main.py: surface `acceleration` as a first-class runtime kwarg.
- admin: persist `acceleration`; new GET /admin/api/accel-presets; models.html Acceleration/Distillation card (preset dropdown + manual override).

Also fix a latent null-trap: float(cfg.get('balanced_gpu_percent', 80)) crashed when the config stored an explicit null (written by the admin UI for blank fields), since .get(key, default) returns the stored None. Use `or 80`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
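The "request always wins" precedence and the null-trap fix can be illustrated together. The preset name, LoRA reference and numbers below are placeholders, not entries from ACCEL_PRESETS, and treating `steps` / `guidance` keys in the model's `acceleration` block as the manual override is an assumption.

```python
# Placeholder catalog: names and values are illustrative only.
ACCEL_PRESETS = {
    "example-distill-4step": {"lora": "path/or/hf-id", "steps": 4, "guidance": 1.0},
}


def accel_call_defaults(accel_cfg, request):
    """Resolve steps/guidance: request > manual override in config > preset."""
    accel_cfg = accel_cfg or {}
    preset = ACCEL_PRESETS.get(accel_cfg.get("preset"), {})
    out = {}
    for key in ("steps", "guidance"):
        for source in (request, accel_cfg, preset):
            if source.get(key) is not None:   # None means "not set here"
                out[key] = source[key]
                break
        else:
            out[key] = None                   # leave it to the pipeline default
    return out


def balanced_gpu_percent(cfg):
    # cfg.get(key, 80) returns the stored None when the key exists with a
    # null value, and float(None) raises TypeError; `or 80` falls back instead.
    return float(cfg.get("balanced_gpu_percent") or 80)
```

One side effect of `or 80` is that an explicit `0` is also replaced by 80, because `0` is falsy in Python; if zero were a meaningful setting, an explicit `is None` check would be needed instead.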
- 10 Jun, 2026 1 commit
Stefy Lanza (nextime / spora ) authored
Outcome scenes belong to a match, so their keyframe (image model) and video clip (video model) now attach the environment + BOTH match fighters' LoRAs, matching the fight clips; previously they sent only the single named fighter. Resolves the match pair from the in-memory fight_plan, falling back to the saved prompts.json so a single-outcome regen (fight_plan == []) still gets both. Legacy outcomes with no resolvable match keep the single fighter.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
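The match-pair lookup with its prompts.json fallback might look like the sketch below. The function name and the field names (`match`, `id`, `fighter_a`, `fighter_b`, `fighter`, and a top-level `fight_plan` key in prompts.json) are hypothetical, chosen for illustration; the environment LoRA would be attached separately.

```python
import json


def outcome_fighters(outcome, fight_plan, prompts_path=None):
    """Return the fighters whose LoRAs an outcome scene should attach."""
    match_id = outcome.get("match")
    plans = fight_plan
    if not plans and prompts_path:               # e.g. single-outcome regen
        try:
            with open(prompts_path, encoding="utf-8") as fh:
                plans = json.load(fh).get("fight_plan", [])
        except (OSError, ValueError):            # missing or corrupt file
            plans = []
    for match in plans or []:
        if match.get("id") == match_id:
            return [match["fighter_a"], match["fighter_b"]]
    return [outcome["fighter"]]                  # legacy: no resolvable match
```

The in-memory plan is preferred and the file is only read when the plan is empty, so a normal full run never touches disk for this lookup.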