Merge feat/township-match-upload: to-download list, mmproj vision, styled modals, broker + packaging

766fef3c · Stefy Lanza (nextime / spora ) · 56291911 · cbf7f147 · 766fef3c · 766fef3c
Commit 766fef3c authored Jun 19, 2026 by Stefy Lanza (nextime / spora )
73 changed files
--- a/.dockerignore
+++ b/.dockerignore
@@ -21,6 +21,18 @@ township_output
 dist
 dist-package
 *.log
+tmp
+debug.log
+CoderAI.gif
+# Produced artifacts and tool session/output dirs (mounted as volumes at runtime,
+# never baked into the image)
+video_editor/sessions
+video_editor.config.json
+tools/videogen_output
+tools/township_output
+tools/coderai_media
+samples
 # Build outputs
 build

--- a/.gitignore
+++ b/.gitignore
@@ -17,6 +17,15 @@ __pycache__/
 # Debug logs
 debug.log
+/logs/
+# Runtime model cache (downloads, self-quantized checkpoints, job state).
+# Root-anchored so it never shadows the tracked codai/models/ source package.
+/models/
+# Third-party source clone of the GPTQ quantizer — installed into the venv from
+# source; the working tree is not part of this repo (it has its own .git).
+/GPTQModel/
 # Test files
 test_*.py
@@ -33,3 +42,11 @@ township_output/
 # Packaging build cache + runtime temp (large artifacts)
 .packaging-cache/
 tmp/
+# Exported image tarballs + local OCI run-state (large artifacts)
+dist/
+coderai-runtime/
+# Video editor sessions + generated media (runtime artifacts)
+video_editor/sessions/
+tools/coderai_media/
--- a/AI.PROMPT
+++ b/AI.PROMPT
@@ -286,3 +286,67 @@ safe.
 14. Thermal protection is config-driven and model-agnostic (config.json
    `thermal`). Don't special-case it per model/backend; it only reads temps and
    sleeps. Honour the enable flags and high/resume hysteresis.
+================================================================================
+## Distributable Docker image (packaging/linux)
+================================================================================
+All-in-one image: coderai + tools (editor/videogen/township) behind nginx on a
+single port (8776), built from the LOCAL install's venv + binaries.
+Multi-stage `Dockerfile.oci-venv`:
+  - assembler stage stages the local bundle into /opt/coderai (python-build-
+    standalone interpreter + venv site-packages + ldd'd native libs + parler
+    overlay + lip-sync venv/repos + py310 + ds4). The ~20 GB bundle COPY lives
+    ONLY here; the runtime stage COPYs the assembled tree ONCE (no double-store).
+  - runtime stage: apt (nginx/supervisor/vulkan-tools/ffmpeg/...), COPY the
+    assembled /opt/coderai, then COPY app code → /opt/coderai/app, launchers →
+    /usr/local/bin, nginx/supervisor confs. Entry = coderai-entrypoint →
+    supervisord (nginx + main server + tool UIs).
+  - Do NOT set PYTHONHOME globally (breaks the system-python supervisord); set
+    PATH only. Bundle dereferences host symlinks (cp -aL) so binaries like
+    whisper-server are real files in the image, not dangling links.
+Full build (slow, ~15 min — rebuilds the bundle):
+  packaging/linux/build_oci_image.sh                      # tags coderai:dist
+Smoke test (no weights, checks services + every bundled binary):
+  DOCKER="sudo docker" GPU="--gpus all" PORT=18082 \
+    packaging/linux/smoke_test_services.sh coderai:dist
+Run against your LIVE local config + data (no rebuild — pure bind-mounts):
+  packaging/linux/run_oci.sh --nvidia --local \
+    --map /AI/guffcache --map /AI/huggingface --map /AI/offloads
+  - The image launcher reads config from /config/coderai and runs
+    `coderai --config /config/coderai`, rewriting server.host/port in config.json.
+  - `--local` (= --config-dir ~/.coderai) copies ONLY the *.json config files to
+    a temp dir and mounts it at /config/coderai, so your real config is untouched
+    (use --inplace-config to edit it directly).
+  - `--map HOST[:CONT]` bind-mounts a host dir at the SAME path inside the
+    container so the ABSOLUTE paths in models.json/config.json (gguf/hf caches,
+    offloads) resolve unchanged. Without these maps the models won't be found.
+  - `--debug[=SPEC]` runs coderai with --debug* flags (SPEC default 'all';
+    e.g. `--debug=engine,requests,ws` → --debug-engine/--debug-requests/--debug-ws,
+    `--debug` always auto-added) and writes a host-tailable file log. `--log-file
+    PATH` sets the in-container log path (default /cache/logs/coderai.log → host
+    under the cache mount). Driven by env CODERAI_DEBUG + CODERAI_LOG_FILE, read
+    by the coderai-oci launcher, which tees output so `docker logs` still works.
+    supervisord [program:coderai] uses stopasgroup/killasgroup so the front's
+    engine subprocesses + the tee are torn down together. NOTE: the launcher +
+    supervisord.conf are baked in, so changes need a (fast) update_oci_image.sh.
+Incremental update (FAST, ~30 s — code-only changes, NO bundle recopy):
+  DOCKER="sudo docker" packaging/linux/update_oci_image.sh
+  - `Dockerfile.update` is `FROM coderai:base` and re-layers ONLY the app code +
+    launchers + service confs. The heavy bundle layers are inherited unchanged.
+  - Keeps an immutable `coderai:base` (the bundle) and rebuilds `coderai:dist`
+    as base + a thin app layer. Every update starts from the SAME base, so app
+    layers never stack across updates. dist and base SHARE the bundle layers —
+    keeping both costs only the app layer (a few MB), not a second 23 GB.
+  - First run seeds coderai:base from the current coderai:dist (docker tag).
+  - Re-baseline the bundle (new venv/libs/tools): run build_oci_image.sh, then
+    `docker rmi coderai:base` so the next update re-seeds it from the new dist.
+  - Use this whenever ONLY codai/ app code (or launchers/confs) changed — a full
+    build_oci_image.sh is wasteful for that.
+  - CAUTION: COPY adds/overwrites but does NOT delete files removed from the
+    repo; the cleanup RUN prunes only known-stale paths (.git/venv*/dist/...). A
+    source file deleted from codai/ lingers in the overlay until a full rebuild.
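Sketch of the `--map HOST[:CONT]` rule described in the AI.PROMPT hunk above: a host directory is mounted at the same path inside the container unless a container path is given. This is an illustration in Python, not the code in `run_oci.sh` (a shell script not shown in this diff). The helper names `parse_map` and `bind_mount_args` are hypothetical, and the use of `docker run -v` is an assumption about how the bind-mount is expressed.

```python
def parse_map(spec: str) -> tuple[str, str]:
    """Split a HOST[:CONT] spec; CONT defaults to HOST (same path inside)."""
    host, sep, cont = spec.partition(":")
    if not host:
        raise ValueError(f"empty host path in --map {spec!r}")
    # Without an explicit container path, reuse the host path so absolute
    # paths stored in models.json/config.json resolve unchanged.
    return host, (cont if sep and cont else host)


def bind_mount_args(specs: list[str]) -> list[str]:
    """Translate --map specs into `docker run -v HOST:CONT` arguments (assumed form)."""
    args: list[str] = []
    for spec in specs:
        host, cont = parse_map(spec)
        args += ["-v", f"{host}:{cont}"]
    return args


if __name__ == "__main__":
    print(bind_mount_args(["/AI/guffcache", "/AI/huggingface", "/AI/offloads"]))
```

Keeping the container path identical to the host path is what lets the absolute cache and offload paths in the config files work without rewriting.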
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 ![CoderAI](CoderAI.gif)
-An OpenAI-compatible API server to run models on your local GPU with web administration dashboard, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and full multi-modal support.
+A multimodal and multi-backend local model orchestrator with an OpenAI-compatible API server to run models on local GPUs, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and full multi-modal support.
 ## Features

--- a/build.sh
+++ b/build.sh
@@ -35,6 +35,7 @@ BACKEND="${1:-all}"
 FLASH=false
 CUSTOM_VENV=""
 PACKAGE=false
+DS4=false
 # Parse arguments
 i=1
@@ -50,6 +51,9 @@ for arg in "$@"; do
        --package)
            PACKAGE=true
            ;;
+        --ds4)
+            DS4=true
+            ;;
    esac
    i=$((i + 1))
 done
@@ -68,6 +72,7 @@ if [[ "$BACKEND" != "nvidia" && "$BACKEND" != "vulkan" && "$BACKEND" != "vulkan-
    echo ""
    echo "Options:"
    echo "  --flash     - Install Flash Attention 2 for faster inference (NVIDIA only)"
+    echo "  --ds4       - Clone + build the ds4 (DeepSeek V4) native engine"
    exit 1
 fi
@@ -755,6 +760,35 @@ package_app() {
    echo -e "${YELLOW}Note: The target machine must still provide compatible system GPU/runtime libraries.${NC}"
 }
+# Optionally clone + build ds4 (DeepSeek V4 native engine). Opt-in via --ds4.
+# coderai can also auto-build this at runtime on first use, but doing it here lets
+# the OCI/Docker packaging bundle the prebuilt ds4-server binary.
+build_ds4() {
+    local DS4_DIR="${CODERAI_DS4_DIR:-$HOME/.coderai/ds4}"
+    echo -e "${YELLOW}Building ds4 (DeepSeek V4 engine) → $DS4_DIR ...${NC}"
+    if [ ! -e "$DS4_DIR/Makefile" ]; then
+        mkdir -p "$(dirname "$DS4_DIR")"
+        git clone --depth 1 https://github.com/antirez/ds4 "$DS4_DIR" || {
+            echo -e "${YELLOW}Warning: could not clone ds4; skipping.${NC}"; return 0; }
+    fi
+    local TARGET="cpu"
+    if command -v nvcc &> /dev/null || [ -d "/usr/local/cuda" ]; then
+        TARGET="cuda-generic"
+    elif [ "$(uname -s)" = "Darwin" ]; then
+        TARGET=""   # bare `make` builds the macOS Metal backend
+    fi
+    ( cd "$DS4_DIR" && make $TARGET ) || {
+        echo -e "${YELLOW}Warning: ds4 build failed; it can still be built at runtime.${NC}"; return 0; }
+    if [ -x "$DS4_DIR/ds4-server" ]; then
+        echo -e "${GREEN}✓ ds4-server built at $DS4_DIR/ds4-server${NC}"
+        echo -e "${YELLOW}Note: DeepSeek V4 weights are downloaded on first use (multi-GB).${NC}"
+    fi
+}
+if [ "$DS4" = true ]; then
+    build_ds4
+fi
 # Create .backend file to track which backend was used
 echo "$BACKEND" > .backend
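The target selection inside `build_ds4` above can be restated as a small pure function, which makes the precedence explicit: a CUDA toolchain wins first, then macOS, then the CPU fallback. This is a Python restatement for illustration only; the function names `pick_ds4_target` and `detect_ds4_target` are hypothetical and do not exist in the repository.

```python
import platform
import shutil
from pathlib import Path


def pick_ds4_target(has_nvcc: bool, has_cuda_dir: bool, system: str) -> str:
    """Return the make target, mirroring the if/elif order in build_ds4."""
    if has_nvcc or has_cuda_dir:
        return "cuda-generic"
    if system == "Darwin":
        return ""  # bare `make` builds the macOS Metal backend
    return "cpu"


def detect_ds4_target() -> str:
    """Probe the local machine the same way the shell checks do."""
    return pick_ds4_target(
        has_nvcc=shutil.which("nvcc") is not None,      # `command -v nvcc`
        has_cuda_dir=Path("/usr/local/cuda").is_dir(),  # `[ -d /usr/local/cuda ]`
        system=platform.system(),                       # `uname -s`
    )


if __name__ == "__main__":
    print(repr(detect_ds4_target()))
```

Because the CUDA check comes first, a machine that has `/usr/local/cuda` gets `cuda-generic` even when it is not otherwise an NVIDIA build host.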

--- a/codai/admin/routes.py
+++ b/codai/admin/routes.py
--- a/codai/admin/templates/archive.html
+++ b/codai/admin/templates/archive.html
@@ -335,7 +335,7 @@ async function deleteEntry() {
    closeDetail();
    loadArchive();
  } catch(e) {
-    alert('Delete failed: ' + e.message);
+    showAlert('Delete failed: ' + e.message);
  }
 }

--- a/codai/admin/templates/base.html
+++ b/codai/admin/templates/base.html
@@ -104,6 +104,81 @@ function donateCopy(id, btn) {
 </main>
 {% endif %}
+<!-- Shared confirm / notice modal (replaces window.confirm / window.alert) -->
+<div id="confirm-modal" class="modal" onclick="if(event.target===this)document.getElementById('confirm-modal-cancel').click()">
+  <div class="modal-box" style="max-width:420px">
+    <div class="modal-head">
+      <span class="modal-title" id="confirm-modal-title">Confirm</span>
+      <button class="modal-close" id="confirm-modal-x">&times;</button>
+    </div>
+    <div class="modal-body">
+      <p id="confirm-modal-msg" style="margin:0 0 1.25rem;white-space:pre-wrap"></p>
+      <div style="display:flex;gap:.5rem;justify-content:flex-end">
+        <button class="btn btn-ghost" id="confirm-modal-cancel">Cancel</button>
+        <button class="btn btn-danger" id="confirm-modal-ok">Confirm</button>
+      </div>
+    </div>
+  </div>
+</div>
+<script>
+// Global modal helpers, shared by every admin page. Defined here so templates
+// can call showAlert()/showConfirm() instead of window.alert()/window.confirm().
+if(typeof window.openModal!=='function') window.openModal=function(id){document.getElementById(id).classList.add('show')};
+if(typeof window.closeModal!=='function') window.closeModal=function(id){document.getElementById(id).classList.remove('show')};
+window.showConfirm=function(title, msg, okLabel){
+  return new Promise(resolve => {
+    document.getElementById('confirm-modal-title').textContent = title;
+    document.getElementById('confirm-modal-msg').textContent = msg;
+    const okBtn    = document.getElementById('confirm-modal-ok');
+    const cancelBtn= document.getElementById('confirm-modal-cancel');
+    const xBtn     = document.getElementById('confirm-modal-x');
+    okBtn.className = 'btn btn-danger';
+    okBtn.textContent = okLabel || 'Confirm';
+    cancelBtn.style.display = '';
+    openModal('confirm-modal');
+    function cleanup(result){
+      closeModal('confirm-modal');
+      okBtn.removeEventListener('click', onOk);
+      cancelBtn.removeEventListener('click', onCancel);
+      xBtn.removeEventListener('click', onCancel);
+      resolve(result);
+    }
+    function onOk(){ cleanup(true); }
+    function onCancel(){ cleanup(false); }
+    okBtn.addEventListener('click', onOk);
+    cancelBtn.addEventListener('click', onCancel);
+    xBtn.addEventListener('click', onCancel);
+  });
+};
+// Styled replacement for window.alert(): a single-button notice modal.
+window.showAlert=function(msg, title, kind){
+  return new Promise(resolve => {
+    if(!title && !kind && /^\s*(error|failed|cannot|could not)\b/i.test(String(msg||''))) kind = 'error';
+    document.getElementById('confirm-modal-title').textContent =
+      title || (kind === 'error' ? 'Error' : 'Notice');
+    document.getElementById('confirm-modal-msg').textContent = msg;
+    const okBtn     = document.getElementById('confirm-modal-ok');
+    const cancelBtn = document.getElementById('confirm-modal-cancel');
+    const xBtn      = document.getElementById('confirm-modal-x');
+    okBtn.className = 'btn btn-primary';
+    okBtn.textContent = 'OK';
+    cancelBtn.style.display = 'none';
+    openModal('confirm-modal');
+    function cleanup(){
+      closeModal('confirm-modal');
+      cancelBtn.style.display = '';
+      okBtn.removeEventListener('click', onOk);
+      xBtn.removeEventListener('click', onOk);
+      resolve();
+    }
+    function onOk(){ cleanup(); }
+    okBtn.addEventListener('click', onOk);
+    xBtn.addEventListener('click', onOk);
+  });
+};
+</script>
 {% block scripts %}{% endblock %}
 </body>
 </html>
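The title heuristic in `showAlert` above is anchored at the start of the message, so it only labels a notice as an error when the text begins with one of the listed words. The sketch below ports that one rule to Python to make the behaviour easy to check; it is not part of the templates, and `alert_title` is a hypothetical name.

```python
import re

# Same pattern as the JavaScript regex in showAlert, case-insensitive.
_ERROR_PREFIX = re.compile(r"^\s*(error|failed|cannot|could not)\b", re.IGNORECASE)


def alert_title(msg, title=None, kind=None):
    """Return the modal title showAlert would display for these arguments."""
    # The heuristic runs only when the caller passed neither title nor kind.
    if not title and not kind and _ERROR_PREFIX.search(str(msg or "")):
        kind = "error"
    return title or ("Error" if kind == "error" else "Notice")


if __name__ == "__main__":
    for text in ("Failed to load profile: timeout", "Delete failed: timeout"):
        print(f"{text!r} -> {alert_title(text)}")
```

Messages such as `'Delete failed: ...'`, which several templates in this commit pass to `showAlert`, start with "Delete" and therefore get the "Notice" title unless the caller supplies `kind` or `title` explicitly.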
--- a/codai/admin/templates/chat.html
+++ b/codai/admin/templates/chat.html
@@ -2372,7 +2372,7 @@ const STUDIO_CAPABILITIES = {
    optional:[],
    notes:[
      'Requires <code>insightface</code> and <code>onnxruntime</code>: <code>pip install insightface onnxruntime</code>.',
-      'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).',
+      'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="' + (window.ROOT_PATH||'') + '/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).',
      'No AI model selection needed — this feature uses its own dedicated backend.',
    ],
    backendPath: ROOT_PATH + '/v1/images/faceswap',
@@ -2386,7 +2386,7 @@ const STUDIO_CAPABILITIES = {
    optional:[],
    notes:[
      'Requires <code>insightface</code> and <code>onnxruntime</code>: <code>pip install insightface onnxruntime</code>.',
-      'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).',
+      'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="' + (window.ROOT_PATH||'') + '/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).',
      'No AI model selection needed — this feature uses its own dedicated backend.',
    ],
    backendPath: ROOT_PATH + '/v1/images/faceswap',
@@ -2461,14 +2461,14 @@ function capSearchUrl(cap) {
  const s = CAP_TO_HF_SEARCH[cap];
  if (!s) return null;
  const p = new URLSearchParams({ tab:'search', q: s.q, pipeline: s.pipeline, gguf: s.gguf });
-  return '/admin/models?' + p.toString();
+  return (window.ROOT_PATH || '') + '/admin/models?' + p.toString();
 }
 function capMissingHtml(caps, label) {
  if (!caps.length) return '';
  const links = caps.map(cap => {
    const chip = `<span class="cap-chip dim">${cap.replace(/_/g,' ')}</span>`;
    if (_localCapSet.has(cap)) {
-      const url = `/admin/models?local_cap=${encodeURIComponent(cap)}`;
+      const url = `${window.ROOT_PATH || ''}/admin/models?local_cap=${encodeURIComponent(cap)}`;
      return `<a href="${url}" class="cap-find-link" title="You have a local model with ${cap.replace(/_/g,' ')} — click to configure it">${chip}<span class="cap-find-icon" style="color:#6ecf7e">↑ configure</span></a>`;
    }
    const url = capSearchUrl(cap);
@@ -4229,12 +4229,12 @@ async function loadCharProfileIntoSlot(prefix, idx, name) {
    charSlots[prefix][idx].name = charSlots[prefix][idx].name || d.name;
    charSlots[prefix][idx].images = (d.images||[]).map(img => img.data);
    renderCharSlots(prefix);
-  } catch(e) { alert('Failed to load profile: '+e.message); }
+  } catch(e) { showAlert('Failed to load profile: '+e.message); }
 }
 async function saveCharSlotAsProfile(prefix, idx) {
  const slot = charSlots[prefix]?.[idx];
-  if (!slot || !slot.images.length) { alert('Add at least one image first.'); return; }
+  if (!slot || !slot.images.length) { showAlert('Add at least one image first.'); return; }
  const name = slot.name || prompt('Profile name:');
  if (!name) return;
  try {
@@ -4246,8 +4246,8 @@ async function saveCharSlotAsProfile(prefix, idx) {
    charSlots[prefix][idx].name = name;
    await loadCharProfileList();
    renderCharSlots(prefix);
-    alert(`Saved profile "${name}"`);
+    showAlert(`Saved profile "${name}"`);
-  } catch(e) { alert('Save failed: '+e.message); }
+  } catch(e) { showAlert('Save failed: '+e.message); }
 }
 // ─────────────────────────────────────────────────────────────────
@@ -6051,14 +6051,14 @@ async function profCharView(name) {
  try {
    const d = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name)).then(r=>r.json());
    _openProfModal(`Character: ${d.name}`, d.description||'', d.images||[]);
-  } catch(e) { alert('Failed to load character: ' + e.message); }
+  } catch(e) { showAlert('Failed to load character: ' + e.message); }
 }
 async function profCharDelete(name) {
  if (!confirm(`Delete character profile "${name}"?`)) return;
  const r = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name), {method:'DELETE'});
  if (r.ok) await profCharLoad();
-  else alert('Delete failed: ' + await r.text());
+  else showAlert('Delete failed: ' + await r.text());
 }
@@ -6139,7 +6139,7 @@ async function profVoiceDelete(name) {
  if (!confirm(`Delete voice profile "${name}"?`)) return;
  const r = await fetch(ROOT_PATH + '/admin/api/voices/'+encodeURIComponent(name), {method:'DELETE'});
  if (r.ok) await profVoiceLoad();
-  else alert('Delete failed: ' + await r.text());
+  else showAlert('Delete failed: ' + await r.text());
 }
 // ─────────────────────────────────────────────────────────────────
@@ -6296,14 +6296,14 @@ async function profEnvView(name) {
  try {
    const d = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name)).then(r=>r.json());
    _openProfModal(`Environment: ${d.name}`, d.description||'', d.images||[]);
-  } catch(e) { alert('Failed to load environment: ' + e.message); }
+  } catch(e) { showAlert('Failed to load environment: ' + e.message); }
 }
 async function profEnvDelete(name) {
  if (!confirm(`Delete environment profile "${name}"?`)) return;
  const r = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name), {method:'DELETE'});
  if (r.ok) await profEnvLoad();
-  else alert('Delete failed: ' + await r.text());
+  else showAlert('Delete failed: ' + await r.text());
 }
 // ─────────────────────────────────────────────────────────────────
@@ -6528,7 +6528,7 @@ async function deleteCustomPipeline(id) {
    _customPipelines = _customPipelines.filter(p => p.id !== id);
    if (_editingPipelineId === id) { _editingPipelineId = null; _pbSteps = []; renderBuilderSteps(); }
    renderCustomPipelineCards();
-  } catch(e) { alert('Delete failed: '+e.message); }
+  } catch(e) { showAlert('Delete failed: '+e.message); }
 }
 function _renderPipelineResult(outId, progId, d) {
@@ -6683,7 +6683,7 @@ async function archiveDelete(filename) {
    _archiveFiles = _archiveFiles.filter(f => f.filename !== filename);
    renderArchive();
  } catch(e) {
-    alert('Delete failed: ' + e.message);
+    showAlert('Delete failed: ' + e.message);
  }
 }

--- a/codai/admin/templates/models.html
+++ b/codai/admin/templates/models.html
--- a/codai/admin/templates/settings.html
+++ b/codai/admin/templates/settings.html
--- a/codai/admin/templates/tasks.html
+++ b/codai/admin/templates/tasks.html
--- a/codai/admin/templates/tokens.html
+++ b/codai/admin/templates/tokens.html
@@ -126,15 +126,15 @@ async function createToken() {
      openModal('show-modal');
      loadTokens();
    } else {
-      const e = await r.json(); alert(e.detail || 'Failed');
+      const e = await r.json(); showAlert(e.detail || 'Failed');
    }
-  } catch (e) { alert(e.message); }
+  } catch (e) { showAlert(e.message); }
 }
 async function delToken(id) {
  if (!confirm('Delete this token? Clients using it will lose access immediately.')) return;
  const r = await fetch(ROOT_PATH + '/admin/api/tokens/'+id, {method:'DELETE'});
-  if (r.ok) loadTokens(); else alert('Failed to delete');
+  if (r.ok) loadTokens(); else showAlert('Failed to delete');
 }
 loadTokens();

--- a/codai/admin/templates/users.html
+++ b/codai/admin/templates/users.html
@@ -105,7 +105,7 @@ async function delUser(id, name) {
  if (!confirm('Delete user "' + name + '"?')) return;
  const r = await fetch(ROOT_PATH + '/admin/api/users/'+id, {method:'DELETE'});
  if (r.ok) location.reload();
-  else { const e = await r.json(); alert(e.detail || 'Failed'); }
+  else { const e = await r.json(); showAlert(e.detail || 'Failed'); }
 }
 </script>
 {% endblock %}
--- a/codai/api/app.py
+++ b/codai/api/app.py
@@ -160,6 +160,32 @@ except ImportError:
    pass
+class _InternalAuthMiddleware:
+    """Reject any HTTP request that doesn't carry the front's internal token.
+    Active only when CODERAI_INTERNAL_TOKEN is set (i.e. this process is an engine
+    spawned by the front). It binds 127.0.0.1, but this also blocks anything else on
+    localhost from talking to the engine directly and bypassing the front. In
+    single-process mode the token is unset and this is a no-op."""
+    def __init__(self, app):
+        self._app = app
+        self._token = os.environ.get("CODERAI_INTERNAL_TOKEN")
+    async def __call__(self, scope, receive, send):
+        if self._token and scope.get("type") == "http":
+            headers = dict(scope.get("headers", []))
+            got = headers.get(b"x-coderai-internal", b"").decode("latin-1")
+            if got != self._token:
+                await send({"type": "http.response.start", "status": 403,
+                            "headers": [(b"content-type", b"application/json")]})
+                await send({"type": "http.response.body",
+                            "body": b'{"error":"forbidden: engines are reachable only '
+                                    b'through the front proxy"}'})
+                return
+        await self._app(scope, receive, send)
 class _ForwardedPrefixMiddleware:
    """Populate ASGI root_path from X-Forwarded-Prefix / X-Script-Name headers."""
@@ -180,6 +206,9 @@ class _ForwardedPrefixMiddleware:
 app.add_middleware(_ForwardedPrefixMiddleware)
+# Added last → outermost: the internal-token gate runs before anything else, so a
+# request without the front's token never reaches a route.
+app.add_middleware(_InternalAuthMiddleware)
 # Mount static files for admin dashboard
 from fastapi.staticfiles import StaticFiles
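The gate added by `_InternalAuthMiddleware` can be exercised without a server by driving an ASGI callable directly. The sketch below is a stand-alone approximation, not the class from the diff: the token is passed to the constructor instead of being read from `CODERAI_INTERNAL_TOKEN`, the error body is shortened, and `hmac.compare_digest` is swapped in for the `!=` comparison as a constant-time alternative. `InternalAuthGate`, `_ok_app` and `status_for` are hypothetical names.

```python
import asyncio
import hmac


class InternalAuthGate:
    """Pass HTTP requests through only when they carry the expected token."""

    def __init__(self, app, token):
        self._app = app
        self._token = token  # None or "" disables the gate (single-process mode)

    async def __call__(self, scope, receive, send):
        if self._token and scope.get("type") == "http":
            headers = dict(scope.get("headers", []))
            got = headers.get(b"x-coderai-internal", b"")
            # Constant-time comparison; the diff itself uses a plain `!=`.
            if not hmac.compare_digest(got, self._token.encode("utf-8")):
                await send({"type": "http.response.start", "status": 403,
                            "headers": [(b"content-type", b"application/json")]})
                await send({"type": "http.response.body",
                            "body": b'{"error":"forbidden"}'})
                return
        await self._app(scope, receive, send)


async def _ok_app(scope, receive, send):
    """Trivial downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(token, header_value):
    """Run one fake request through the gate and return the response status."""
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    headers = ([] if header_value is None
               else [(b"x-coderai-internal", header_value.encode("utf-8"))])
    scope = {"type": "http", "headers": headers}
    asyncio.run(InternalAuthGate(_ok_app, token)(scope, receive, send))
    return sent[0]["status"]


if __name__ == "__main__":
    print(status_for("s3cret", "s3cret"), status_for("s3cret", None))
```

As in the diff, only scopes of type `http` are checked, so other scope types such as `websocket` pass through the gate unchanged.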
@@ -193,6 +222,77 @@ from fastapi.responses import FileResponse, Response as _FaviconResponse
 _favicon_path = admin_static_dir / "favicon.ico"
+@app.get("/healthz", include_in_schema=False)
+async def healthz():
+    """Cheap liveness probe that touches no torch/model state.
+    The front proxy's engine supervisor polls this to distinguish a *slow* engine
+    (busy loading a model — the event loop may be blocked, so this can be late but
+    will eventually answer) from a *dead* one (connection refused). It must stay
+    trivial and dependency-free so it returns the instant the loop is free."""
+    import os as _os
+    return {"ok": True, "pid": _os.getpid()}
+@app.get("/internal/engine-state", include_in_schema=False)
+async def internal_engine_state():
+    """Auth-free engine introspection for the front proxy's router/aggregator.
+    Engines bind 127.0.0.1 only, so this is not publicly reachable. Returns which
+    models are resident (for model→engine routing) and this engine's GPU/VRAM (for
+    cross-engine status aggregation). Kept cheap so it answers even mid-generation.
+    """
+    import os as _os
+    try:
+        loaded = list(multi_model_manager.models.keys())
+    except Exception:
+        loaded = []
+    vram = None
+    try:
+        import torch
+        if torch.cuda.is_available():
+            # Sum across every CUDA device this engine can see — an engine may own
+            # more than one GPU (e.g. two NVIDIA cards sharding one large model), so
+            # reporting only device 0 would under-count its VRAM.
+            n = torch.cuda.device_count()
+            used = free = total = 0
+            devs = []
+            for i in range(n):
+                f, t = torch.cuda.mem_get_info(i)
+                used += (t - f); free += f; total += t
+                devs.append({"index": i, "name": torch.cuda.get_device_name(i),
+                             "free": round(f / 1e9, 2), "total": round(t / 1e9, 2)})
+            label = (torch.cuda.get_device_name(0) if n == 1
+                     else f"{n}× CUDA")
+            vram = {"used": round(used / 1e9, 2), "free": round(free / 1e9, 2),
+                    "total": round(total / 1e9, 2), "gpu": label,
+                    "devices": devs, "device_count": n}
+    except Exception:
+        vram = None
+    # Running tasks so the front can show cross-engine activity without needing a
+    # session on this engine (sessions live only on the primary).
+    tasks = []
+    try:
+        from codai.tasks import task_registry
+        tasks = [t for t in task_registry.list()
+                 if t.get("status") in ("running", "queued", "paused")]
+    except Exception:
+        tasks = []
+    # This engine's thermal cooldown state, so the front can show WHICH engine is
+    # cooling (each engine pauses on its own GPUs; CPU pauses everything).
+    cooling = None
+    try:
+        from codai.models import thermal
+        cs = thermal.get_cooldown_state()
+        if cs.get("active"):
+            cooling = {"gpu": cs.get("gpu"), "cpu": cs.get("cpu"),
+                       "message": cs.get("message")}
+    except Exception:
+        cooling = None
+    return {"ok": True, "pid": _os.getpid(), "loaded_models": loaded,
+            "vram": vram, "tasks": tasks, "cooling": cooling}
 @app.get("/favicon.ico", include_in_schema=False)
 async def favicon():
    if _favicon_path.exists():
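The per-engine VRAM summary built in `internal_engine_state` above sums free and total memory over every visible CUDA device. The sketch below isolates that arithmetic so it can run without torch: it takes the `(name, free_bytes, total_bytes)` triples that `torch.cuda.get_device_name` and `torch.cuda.mem_get_info` would supply. `aggregate_vram` is a hypothetical helper, and returning `None` for an empty list is an assumption that stands in for the `torch.cuda.is_available()` guard.

```python
def aggregate_vram(devices):
    """Summarise VRAM across devices given (name, free_bytes, total_bytes) triples."""
    if not devices:
        return None  # assumption: mirrors "no CUDA available" in the endpoint
    used = free = total = 0
    devs = []
    for index, (name, f, t) in enumerate(devices):
        used += t - f
        free += f
        total += t
        devs.append({"index": index, "name": name,
                     "free": round(f / 1e9, 2), "total": round(t / 1e9, 2)})
    n = len(devices)
    # A single GPU is reported by name; several are reported as a count.
    label = devices[0][0] if n == 1 else f"{n}× CUDA"
    return {"used": round(used / 1e9, 2), "free": round(free / 1e9, 2),
            "total": round(total / 1e9, 2), "gpu": label,
            "devices": devs, "device_count": n}


if __name__ == "__main__":
    print(aggregate_vram([("GPU A", 10_000_000_000, 24_000_000_000),
                          ("GPU B", 4_000_000_000, 8_000_000_000)]))
```

The division by `1e9` means the figures are decimal gigabytes, not GiB, so they read slightly higher than tools that divide by 2**30.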

--- a/codai/api/ds4_worker.py
+++ b/codai/api/ds4_worker.py
--- a/codai/api/embeddings.py
+++ b/codai/api/embeddings.py
@@ -106,6 +106,27 @@ async def create_embeddings(request: EmbeddingsRequest, http_request: Request =
    """
    OpenAI-compatible embeddings endpoint.
    """
+    # Register a task so embeddings appear in the unified task list, like every
+    # other model type. Finished on success or error below.
+    from codai.tasks import task_registry
+    _title = request.input if isinstance(request.input, str) else "embeddings"
+    _tid = task_registry.register(
+        "embedding", title=str(_title)[:80], model=(request.model or "embedding"))
+    task_registry.start(_tid)
+    try:
+        _resp = await _run_embeddings(request, http_request)
+        task_registry.finish(_tid, "done")
+        return _resp
+    except HTTPException:
+        task_registry.finish(_tid, "error")
+        raise
+    except Exception as e:
+        task_registry.finish(_tid, "error", str(e)[:200])
+        raise
+async def _run_embeddings(request: EmbeddingsRequest, http_request: Request = None):
+    """Core embeddings logic; registered as a task by create_embeddings()."""
    model_info = await asyncio.to_thread(
        multi_model_manager.request_model, request.model, model_type="embedding")
    model_name = model_info.get('model_name')
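The register / start / finish bracket that `create_embeddings` now puts around `_run_embeddings` is a general pattern, sketched below with an in-memory stand-in. `InMemoryTaskRegistry` is hypothetical and only imitates the three calls visible in the diff (`register`, `start`, `finish`); the real `codai.tasks.task_registry` is not shown here. The sketch also collapses the diff's two `except` branches into one, so every failure records its message.

```python
import asyncio
import itertools


class InMemoryTaskRegistry:
    """Hypothetical stand-in exposing register/start/finish like the diff uses."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.tasks = {}

    def register(self, kind, title="", model=""):
        tid = next(self._ids)
        self.tasks[tid] = {"kind": kind, "title": title, "model": model,
                           "status": "queued", "error": None}
        return tid

    def start(self, tid):
        self.tasks[tid]["status"] = "running"

    def finish(self, tid, status, error=None):
        self.tasks[tid].update(status=status, error=error)


async def tracked(registry, kind, title, model, work):
    """Await work() while recording its lifecycle in the registry."""
    tid = registry.register(kind, title=str(title)[:80], model=model)
    registry.start(tid)
    try:
        result = await work()
    except Exception as exc:
        # Record the failure, then let the caller see the original exception.
        registry.finish(tid, "error", str(exc)[:200])
        raise
    registry.finish(tid, "done")
    return result


if __name__ == "__main__":
    reg = InMemoryTaskRegistry()

    async def fake_embeddings():
        return [0.1, 0.2]

    print(asyncio.run(tracked(reg, "embedding", "hello", "demo-model", fake_embeddings)))
    print(reg.tasks)
```

Note that `except Exception` does not catch `asyncio.CancelledError`, which derives from `BaseException` on current Python versions, so a cancelled request would leave its task in the "running" state in this sketch.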

--- a/codai/api/parler_worker.py
+++ b/codai/api/parler_worker.py
+# CoderAI - OpenAI-compatible API server
+# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+"""Fully-managed Parler-TTS worker.
+parler-tts pins an old transformers/tokenizers/huggingface-hub that conflict with
+the coderai server's stack, so it can't share this venv. Instead coderai owns the
+whole lifecycle here: on first use it bootstraps a dedicated venv (installing
+parler-tts), launches ``tools/parler_tts_service.py`` in it as a local HTTP
+service, health-checks it, and hands back the URL. The matching
+``_RemoteParlerBackend.cleanup()`` calls :func:`stop_service`, so the model
+manager's normal eviction tears the process down — no manual setup or config.
+"""
+import os
+import socket
+import subprocess
+import sys
+import threading
+import time
+from pathlib import Path
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_SERVICE_SCRIPT = _REPO_ROOT / "tools" / "parler_tts_service.py"
+# Dedicated venv for the (incompatible) parler-tts stack. Created fully isolated
+# (no --system-site-packages), so parler's pinned transformers never mixes with
+# the server's stack; the venv gets its own copy of everything, torch included.
+_VENV_DIR = Path(os.environ.get("CODERAI_PARLER_VENV")
+                 or os.path.expanduser("~/.coderai/parler_venv"))
+_lock = threading.RLock()
+_services: dict[str, dict] = {}   # model_name -> {"proc","port","url"}
+_bootstrapped = False
+def _venv_python() -> Path:
+    return _VENV_DIR / ("Scripts" if os.name == "nt" else "bin") / (
+        "python.exe" if os.name == "nt" else "python")
+def _pip_ok(py: Path) -> bool:
+    try:
+        return subprocess.run([str(py), "-c", "import parler_tts, soundfile"],
+                              capture_output=True).returncode == 0
+    except Exception:
+        return False
+def _venv_is_system_site() -> bool:
+    """True if the venv was built with --system-site-packages (can't isolate)."""
+    try:
+        return "include-system-site-packages = true" in \
+            (_VENV_DIR / "pyvenv.cfg").read_text().lower()
+    except Exception:
+        return False
+def _bootstrap_venv() -> Path:
+    """Create a fully-isolated venv and install parler-tts (idempotent).
+    Isolation is the whole point: parler-tts pins an old transformers/tokenizers
+    that must NOT be shared with — or shadowed by — the server's stack, so the
+    venv gets its own copy of everything (torch included). Returns its python."""
+    global _bootstrapped
+    py = _venv_python()
+    if _bootstrapped and py.exists():
+        return py
+    # A previously-created shared-site venv leaks the server's transformers in;
+    # rebuild it isolated.
+    if py.exists() and _venv_is_system_site():
+        import shutil
+        print("[parler] rebuilding venv as fully isolated …", flush=True)
+        shutil.rmtree(_VENV_DIR, ignore_errors=True)
+    if not _venv_python().exists():
+        print(f"[parler] creating isolated venv at {_VENV_DIR} …", flush=True)
+        _VENV_DIR.parent.mkdir(parents=True, exist_ok=True)
+        subprocess.run([sys.executable, "-m", "venv", str(_VENV_DIR)], check=True)
+    py = _venv_python()
+    if not _pip_ok(py):
+        print("[parler] installing parler-tts + torch into the isolated venv "
+              "(first run, downloads several GB, this can take a while) …", flush=True)
+        subprocess.run([str(py), "-m", "pip", "install",
+                        "git+https://github.com/huggingface/parler-tts.git",
+                        "soundfile"], check=True)
+        if not _pip_ok(py):
+            raise RuntimeError("parler-tts install did not yield an importable package")
+    _bootstrapped = True
+    return py
+def _free_port() -> int:
+    # Bind to port 0 so the OS assigns a free port; the context manager closes
+    # the socket even if bind() raises.
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(("127.0.0.1", 0))
+        return s.getsockname()[1]
+def _pump_logs(proc: subprocess.Popen, tail):
+    for line in proc.stdout:
+        line = line.rstrip()
+        if line:
+            tail.append(line)
+            print(f"[parler] {line}", flush=True)
+def _health_ok(url: str) -> bool:
+    import requests
+    try:
+        r = requests.get(url + "/health", timeout=3)
+        return r.ok and bool(r.json().get("ok"))
+    except Exception:
+        return False
+def ensure_service(model_name: str, ready_timeout: float = 1800.0) -> str:
+    """Start (or reuse) the worker for ``model_name`` and return its base URL.
+    First call bootstraps the venv and downloads the model, so the timeout is
+    generous. Raises RuntimeError if the service never comes up."""
+    with _lock:
+        svc = _services.get(model_name)
+        if svc and svc["proc"].poll() is None and _health_ok(svc["url"]):
+            return svc["url"]
+        if svc and svc["proc"].poll() is not None:
+            _services.pop(model_name, None)   # died — restart below
+        py = _bootstrap_venv()
+        port = _free_port()
+        url = f"http://127.0.0.1:{port}"
+        env = dict(os.environ)
+        # The worker must use the model already pulled via coderai's HF download
+        # interface — it never downloads anything itself. Point it at coderai's
+        # cache and force offline mode, so a missing model fails fast instead of
+        # silently fetching.
+        try:
+            from codai.models.cache import get_hf_hub_cache_dir
+            hub = get_hf_hub_cache_dir()
+            env["HF_HUB_CACHE"] = hub
+            env["HUGGINGFACE_HUB_CACHE"] = hub
+        except Exception:
+            pass
+        env["HF_HUB_OFFLINE"] = "1"
+        env["TRANSFORMERS_OFFLINE"] = "1"
+        proc = subprocess.Popen(
+            [str(py), str(_SERVICE_SCRIPT), "--model", model_name,
+             "--host", "127.0.0.1", "--port", str(port)],
+            stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
+            bufsize=1, env=env, cwd=str(_REPO_ROOT),
+        )
+        import collections
+        tail = collections.deque(maxlen=15)
+        threading.Thread(target=_pump_logs, args=(proc, tail), daemon=True).start()
+        _services[model_name] = {"proc": proc, "port": port, "url": url}
+    def _tail_msg():
+        joined = " | ".join(list(tail)[-5:]).strip()
+        low = joined.lower()
+        # Parenthesised to make the intended operator precedence explicit.
+        if "offline" in low or ("not" in low and "found" in low):
+            return (f". The model isn't in coderai's cache — download "
+                    f"'{model_name}' from the model interface first. ({joined})")
+        return f". Last output: {joined}" if joined else ""
+    # Wait (outside the lock) for the service to load the model and answer.
+    deadline = time.time() + ready_timeout
+    while time.time() < deadline:
+        if proc.poll() is not None:
+            raise RuntimeError(
+                f"Parler worker exited (code {proc.returncode}) before becoming ready"
+                + _tail_msg())
+        if _health_ok(url):
+            print(f"[parler] service ready for {model_name} at {url}", flush=True)
+            return url
+        time.sleep(2)
+    stop_service(model_name)
+    raise RuntimeError(f"Parler worker for {model_name} did not become ready in time"
+                       + _tail_msg())
+def stop_service(model_name: str) -> None:
+    with _lock:
+        svc = _services.pop(model_name, None)
+    if not svc:
+        return
+    proc = svc["proc"]
+    if proc.poll() is None:
+        try:
+            proc.terminate()
+            proc.wait(timeout=10)
+        except Exception:
+            pass
+    if proc.poll() is None:
+        try:
+            proc.kill()
+        except Exception:
+            pass
+    print(f"[parler] service for {model_name} stopped", flush=True)
+def stop_all() -> None:
+    for name in list(_services.keys()):
+        stop_service(name)
+import atexit as _atexit
+_atexit.register(stop_all)
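Review note (a sketch, not part of the commit): the cache pinning and offline flags that ensure_service() sets on the worker environment can be expressed as a small pure function, which makes the behaviour easy to check in isolation. The function name `build_offline_env` and its parameters are hypothetical; the environment variable names are the ones set in the diff above.

```python
import os


def build_offline_env(hub_cache=None, base=None):
    """Return a copy of the environment that forces Hugging Face offline mode.

    hub_cache: optional cache directory for the worker (hypothetical parameter;
    the commit obtains this value from get_hf_hub_cache_dir()).
    base: mapping to copy from; defaults to os.environ.
    """
    env = dict(os.environ if base is None else base)
    if hub_cache:
        # Both names are set in the commit, so the same is done here.
        env["HF_HUB_CACHE"] = str(hub_cache)
        env["HUGGINGFACE_HUB_CACHE"] = str(hub_cache)
    # Offline flags are always set, even when no cache path is known.
    env["HF_HUB_OFFLINE"] = "1"
    env["TRANSFORMERS_OFFLINE"] = "1"
    return env


base = {"PATH": "/usr/bin"}
env = build_offline_env("/data/hf-cache", base=base)
```

Because the function copies its input, the parent process environment is left untouched and only the spawned worker sees the offline flags.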
--- a/codai/api/spatial.py
+++ b/codai/api/spatial.py
@@ -45,6 +45,31 @@ global_args = None
 global_file_path = None
+def _spatial_task(title: str):
+    """Decorator: register a spatial/3D endpoint in the unified task list so
+    every model type is visible there. Finishes done/error around the call."""
+    import functools
+    def deco(fn):
+        @functools.wraps(fn)
+        async def wrap(*args, **kwargs):
+            from codai.tasks import task_registry
+            tid = task_registry.register("spatial", title=title, model="spatial")
+            task_registry.start(tid)
+            try:
+                result = await fn(*args, **kwargs)
+                task_registry.finish(tid, "done")
+                return result
+            except HTTPException:
+                task_registry.finish(tid, "error")
+                raise
+            except Exception as e:
+                task_registry.finish(tid, "error", str(e)[:200])
+                raise
+        return wrap
+    return deco
 def set_global_args(args):
    global global_args
    global_args = args
@@ -500,6 +525,7 @@ class ImageTo3DRequest(BaseModel):
 @router.post("/v1/images/to3d", summary="Image to 3D model")
+@_spatial_task("Image → 3D")
 async def image_to_3d(request: ImageTo3DRequest, http_request: Request = None):
    """Convert a 2D image to a 3D representation.
@@ -568,6 +594,7 @@ class ImageFrom3DRequest(BaseModel):
 @router.post("/v1/images/from3d", summary="Render a 3D model to an image")
+@_spatial_task("3D → image")
 async def image_from_3d(request: ImageFrom3DRequest, http_request: Request = None):
    """Render a 3D model (GLB/OBJ) to a 2D PNG image from a specified camera angle."""
    raw = _decode_b64(request.model_data)
@@ -601,6 +628,7 @@ class VideoTo3DRequest(BaseModel):
 @router.post("/v1/video/to3d", summary="Video to 3D model")
+@_spatial_task("Video → 3D")
 async def video_to_3d(request: VideoTo3DRequest, http_request: Request = None):
    """Convert a 2D video to a 3D video frame-by-frame.
@@ -642,6 +670,7 @@ class VideoFrom3DRequest(BaseModel):
 @router.post("/v1/video/from3d", summary="Render a 3D model to a video")
+@_spatial_task("3D → video")
 async def video_from_3d(request: VideoFrom3DRequest, http_request: Request = None):
    """Render a 3D model as a 360° turntable video."""
    raw = _decode_b64(request.model_data)
@@ -675,6 +704,7 @@ class Generate3DRequest(BaseModel):
 @router.post("/v1/3d/generate", summary="Generate a 3D model from a prompt")
+@_spatial_task("Generate 3D")
 async def generate_3d(request: Generate3DRequest, http_request: Request = None):
    """Generate a 3D model (GLB) from a text prompt and/or an image.

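Review note (a sketch, not part of the commit): the register/start/finish pattern used by `_spatial_task` can be exercised without the server by swapping in a stand-in registry. `FakeRegistry` and `tracked` are hypothetical names; the real `task_registry` API is only assumed to accept the calls visible in the diff above.

```python
import asyncio
import functools


class FakeRegistry:
    """Stand-in for task_registry that records every call (hypothetical)."""

    def __init__(self):
        self.events = []
        self._next_id = 0

    def register(self, kind, title="", model=""):
        self._next_id += 1
        self.events.append(("register", self._next_id, kind, title))
        return self._next_id

    def start(self, tid):
        self.events.append(("start", tid))

    def finish(self, tid, status, message=""):
        self.events.append(("finish", tid, status, message))


registry = FakeRegistry()


def tracked(kind, title):
    """Decorator: wrap an async callable in register/start/finish calls."""
    def deco(fn):
        @functools.wraps(fn)
        async def wrap(*args, **kwargs):
            tid = registry.register(kind, title=title)
            registry.start(tid)
            try:
                result = await fn(*args, **kwargs)
            except Exception as exc:
                # Record a truncated message, then let the error propagate.
                registry.finish(tid, "error", str(exc)[:200])
                raise
            registry.finish(tid, "done")
            return result
        return wrap
    return deco


@tracked("spatial", "demo ok")
async def ok():
    return 42


@tracked("spatial", "demo fail")
async def boom():
    raise ValueError("bad input")


async def main():
    value = await ok()
    try:
        await boom()
    except ValueError:
        pass
    return value


result = asyncio.run(main())
```

In the commit the decorator sits below `@router.post(...)`, so the route is registered with the wrapped coroutine; `functools.wraps` keeps the original name and docstring on the wrapper.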
--- a/codai/api/text.py
+++ b/codai/api/text.py
--- a/codai/api/transcriptions.py
+++ b/codai/api/transcriptions.py
@@ -135,6 +135,32 @@ async def create_transcription(
    if len(file_content) > _MAX_AUDIO_BYTES:
        raise HTTPException(status_code=413, detail="Audio file too large (max 100 MB)")
+    # Register a task so transcription appears in the unified task list, like
+    # every other model type. Finished on success or error below.
+    from codai.tasks import task_registry
+    _tid = task_registry.register(
+        "transcription",
+        title=(file.filename or "audio")[:80],
+        model=model or "",
+    )
+    task_registry.start(_tid)
+    try:
+        _resp = await _run_transcription(
+            file_content, model, language, prompt, response_format, temperature, file)
+        task_registry.finish(_tid, "done")
+        return _resp
+    except HTTPException:
+        task_registry.finish(_tid, "error")
+        raise
+    except Exception as e:
+        task_registry.finish(_tid, "error", str(e)[:200])
+        raise
+async def _run_transcription(
+    file_content: bytes, model: str, language, prompt, response_format, temperature, file
+):
+    """Core transcription logic; registered as a task by create_transcription()."""
    # Check if the requested model maps to a configured whisper-server instance first.
    # Try alias round-robin resolution before direct ID lookup.
    whisper_model_id = multi_model_manager.resolve_whisper_alias_model_id(model)

--- a/codai/api/tts.py
+++ b/codai/api/tts.py
@@ -28,6 +28,7 @@ from pydantic import BaseModel, ConfigDict
 # Import from codai modules
 from codai.models.manager import multi_model_manager
+from codai.api import tts_backends
 # Global reference to be set by coderai
@@ -40,6 +41,20 @@ def set_global_args(args):
    global_args = args
+# Substrings that mark a model as a text/classifier/embedding model wrongly routed
+# to TTS (e.g. an emotion classifier exposed under a stray ``tts:`` alias).
+_NON_TTS_HINTS = (
+    "go_emotions", "roberta", "bert", "embedding", "e5-", "minilm",
+    "classifier", "toxic", "reranker", "sentence-transformers",
+)
+def _family_is_text_model(model_name: str) -> bool:
+    """Heuristic guard: True when the model is clearly not a speech synthesizer."""
+    n = (model_name or "").lower()
+    return any(h in n for h in _NON_TTS_HINTS)
 # =============================================================================
 # Router and Endpoints
 # =============================================================================
@@ -72,6 +87,16 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
    Supports:
    - Kokoro TTS models (when --tts-model is specified)
    """
+    # Register a task so TTS shows up in the unified task list / dashboard,
+    # like every other model type. Finished on success or error below.
+    from codai.tasks import task_registry, loading_task
+    _tid = task_registry.register(
+        "tts",
+        title=(request.input or "")[:80],
+        model=(request.model or request.voice_profile or "tts"),
+    )
+    task_registry.start(_tid)
+    try:
        # If a voice profile is requested, delegate to voice cloning (F5-TTS)
        if request.voice_profile:
            from codai.api.voice_clone import _load_voice, _f5tts_clone
@@ -96,6 +121,7 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
            except Exception as e:
                raise HTTPException(status_code=500, detail=f"Voice cloning failed: {e}")
            audio_base64 = base64.b64encode(audio_bytes).decode('utf-8')
+            task_registry.finish(_tid, "done")
            return {"audio": audio_base64}
        # Use the manager to resolve the model and manage VRAM
@@ -111,7 +137,7 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
        model_name = model_info['model_name']
        model_key = model_info['model_key']
-    kokoro_model = model_info['model_object']
+        tts_backend = model_info['model_object']
        # If no TTS model configured, return an error
        if not model_name:
@@ -120,35 +146,42 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
                detail="TTS not configured. Use --tts-model to specify a model."
            )
-    # Try to use kokoro if available
+        # Reject text/classifier models that aren't actually speech synthesizers.
-    try:
+        if _family_is_text_model(model_name):
-        from kokoro import Kokoro
+            raise HTTPException(
+                status_code=404,
+                detail=(f"Model '{model_name}' is a text model and cannot be used for "
+                        "TTS generation. Use a TTS model (e.g. a Kokoro/XTTS/Bark model).")
+            )
-        if kokoro_model is None:
+        try:
-            print(f"Loading Kokoro TTS model: {model_name}")
+            from codai.api import tts_backends
-            # Check if model_name is a URL - download it (with caching)
+            if tts_backend is None:
-            model_path = None
+                print(f"Loading TTS model: {model_name}")
-            if model_name.startswith('http://') or model_name.startswith('https://'):
-                print(f"Loading model from URL: {model_name}")
-                from codai.models.cache import load_model
-                model_path = load_model(model_name)
-                if not model_path:
-                    raise Exception(f"Failed to load model from {model_name}")
-            else:
-                # Use local path or model name
                model_path = model_name
+                if model_name.startswith(('http://', 'https://')):
-            # Load the Kokoro model
+                    from codai.models.cache import load_model
-            kokoro_model = Kokoro(model_path if model_path else model_name)
+                    model_path = load_model(model_name) or model_name
-            multi_model_manager.add_model(model_key, kokoro_model)
+                cfg = multi_model_manager.config.get(model_key) or \
+                    multi_model_manager.config.get(f"tts:{model_name}") or {}
+                with loading_task(model_name, model_type="tts"):
+                    tts_backend = await asyncio.to_thread(
+                        tts_backends.load_backend, model_name, model_path, cfg)
+                multi_model_manager.add_model(model_key, tts_backend)
                multi_model_manager.current_model_key = model_key
-        # Generate speech
+            voice = request.voice or getattr(tts_backend, "default_voice", "")
-        voice = request.voice or "af_sarah"
            speed = request.speed or 1.0
+            lang = getattr(request, "language", None) or "en-us"
+            emotion = getattr(request, "emotion", None) or ""
+            style = getattr(request, "style", None) or ""
+            fmt = request.response_format or "wav"
-        audio_bytes = kokoro_model.generate(request.input, voice=voice, speed=speed)
+            samples, sample_rate = await asyncio.to_thread(
+                tts_backend.synthesize, request.input, voice, speed, lang, emotion, style)
+            audio_bytes, out_fmt = await asyncio.to_thread(
+                tts_backends.encode_audio, samples, sample_rate, fmt)
            try:
                from codai.api.archive import archive_manager
@@ -157,27 +190,29 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
                    "tts", "/v1/audio/speech",
                    model_name,
                    request.input,
-                {"voice": voice, "speed": speed, "response_format": request.response_format},
+                    {"voice": voice, "speed": speed, "response_format": out_fmt},
-                [(audio_bytes, request.response_format or "mp3")],
+                    [(audio_bytes, out_fmt)],
                ))
            except Exception:
                pass
-        # Convert to base64
            audio_base64 = base64.b64encode(audio_bytes).decode('utf-8')
+            task_registry.finish(_tid, "done")
+            return {"audio": audio_base64}
-        return {
+        except HTTPException:
-            "audio": audio_base64
+            raise
-        }
+        except tts_backends.MissingEngineError as e:
+            # Missing optional engine (e.g. coqui-tts) → actionable 501.
-    except ImportError as e:
+            raise HTTPException(status_code=501, detail=str(e))
-        # kokoro not installed
-        raise HTTPException(
-            status_code=501,
-            detail=f"TTS not available. Install kokoro: pip install kokoro. Error: {str(e)}"
-        )
        except Exception as e:
            print(f"TTS error: {e}")
            import traceback
            traceback.print_exc()
            raise HTTPException(status_code=500, detail=f"TTS error: {str(e)}")
+    except HTTPException:
+        task_registry.finish(_tid, "error")
+        raise
+    except Exception as e:
+        task_registry.finish(_tid, "error", str(e)[:200])
+        raise
\ No newline at end of file
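Review note (a sketch, not part of the commit): the substring guard added in this file can be tried out on its own. The hint tuple is copied from the diff; the model names below are made up for illustration. Note that plain substring matching also flags any name that merely contains a hint, such as one containing "hubert".

```python
# Hint substrings copied from _NON_TTS_HINTS in the diff above.
NON_TTS_HINTS = (
    "go_emotions", "roberta", "bert", "embedding", "e5-", "minilm",
    "classifier", "toxic", "reranker", "sentence-transformers",
)


def looks_like_text_model(model_name):
    """True when the lower-cased name contains any non-TTS hint substring."""
    name = (model_name or "").lower()
    return any(hint in name for hint in NON_TTS_HINTS)


# Hypothetical model names, chosen only to exercise the heuristic.
names = ["acme/roberta-emotions", "acme/kokoro-voice", "acme/hubert-vocoder", None]
results = {n: looks_like_text_model(n) for n in names}
```

The third name is rejected only because "hubert" contains "bert", which shows the kind of false positive a substring heuristic can produce.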
--- a/codai/api/tts_backends.py
+++ b/codai/api/tts_backends.py
--- a/codai/backends/cuda.py
+++ b/codai/backends/cuda.py
--- a/codai/backends/ds4.py
+++ b/codai/backends/ds4.py
--- a/codai/backends/vulkan.py
+++ b/codai/backends/vulkan.py
--- a/codai/broker/capabilities.py
+++ b/codai/broker/capabilities.py
@@ -49,7 +49,13 @@ def build_hardware_summary() -> Dict[str, Any]:
    total_vram_mb = 0
    available_vram_mb = 0
+    # Only use torch if it's ALREADY loaded (i.e. we're in an engine). Never import
+    # it here — the front is torch-free and must stay that way (importing torch in
+    # the front is heavy and would initialise CUDA in the wrong process).
+    import sys as _sys
    try:
+        if "torch" not in _sys.modules:
+            raise ImportError("torch not loaded (front) — using torch-free path")
        import torch
        if torch.cuda.is_available():
@@ -76,6 +82,23 @@ def build_hardware_summary() -> Dict[str, Any]:
    except Exception:
        pass
+    # Torch-free path (e.g. the front, which imports no torch): enumerate every
+    # physical card via nvidia-smi + sysfs so VRAM is reported for the whole node.
+    if not gpus:
+        try:
+            from codai.frontproxy.gpu_detect import gpu_stats
+            for c in gpu_stats():
+                total_mb = int(round((c.get("mem_total") or 0) * 1024))
+                used_mb = int(round((c.get("mem_used") or 0) * 1024))
+                if total_mb <= 0:
+                    continue
+                gpus.append({"name": c.get("name") or c.get("vendor"),
+                             "total_vram_mb": total_mb})
+                total_vram_mb += total_mb
+                available_vram_mb += max(0, total_mb - used_mb)
+        except Exception:
+            pass
    if not gpus:
        for total_path in sorted(glob.glob("/sys/class/drm/card*/device/mem_info_vram_total")):
            used_path = total_path.replace("vram_total", "vram_used")
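Review note (a sketch, not part of the commit): the "use torch only if it is already loaded" rule relies on `sys.modules`, which can be consulted without triggering an import. The helper name `already_loaded` is hypothetical.

```python
import sys


def already_loaded(name):
    """Return the module if this process has already imported it, else None.

    A sys.modules lookup never runs an import, so a heavy package stays
    unloaded unless some other code pulled it in first.
    """
    return sys.modules.get(name)


torch = already_loaded("torch")
# Pick the code path based on what is present, without importing anything.
backend = "torch" if torch is not None else "torch-free"
```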

--- a/codai/broker/dispatcher.py
+++ b/codai/broker/dispatcher.py
@@ -60,8 +60,13 @@ def _is_text_response(content_type: str | None) -> bool:
    )
-async def execute_broker_request(app, envelope):
+async def execute_broker_request(app, envelope, executor=None):
-    """Validate and execute a broker request envelope."""
+    """Validate and execute a broker request envelope.
+    ``executor`` is an ``async (method, path, headers, query, body) -> {status_code,
+    headers, body}`` callable. When omitted the request is run in-process against
+    ``app`` via the ASGI bridge (engine / single-process mode). The front passes its
+    own executor that proxies to the right engine over HTTP."""
    logger.debug(
        "broker dispatch → op=%s request_id=%s path=%r method=%r stream=%s",
@@ -136,6 +141,12 @@ async def execute_broker_request(app, envelope):
        headers["content-type"] = envelope.content_type
    started_at = perf_counter()
+    if executor is not None:
+        response = await executor(
+            method=envelope.method, path=envelope.path, headers=headers,
+            query=envelope.query, body=body,
+        )
+    else:
        response = await execute_internal_request(
            app,
            method=envelope.method,
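Review note (a sketch, not part of the commit): the optional `executor` hook described in the new docstring amounts to choosing between two async callables with the same keyword signature and the same `{status_code, headers, body}` result shape. All function names below are hypothetical stand-ins; the real in-process path goes through `execute_internal_request`.

```python
import asyncio


async def run_in_process(method, path, headers, query, body):
    """Stand-in for the in-process ASGI bridge (hypothetical)."""
    return {"status_code": 200, "headers": {}, "body": b"in-process"}


async def proxy_executor(method, path, headers, query, body):
    """Stand-in for a front-side executor that would proxy over HTTP."""
    return {"status_code": 200, "headers": {"x-via": "proxy"}, "body": body}


async def dispatch(method, path, headers=None, query=None, body=b"", executor=None):
    """Run the request through `executor` when given, else in-process."""
    run = executor if executor is not None else run_in_process
    return await run(
        method=method, path=path, headers=headers or {},
        query=query or {}, body=body,
    )


local = asyncio.run(dispatch("GET", "/v1/models"))
proxied = asyncio.run(
    dispatch("POST", "/v1/chat", body=b"hi", executor=proxy_executor))
```

Keeping both paths behind one call shape means the validation and response handling around the call do not need to know which process actually served the request.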

--- a/codai/cli.py
+++ b/codai/cli.py
--- a/codai/config.py
+++ b/codai/config.py
--- a/codai/frontproxy/__init__.py
+++ b/codai/frontproxy/__init__.py
--- a/codai/frontproxy/app.py
+++ b/codai/frontproxy/app.py
--- a/codai/frontproxy/assignment.py
+++ b/codai/frontproxy/assignment.py
--- a/codai/frontproxy/engine_supervisor.py
+++ b/codai/frontproxy/engine_supervisor.py
--- a/codai/frontproxy/gpu_detect.py
+++ b/codai/frontproxy/gpu_detect.py
--- a/codai/frontproxy/registry.py
+++ b/codai/frontproxy/registry.py
--- a/codai/frontproxy/router.py
+++ b/codai/frontproxy/router.py
--- a/codai/main.py
+++ b/codai/main.py
--- a/codai/models/capabilities.py
+++ b/codai/models/capabilities.py
--- a/codai/models/manager.py
+++ b/codai/models/manager.py
--- a/codai/models/parser.py
+++ b/codai/models/parser.py
--- a/codai/models/quant.py
+++ b/codai/models/quant.py
--- a/codai/models/ram_monitor.py
+++ b/codai/models/ram_monitor.py
--- a/codai/models/thermal.py
+++ b/codai/models/thermal.py
--- a/codai/models/tmp_janitor.py
+++ b/codai/models/tmp_janitor.py
--- a/codai/tasks/registry.py
+++ b/codai/tasks/registry.py
@@ -54,6 +54,7 @@ class Task:
    status: str = "queued"         # queued | running | done | error | cancelled
    step: int = 0
    total: int = 0
+    rate: float = 0.0              # throughput (tokens/s) for text generation
    message: str = ""
    job_id: Optional[str] = None   # link to a durable loras training job, if any
    created_at: float = field(default_factory=time.time)

--- a/commands
+++ b/commands
+python tools/video_editor.py --no-browser --host 0.0.0.0 --media-dir tools/coderai_media --session
+tools/gen_township_fighters.py -c township_output/township_config.json
--- a/docs/deepseek-ds4.md
+++ b/docs/deepseek-ds4.md
--- a/docs/expressive-tts.md
+++ b/docs/expressive-tts.md
--- a/docs/frontend-engine-split.md
+++ b/docs/frontend-engine-split.md
--- a/docs/process-isolation-plans.md
+++ b/docs/process-isolation-plans.md
--- a/docs/reverse-proxy-nginx.md
+++ b/docs/reverse-proxy-nginx.md
--- a/packaging/linux/Dockerfile.oci
+++ b/packaging/linux/Dockerfile.oci
--- a/packaging/linux/Dockerfile.oci-venv
+++ b/packaging/linux/Dockerfile.oci-venv
--- a/packaging/linux/Dockerfile.update
+++ b/packaging/linux/Dockerfile.update
--- a/packaging/linux/README-RUN.txt
+++ b/packaging/linux/README-RUN.txt
--- a/packaging/linux/build_oci_image.sh
+++ b/packaging/linux/build_oci_image.sh
--- a/packaging/linux/launcher/coderai-entrypoint
+++ b/packaging/linux/launcher/coderai-entrypoint
--- a/packaging/linux/launcher/coderai-oci
+++ b/packaging/linux/launcher/coderai-oci
--- a/packaging/linux/launcher/sadtalker
+++ b/packaging/linux/launcher/sadtalker
--- a/packaging/linux/launcher/wav2lip
+++ b/packaging/linux/launcher/wav2lip
--- a/packaging/linux/launcher/with-env
+++ b/packaging/linux/launcher/with-env
--- a/packaging/linux/nginx.conf
+++ b/packaging/linux/nginx.conf
--- a/packaging/linux/run_oci.sh
+++ b/packaging/linux/run_oci.sh
--- a/packaging/linux/smoke_test_services.sh
+++ b/packaging/linux/smoke_test_services.sh
--- a/packaging/linux/supervisord.conf
+++ b/packaging/linux/supervisord.conf
--- a/packaging/linux/update_oci_image.sh
+++ b/packaging/linux/update_oci_image.sh
--- a/requirements-nvidia.txt
+++ b/requirements-nvidia.txt
--- a/tools/gen_township_fighters.py
+++ b/tools/gen_township_fighters.py
--- a/tools/parler_tts_service.py
+++ b/tools/parler_tts_service.py
--- a/tools/video_editor.py
+++ b/tools/video_editor.py
--- a/tools/videogen.py
+++ b/tools/videogen.py
--- a/video_editor.config.json
+++ b/video_editor.config.json