feat: model "to-download" list, mmproj vision, styled modals, broker + packaging

Web UI / models:
- "To download" wishlist: models known but not on disk and not configured show
  as non-configured to-download rows. Free-disk on an unconfigured model, Remove
  on a model with no files left, and a new "Add to list" button in the download
  window all record into models.json `to_download`; pruned on enable/download.
  New endpoints model-mark-download / model-unmark-download.
- mmproj multimodal components: mmproj GGUFs are classified as components (not
  models), selectable per-GGUF in the model config (auto-selected, enables
  vision capability). VulkanBackend loads them via llama.cpp's MTMDChatHandler
  (--mmproj equivalent), and the chat path now forwards image_url content
  end-to-end.
- All window.alert() replaced by a shared styled showAlert()/showConfirm() modal
  in base.html (used across every admin template).

Front proxy / broker:
- Fix engine model-assignment NameError (keep -> _keep).
- Brokered GET /coderai/capabilities now answers from the front (whole node) so
  multi-GPU hosts report every card, not a single engine's CUDA-visible one.
- Log a clear reason when the broker is disabled.

Packaging (distributable OCI image):
- Multi-stage venv image + smoke test; bundle ds4/wav2lip/sadtalker + parler;
  whisper-server etc. dereferenced (cp -aL) so no dangling symlinks.
- Dockerfile.update + update_oci_image.sh: ~30s incremental code-only rebuild on
  an immutable coderai:base (no 20GB bundle recopy).
- run_oci.sh: --local/--config-dir + --map to run against existing local config
  and data dirs without a rebuild; --debug[=flags] + --log-file for selectable
  debug flags and a host-tailable file log (launcher tees; supervisord kills the
  process group). tmp_janitor age-prunes the dedicated temp dir.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Commit cbf7f147 authored Jun 19, 2026 by Stefy Lanza (nextime / spora )
30 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -43,6 +43,10 @@ township_output/
 .packaging-cache/
 tmp/
+# Exported image tarballs + local OCI run-state (large artifacts)
+dist/
+coderai-runtime/
 # Video editor sessions + generated media (runtime artifacts)
 video_editor/sessions/
 tools/coderai_media/
--- a/AI.PROMPT
+++ b/AI.PROMPT
@@ -286,3 +286,67 @@ safe.
 14. Thermal protection is config-driven and model-agnostic (config.json
    `thermal`). Don't special-case it per model/backend; it only reads temps and
    sleeps. Honour the enable flags and high/resume hysteresis.
+================================================================================
+## Distributable Docker image (packaging/linux)
+================================================================================
+All-in-one image: coderai + tools (editor/videogen/township) behind nginx on a
+single port (8776), built from the LOCAL install's venv + binaries.
+Multi-stage `Dockerfile.oci-venv`:
+  - assembler stage stages the local bundle into /opt/coderai (python-build-
+    standalone interpreter + venv site-packages + ldd'd native libs + parler
+    overlay + lip-sync venv/repos + py310 + ds4). The ~20 GB bundle COPY lives
+    ONLY here; the runtime stage COPYs the assembled tree ONCE (no double-store).
+  - runtime stage: apt (nginx/supervisor/vulkan-tools/ffmpeg/...), COPY the
+    assembled /opt/coderai, then COPY app code → /opt/coderai/app, launchers →
+    /usr/local/bin, nginx/supervisor confs. Entry = coderai-entrypoint →
+    supervisord (nginx + main server + tool UIs).
+  - Do NOT set PYTHONHOME globally (breaks the system-python supervisord); set
+    PATH only. Bundle dereferences host symlinks (cp -aL) so binaries like
+    whisper-server are real files in the image, not dangling links.
+Full build (slow, ~15 min — rebuilds the bundle):
+  packaging/linux/build_oci_image.sh                      # tags coderai:dist
+Smoke test (no weights, checks services + every bundled binary):
+  DOCKER="sudo docker" GPU="--gpus all" PORT=18082 \
+    packaging/linux/smoke_test_services.sh coderai:dist
+Run against your LIVE local config + data (no rebuild — pure bind-mounts):
+  packaging/linux/run_oci.sh --nvidia --local \
+    --map /AI/guffcache --map /AI/huggingface --map /AI/offloads
+  - The image launcher reads config from /config/coderai and runs
+    `coderai --config /config/coderai`, rewriting server.host/port in config.json.
+  - `--local` (= --config-dir ~/.coderai) copies ONLY the *.json config files to
+    a temp dir and mounts it at /config/coderai, so your real config is untouched
+    (use --inplace-config to edit it directly).
+  - `--map HOST[:CONT]` bind-mounts a host dir at the SAME path inside the
+    container so the ABSOLUTE paths in models.json/config.json (gguf/hf caches,
+    offloads) resolve unchanged. Without these maps the models won't be found.
+  - `--debug[=SPEC]` runs coderai with --debug* flags (SPEC default 'all';
+    e.g. `--debug=engine,requests,ws` → --debug-engine/--debug-requests/--debug-ws,
+    `--debug` always auto-added) and writes a host-tailable file log. `--log-file
+    PATH` sets the in-container log path (default /cache/logs/coderai.log → host
+    under the cache mount). Driven by env CODERAI_DEBUG + CODERAI_LOG_FILE, read
+    by the coderai-oci launcher, which tees output so `docker logs` still works.
+    supervisord [program:coderai] uses stopasgroup/killasgroup so the front's
+    engine subprocesses + the tee are torn down together. NOTE: the launcher +
+    supervisord.conf are baked in, so changes need a (fast) update_oci_image.sh.
+Incremental update (FAST, ~30 s — code-only changes, NO bundle recopy):
+  DOCKER="sudo docker" packaging/linux/update_oci_image.sh
+  - `Dockerfile.update` is `FROM coderai:base` and re-layers ONLY the app code +
+    launchers + service confs. The heavy bundle layers are inherited unchanged.
+  - Keeps an immutable `coderai:base` (the bundle) and rebuilds `coderai:dist`
+    as base + a thin app layer. Every update starts from the SAME base, so app
+    layers never stack across updates. dist and base SHARE the bundle layers —
+    keeping both costs only the app layer (a few MB), not a second 23 GB.
+  - First run seeds coderai:base from the current coderai:dist (docker tag).
+  - Re-baseline the bundle (new venv/libs/tools): run build_oci_image.sh, then
+    `docker rmi coderai:base` so the next update re-seeds it from the new dist.
+  - Use this whenever ONLY codai/ app code (or launchers/confs) changed — a full
+    build_oci_image.sh is wasteful for that.
+  - CAUTION: COPY adds/overwrites but does NOT delete files removed from the
+    repo; the cleanup RUN prunes only known-stale paths (.git/venv*/dist/...). A
+    source file deleted from codai/ lingers in the overlay until a full rebuild.
--- a/codai/admin/routes.py
+++ b/codai/admin/routes.py
@@ -980,6 +980,14 @@ async def api_download_model(
    if existing:
        return {"session_id": existing, "attached": True}
+    # A download supersedes any "to download" wishlist entry for this model.
+    if config_manager is not None:
+        changed = _prune_to_download(model_id)
+        if file_pattern:
+            changed = _prune_to_download(file_pattern) or changed
+        if changed:
+            config_manager.save_models()
    session_id = str(_uuid.uuid4())
    pq = _q.Queue()
    _download_sessions[session_id] = pq
@@ -1170,6 +1178,58 @@ def _hf_repo_id_from_path(path: str) -> str:
    return ''
+# Categories that hold real (configured) models in models.json.
+_VALID_MODEL_CATS = {
+    "text_models", "image_models", "audio_models", "gguf_models", "tts_models",
+    "vision_models", "video_models", "audio_gen_models", "embedding_models",
+    "spatial_models",
+}
+def _entry_key(entry) -> str:
+    """The identifying path/id of a models.json entry (str or dict)."""
+    if isinstance(entry, str):
+        return entry
+    if isinstance(entry, dict):
+        return entry.get("path") or entry.get("id") or ""
+    return ""
+def _basename_key(key: str) -> str:
+    import os as _os
+    return _os.path.basename(key) if ("/" in key or _os.sep in key) else key
+def _is_model_configured(model_id: str) -> bool:
+    """True if model_id is already a configured model (matched by id or basename)."""
+    if config_manager is None:
+        return False
+    fname = _basename_key(model_id)
+    for cat in _VALID_MODEL_CATS:
+        for m in config_manager.models_data.get(cat, []):
+            key = _entry_key(m)
+            if key == model_id or (fname and _basename_key(key) == fname):
+                return True
+    return False
+def _prune_to_download(model_id: str) -> bool:
+    """Drop any 'to download' wishlist entry matching model_id. Returns True if changed."""
+    if config_manager is None:
+        return False
+    lst = config_manager.models_data.get("to_download")
+    if not lst:
+        return False
+    fname = _basename_key(model_id)
+    kept = [e for e in lst
+            if not (_entry_key(e) == model_id
+                    or (fname and _basename_key(_entry_key(e)) == fname))]
+    if len(kept) != len(lst):
+        config_manager.models_data["to_download"] = kept
+        return True
+    return False
 def _scan_caches() -> dict:
    import os
    result: dict = {"hf": [], "gguf": []}
@@ -1451,6 +1511,49 @@ def _scan_caches() -> dict:
            "configs": all_configs.get(path, []),
        })
+    # Surface "to download" wishlist entries: models the user wants listed for
+    # later download but has NOT configured and are NOT on disk. They appear as
+    # non-configured rows with a download button (in_config=False, missing=True).
+    seen_gguf = {m["path"] for m in result["gguf"]} | {m["filename"] for m in result["gguf"]}
+    seen_hf = {m["id"] for m in result["hf"]}
+    if config_manager:
+        for entry in config_manager.models_data.get("to_download", []):
+            e = entry if isinstance(entry, dict) else {"path": entry}
+            mid = (e.get("path") or e.get("id") or "").strip()
+            if not mid or _is_model_configured(mid):
+                continue
+            repo = e.get("source_repo") or mid
+            mtype = e.get("model_type") or "text_models"
+            is_gguf = (bool(e.get("is_gguf")) or mid.lower().endswith(".gguf")
+                       or "gguf" in mid.lower() or mtype == "gguf_models")
+            fname = os.path.basename(mid) if ("/" in mid or os.sep in mid) else mid
+            caps = e.get("capabilities") or detect_model_capabilities(mid).to_list()
+            if is_gguf:
+                if mid in seen_gguf or fname in seen_gguf:
+                    continue
+                result["gguf"].append({
+                    "filename": fname, "path": mid,
+                    "size_gb": 0, "size_bytes": 0,
+                    "in_config": False, "missing": True, "to_download": True,
+                    "source_repo": repo,
+                    "model_type": mtype if mtype != "gguf_models" else "text_models",
+                    "settings": {}, "capabilities": caps,
+                    "incomplete": False, "configs": [],
+                })
+                seen_gguf.add(mid); seen_gguf.add(fname)
+            else:
+                if mid in seen_hf:
+                    continue
+                result["hf"].append({
+                    "id": mid, "size_gb": 0, "size_bytes": 0, "revision_count": 0,
+                    "files": [], "file_count": 0,
+                    "in_config": False, "missing": True, "to_download": True,
+                    "source_repo": repo, "model_type": mtype,
+                    "settings": {}, "capabilities": caps,
+                    "incomplete": False, "configs": [],
+                })
+                seen_hf.add(mid)
    return result
@@ -1729,6 +1832,60 @@ async def api_model_add_known(request: Request, username: str = Depends(require_
                return {"success": True, "already": True}
    config_manager.models_data.setdefault(model_type, []).append(entry)
+    _prune_to_download(model_id)
+    config_manager.save_models()
+    _broker_notify_models_updated(request)
+    return {"success": True}
+@router.post("/admin/api/model-mark-download", summary="List a model for later download")
+async def api_model_mark_download(request: Request, username: str = Depends(require_admin)):
+    """Record a model in the 'to download' wishlist: it appears in the model list
+    as a non-configured, to-be-downloaded entry (no files fetched, no serving
+    config created). Used by 'Free disk' on unconfigured models, 'Remove' on a
+    model with no files left, and 'Add to list' in the download window."""
+    if config_manager is None:
+        raise HTTPException(status_code=503, detail="Config manager not initialized")
+    data = await request.json()
+    model_id = (data.get("model_id") or data.get("path") or "").strip()
+    if not model_id:
+        raise HTTPException(status_code=400, detail="model_id is required")
+    source_repo = (data.get("source_repo") or model_id).strip()
+    model_type = (data.get("model_type") or "").strip()
+    is_gguf = (bool(data.get("is_gguf")) or model_type == "gguf_models"
+               or model_id.lower().endswith(".gguf") or "gguf" in model_id.lower())
+    if is_gguf:
+        model_type = "gguf_models"
+    if model_type not in _VALID_MODEL_CATS:
+        model_type = "text_models"
+    # Already a real (configured) model — nothing to add.
+    if _is_model_configured(model_id):
+        return {"success": True, "already_configured": True}
+    import os as _os
+    lst = config_manager.models_data.setdefault("to_download", [])
+    fname = _basename_key(model_id)
+    for e in lst:
+        k = _entry_key(e)
+        if k == model_id or (fname and _basename_key(k) == fname):
+            return {"success": True, "already": True}
+    lst.append({"path": model_id, "source_repo": source_repo,
+                "model_type": model_type, "is_gguf": is_gguf})
+    config_manager.save_models()
+    _broker_notify_models_updated(request)
+    return {"success": True}
+@router.post("/admin/api/model-unmark-download", summary="Remove a model from the download list")
+async def api_model_unmark_download(request: Request, username: str = Depends(require_admin)):
+    """Drop a model from the 'to download' wishlist (the user no longer wants it
+    listed). Has no effect on configured models or files on disk."""
+    if config_manager is None:
+        raise HTTPException(status_code=503, detail="Config manager not initialized")
+    data = await request.json()
+    model_id = (data.get("model_id") or data.get("path") or "").strip()
+    if not model_id:
+        raise HTTPException(status_code=400, detail="model_id is required")
+    if _prune_to_download(model_id):
        config_manager.save_models()
        _broker_notify_models_updated(request)
    return {"success": True}
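Review note: the type resolution in `api_model_mark_download` can be exercised on its own. The sketch below copies the conditions from the hunk above into a standalone helper; the name `classify_wishlist_entry` is hypothetical and does not exist in the codebase.

```python
# Standalone sketch of the model_type / is_gguf resolution used by
# api_model_mark_download. `classify_wishlist_entry` is a hypothetical name.
VALID_MODEL_CATS = {
    "text_models", "image_models", "audio_models", "gguf_models", "tts_models",
    "vision_models", "video_models", "audio_gen_models", "embedding_models",
    "spatial_models",
}


def classify_wishlist_entry(model_id, model_type="", is_gguf=False):
    """Return (model_type, is_gguf) as the handler would store them."""
    model_type = (model_type or "").strip()
    gguf = (bool(is_gguf) or model_type == "gguf_models"
            or model_id.lower().endswith(".gguf") or "gguf" in model_id.lower())
    if gguf:
        # Any GGUF hint overrides the requested category.
        model_type = "gguf_models"
    if model_type not in VALID_MODEL_CATS:
        # Unknown or empty categories fall back to text models.
        model_type = "text_models"
    return model_type, gguf


gguf_repo = classify_wishlist_entry("TheBloke/Foo-GGUF", "text_models")
audio = classify_wishlist_entry("org/whisper-large", "audio_models")
fallback = classify_wishlist_entry("org/some-model", "not_a_category")
```

Because the check includes `"gguf" in model_id.lower()`, any id that merely contains the substring "gguf" is stored as `gguf_models`, whatever `model_type` the client sent; the `.endswith(".gguf")` test is already covered by that substring test.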
@@ -1747,8 +1904,13 @@ async def api_model_enable(request: Request, username: str = Depends(require_adm
    if model_type not in valid:
        raise HTTPException(status_code=400, detail=f"model_type must be one of {valid}")
    lst = config_manager.models_data.setdefault(model_type, [])
+    changed = False
    if path not in lst:
        lst.append(path)
+        changed = True
+    if _prune_to_download(path):
+        changed = True
+    if changed:
        config_manager.save_models()
    _broker_notify_models_updated(request)
    return {"success": True}
@@ -2285,7 +2447,7 @@ async def api_model_configure(request: Request, username: str = Depends(require_
                "component_quantization", "output_crf", "force_vram_update",
                "balanced_gpu_percent", "acceleration",
                "cache_type_k", "cache_type_v", "turboquant", "engine",
-                "quant_backend", "kv_cache_budget_mb", "kv_cache_slots"):
+                "quant_backend", "kv_cache_budget_mb", "kv_cache_slots", "mmproj"):
        if key in data:
            entry[key] = data[key]
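Review note: the wishlist matching in `_prune_to_download` compares both the full key and its basename. The sketch below reproduces that logic against a plain dict instead of `config_manager.models_data`, so it runs without the application; the un-prefixed function names and the sample entries are illustrative assumptions.

```python
import os


def entry_key(entry):
    """Identifying path/id of a models.json entry (str or dict)."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict):
        return entry.get("path") or entry.get("id") or ""
    return ""


def basename_key(key):
    # Only strip directories when the key actually looks like a path.
    return os.path.basename(key) if ("/" in key or os.sep in key) else key


def prune_to_download(models_data, model_id):
    """Drop wishlist entries matching model_id. Returns True if changed."""
    lst = models_data.get("to_download")
    if not lst:
        return False
    fname = basename_key(model_id)
    kept = [e for e in lst
            if not (entry_key(e) == model_id
                    or (fname and basename_key(entry_key(e)) == fname))]
    if len(kept) != len(lst):
        models_data["to_download"] = kept
        return True
    return False


# Sample data (made up for illustration).
models_data = {"to_download": [
    {"path": "org/repo/model-q4.gguf", "source_repo": "org/repo"},
    "other/model",
]}

# A local file path prunes the wishlist entry with the same file name.
changed = prune_to_download(models_data, "/AI/guffcache/model-q4.gguf")
remaining = list(models_data["to_download"])

# Basename matching also applies to repo ids that share a last segment.
collided = prune_to_download(models_data, "someone/model")
```

One consequence of the basename comparison, visible in the last call, is that two repo ids sharing a final path segment (`other/model` and `someone/model`) are treated as the same wishlist entry.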

--- a/codai/admin/templates/archive.html
+++ b/codai/admin/templates/archive.html
@@ -335,7 +335,7 @@ async function deleteEntry() {
    closeDetail();
    loadArchive();
  } catch(e) {
-    alert('Delete failed: ' + e.message);
+    showAlert('Delete failed: ' + e.message);
  }
 }

--- a/codai/admin/templates/base.html
+++ b/codai/admin/templates/base.html
@@ -104,6 +104,81 @@ function donateCopy(id, btn) {
 </main>
 {% endif %}
+<!-- Shared confirm / notice modal (replaces window.confirm / window.alert) -->
+<div id="confirm-modal" class="modal" onclick="if(event.target===this)document.getElementById('confirm-modal-cancel').click()">
+  <div class="modal-box" style="max-width:420px">
+    <div class="modal-head">
+      <span class="modal-title" id="confirm-modal-title">Confirm</span>
+      <button class="modal-close" id="confirm-modal-x">&times;</button>
+    </div>
+    <div class="modal-body">
+      <p id="confirm-modal-msg" style="margin:0 0 1.25rem;white-space:pre-wrap"></p>
+      <div style="display:flex;gap:.5rem;justify-content:flex-end">
+        <button class="btn btn-ghost" id="confirm-modal-cancel">Cancel</button>
+        <button class="btn btn-danger" id="confirm-modal-ok">Confirm</button>
+      </div>
+    </div>
+  </div>
+</div>
+<script>
+// Global modal helpers, shared by every admin page. Defined here so templates
+// can call showAlert()/showConfirm() instead of window.alert()/window.confirm().
+if(typeof window.openModal!=='function') window.openModal=function(id){document.getElementById(id).classList.add('show')};
+if(typeof window.closeModal!=='function') window.closeModal=function(id){document.getElementById(id).classList.remove('show')};
+window.showConfirm=function(title, msg, okLabel){
+  return new Promise(resolve => {
+    document.getElementById('confirm-modal-title').textContent = title;
+    document.getElementById('confirm-modal-msg').textContent = msg;
+    const okBtn    = document.getElementById('confirm-modal-ok');
+    const cancelBtn= document.getElementById('confirm-modal-cancel');
+    const xBtn     = document.getElementById('confirm-modal-x');
+    okBtn.className = 'btn btn-danger';
+    okBtn.textContent = okLabel || 'Confirm';
+    cancelBtn.style.display = '';
+    openModal('confirm-modal');
+    function cleanup(result){
+      closeModal('confirm-modal');
+      okBtn.removeEventListener('click', onOk);
+      cancelBtn.removeEventListener('click', onCancel);
+      xBtn.removeEventListener('click', onCancel);
+      resolve(result);
+    }
+    function onOk(){ cleanup(true); }
+    function onCancel(){ cleanup(false); }
+    okBtn.addEventListener('click', onOk);
+    cancelBtn.addEventListener('click', onCancel);
+    xBtn.addEventListener('click', onCancel);
+  });
+};
+// Styled replacement for window.alert(): a single-button notice modal.
+window.showAlert=function(msg, title, kind){
+  return new Promise(resolve => {
+    if(!title && !kind && /^\s*(error|failed|cannot|could not)\b/i.test(String(msg||''))) kind = 'error';
+    document.getElementById('confirm-modal-title').textContent =
+      title || (kind === 'error' ? 'Error' : 'Notice');
+    document.getElementById('confirm-modal-msg').textContent = msg;
+    const okBtn     = document.getElementById('confirm-modal-ok');
+    const cancelBtn = document.getElementById('confirm-modal-cancel');
+    const xBtn      = document.getElementById('confirm-modal-x');
+    okBtn.className = 'btn btn-primary';
+    okBtn.textContent = 'OK';
+    cancelBtn.style.display = 'none';
+    openModal('confirm-modal');
+    function cleanup(){
+      closeModal('confirm-modal');
+      cancelBtn.style.display = '';
+      okBtn.removeEventListener('click', onOk);
+      xBtn.removeEventListener('click', onOk);
+      resolve();
+    }
+    function onOk(){ cleanup(); }
+    okBtn.addEventListener('click', onOk);
+    xBtn.addEventListener('click', onOk);
+  });
+};
+</script>
 {% block scripts %}{% endblock %}
 </body>
 </html>
--- a/codai/admin/templates/chat.html
+++ b/codai/admin/templates/chat.html
@@ -4229,12 +4229,12 @@ async function loadCharProfileIntoSlot(prefix, idx, name) {
    charSlots[prefix][idx].name = charSlots[prefix][idx].name || d.name;
    charSlots[prefix][idx].images = (d.images||[]).map(img => img.data);
    renderCharSlots(prefix);
-  } catch(e) { alert('Failed to load profile: '+e.message); }
+  } catch(e) { showAlert('Failed to load profile: '+e.message); }
 }
 async function saveCharSlotAsProfile(prefix, idx) {
  const slot = charSlots[prefix]?.[idx];
-  if (!slot || !slot.images.length) { alert('Add at least one image first.'); return; }
+  if (!slot || !slot.images.length) { showAlert('Add at least one image first.'); return; }
  const name = slot.name || prompt('Profile name:');
  if (!name) return;
  try {
@@ -4246,8 +4246,8 @@ async function saveCharSlotAsProfile(prefix, idx) {
    charSlots[prefix][idx].name = name;
    await loadCharProfileList();
    renderCharSlots(prefix);
-    alert(`Saved profile "${name}"`);
+    showAlert(`Saved profile "${name}"`);
-  } catch(e) { alert('Save failed: '+e.message); }
+  } catch(e) { showAlert('Save failed: '+e.message); }
 }
 // ─────────────────────────────────────────────────────────────────
@@ -6051,14 +6051,14 @@ async function profCharView(name) {
  try {
    const d = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name)).then(r=>r.json());
    _openProfModal(`Character: ${d.name}`, d.description||'', d.images||[]);
-  } catch(e) { alert('Failed to load character: ' + e.message); }
+  } catch(e) { showAlert('Failed to load character: ' + e.message); }
 }
 async function profCharDelete(name) {
  if (!confirm(`Delete character profile "${name}"?`)) return;
  const r = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name), {method:'DELETE'});
  if (r.ok) await profCharLoad();
-  else alert('Delete failed: ' + await r.text());
+  else showAlert('Delete failed: ' + await r.text());
 }
@@ -6139,7 +6139,7 @@ async function profVoiceDelete(name) {
  if (!confirm(`Delete voice profile "${name}"?`)) return;
  const r = await fetch(ROOT_PATH + '/admin/api/voices/'+encodeURIComponent(name), {method:'DELETE'});
  if (r.ok) await profVoiceLoad();
-  else alert('Delete failed: ' + await r.text());
+  else showAlert('Delete failed: ' + await r.text());
 }
 // ─────────────────────────────────────────────────────────────────
@@ -6296,14 +6296,14 @@ async function profEnvView(name) {
  try {
    const d = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name)).then(r=>r.json());
    _openProfModal(`Environment: ${d.name}`, d.description||'', d.images||[]);
-  } catch(e) { alert('Failed to load environment: ' + e.message); }
+  } catch(e) { showAlert('Failed to load environment: ' + e.message); }
 }
 async function profEnvDelete(name) {
  if (!confirm(`Delete environment profile "${name}"?`)) return;
  const r = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name), {method:'DELETE'});
  if (r.ok) await profEnvLoad();
-  else alert('Delete failed: ' + await r.text());
+  else showAlert('Delete failed: ' + await r.text());
 }
 // ─────────────────────────────────────────────────────────────────
@@ -6528,7 +6528,7 @@ async function deleteCustomPipeline(id) {
    _customPipelines = _customPipelines.filter(p => p.id !== id);
    if (_editingPipelineId === id) { _editingPipelineId = null; _pbSteps = []; renderBuilderSteps(); }
    renderCustomPipelineCards();
-  } catch(e) { alert('Delete failed: '+e.message); }
+  } catch(e) { showAlert('Delete failed: '+e.message); }
 }
 function _renderPipelineResult(outId, progId, d) {
@@ -6683,7 +6683,7 @@ async function archiveDelete(filename) {
    _archiveFiles = _archiveFiles.filter(f => f.filename !== filename);
    renderArchive();
  } catch(e) {
-    alert('Delete failed: ' + e.message);
+    showAlert('Delete failed: ' + e.message);
  }
 }

--- a/codai/admin/templates/models.html
+++ b/codai/admin/templates/models.html
--- a/codai/admin/templates/tasks.html
+++ b/codai/admin/templates/tasks.html
@@ -244,9 +244,9 @@ async function restartEngine(id, name){
  if (!confirm(`Restart engine "${name}"? In-flight requests on it will fail; the supervisor respawns it immediately.`)) return;
  try {
    const r = await fetch(ROOT_PATH + '/admin/api/engines/' + id + '/restart', {method:'POST'});
-    if (!r.ok) { const e = await r.json().catch(()=>({})); alert(e.detail || 'Restart failed'); }
+    if (!r.ok) { const e = await r.json().catch(()=>({})); showAlert(e.detail || 'Restart failed'); }
    setTimeout(loadEngines, 800);
-  } catch(e) { alert(e.message); }
+  } catch(e) { showAlert(e.message); }
 }
 let _refreshing = false;
@@ -338,9 +338,9 @@ async function taskAction(id, action) {
    const r = await fetch(ROOT_PATH + '/admin/api/tasks/' + encodeURIComponent(id) + '/' + action, {method:'POST'});
    if (!r.ok) {
      const e = await r.json().catch(() => ({}));
-      alert(e.detail || (verb + ' failed'));
+      showAlert(e.detail || (verb + ' failed'));
    }
-  } catch (e) { alert(e.message); }
+  } catch (e) { showAlert(e.message); }
  loadTasks();
 }
@@ -349,9 +349,9 @@ async function removeTask(id) {
    const r = await fetch(ROOT_PATH + '/admin/api/tasks/' + encodeURIComponent(id), {method:'DELETE'});
    if (!r.ok) {
      const e = await r.json().catch(() => ({}));
-      alert(e.detail || 'Remove failed');
+      showAlert(e.detail || 'Remove failed');
    }
-  } catch (e) { alert(e.message); }
+  } catch (e) { showAlert(e.message); }
  loadTasks();
 }

--- a/codai/admin/templates/tokens.html
+++ b/codai/admin/templates/tokens.html
@@ -126,15 +126,15 @@ async function createToken() {
      openModal('show-modal');
      loadTokens();
    } else {
-      const e = await r.json(); alert(e.detail || 'Failed');
+      const e = await r.json(); showAlert(e.detail || 'Failed');
    }
-  } catch (e) { alert(e.message); }
+  } catch (e) { showAlert(e.message); }
 }
 async function delToken(id) {
  if (!confirm('Delete this token? Clients using it will lose access immediately.')) return;
  const r = await fetch(ROOT_PATH + '/admin/api/tokens/'+id, {method:'DELETE'});
-  if (r.ok) loadTokens(); else alert('Failed to delete');
+  if (r.ok) loadTokens(); else showAlert('Failed to delete');
 }
 loadTokens();

--- a/codai/admin/templates/users.html
+++ b/codai/admin/templates/users.html
@@ -105,7 +105,7 @@ async function delUser(id, name) {
  if (!confirm('Delete user "' + name + '"?')) return;
  const r = await fetch(ROOT_PATH + '/admin/api/users/'+id, {method:'DELETE'});
  if (r.ok) location.reload();
-  else { const e = await r.json(); alert(e.detail || 'Failed'); }
+  else { const e = await r.json(); showAlert(e.detail || 'Failed'); }
 }
 </script>
 {% endblock %}
--- a/codai/api/text.py
+++ b/codai/api/text.py
@@ -243,6 +243,33 @@ def log_response_payload(payload, streamed=False):
 router = APIRouter()
+def _normalize_vision_content(content: list) -> list:
+    """Normalize an OpenAI multipart message content list to the shape the
+    llama.cpp multimodal (mmproj) handler expects: text parts as
+    ``{"type":"text","text":...}`` and images as
+    ``{"type":"image_url","image_url":{"url": ...}}``. The url may be an http(s)
+    link or a ``data:image/...;base64,...`` URI — both are accepted. Unknown
+    parts are dropped to a text placeholder so nothing crashes the handler."""
+    norm = []
+    for item in content:
+        if not isinstance(item, dict):
+            norm.append({"type": "text", "text": str(item)})
+            continue
+        t = item.get("type")
+        if t == "text" and "text" in item:
+            norm.append({"type": "text", "text": item["text"]})
+        elif t in ("image_url", "input_image"):
+            iu = item.get("image_url") if t == "image_url" else item.get("image")
+            url = iu.get("url") if isinstance(iu, dict) else iu
+            if url:
+                norm.append({"type": "image_url", "image_url": {"url": url}})
+        elif "text" in item:
+            norm.append({"type": "text", "text": str(item["text"])})
+        else:
+            norm.append({"type": "text", "text": f"[{t or 'unknown'} content]"})
+    return norm
 @router.post("/v1/chat/completions", summary="Chat completions")
 async def chat_completions(request: ChatCompletionRequest, http_request: Request = None):
    """Chat completions endpoint with streaming and tool support."""
@@ -519,6 +546,12 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
                                   "Another model may be using all available VRAM.")
    current_manager = mm
+    # Does the resolved (loaded) model accept images? True only when an mmproj
+    # projector was loaded into the llama.cpp backend (see VulkanBackend). When
+    # set, multipart image content is preserved end-to-end instead of being
+    # flattened to a text placeholder, so the multimodal handler can see it.
+    _vision_ok = bool(getattr(getattr(current_manager, 'backend', None), 'supports_vision', False))
    # Inject system prompt if --system-prompt flag was provided
    messages = request.messages
    global_system_prompt = get_global_system_prompt()
@@ -733,6 +766,14 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
        if content is None:
            content = ""
        elif isinstance(content, list):
+            _has_image = _vision_ok and any(
+                isinstance(it, dict) and it.get('type') in ('image_url', 'input_image')
+                for it in content)
+            if _has_image:
+                # Vision (mmproj) model: keep OpenAI multipart content so the
+                # llama.cpp multimodal handler receives the images themselves.
+                content = _normalize_vision_content(content)
+            else:
                # Handle multipart content array format: [{"type": "text", "text": "..."}]
                parts = []
                for item in content:
@@ -744,7 +785,11 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
                    else:
                        parts.append(str(item))
                content = '\n'.join(parts)
-        # Ensure content is never None - convert to string
+        # Ensure content is never None - convert to string (but keep multipart
+        # vision content as a list so the multimodal handler can consume it).
+        if isinstance(content, list):
+            msg_dict["content"] = content
+        else:
            msg_dict["content"] = str(content) if content is not None else ""
        # Handle tool_calls - convert to proper format if present
        if msg.tool_calls:
@@ -765,8 +810,9 @@ async def chat_completions(request: ChatCompletionRequest, http_request: Request
        # Handle None content
        elif m.get("content") is None:
            messages_dict[i]["content"] = ""
-        # Handle content that's not a string (shouldn't happen but be safe)
+        # Handle content that's not a string (shouldn't happen but be safe).
-        elif not isinstance(m["content"], str):
+        # A list is legitimate multipart vision content — leave it intact.
+        elif not isinstance(m["content"], str) and not isinstance(m["content"], list):
            messages_dict[i]["content"] = str(m["content"])

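Review note on the chat.py hunks above: a minimal sketch of the content gate, assuming a simplified flattening step. `prepare_content` is a hypothetical name, not a function in this change, and the placeholder text used for non-text parts is an assumption. Multipart content stays a list only when the backend reports vision support and at least one part is an image; otherwise it is flattened to a string as before.

```python
from typing import Any, List, Union

IMAGE_TYPES = ("image_url", "input_image")


def prepare_content(content: Any, vision_ok: bool) -> Union[str, List[Any]]:
    """Keep multipart content as a list only for vision models with an image part."""
    if content is None:
        return ""
    if not isinstance(content, list):
        return str(content)
    has_image = vision_ok and any(
        isinstance(it, dict) and it.get("type") in IMAGE_TYPES for it in content
    )
    if has_image:
        return content  # the real code normalizes each part before returning
    # Text-only path: flatten the parts into one newline-joined string.
    parts = []
    for it in content:
        if isinstance(it, dict) and "text" in it:
            parts.append(str(it["text"]))
        elif isinstance(it, dict):
            parts.append(f"[{it.get('type') or 'unknown'} content]")  # assumed placeholder
        else:
            parts.append(str(it))
    return "\n".join(parts)
```

The same request therefore yields a list for an mmproj-loaded model and a plain string for a text-only one, which is why the later `str(...)` coercions in the diff had to start tolerating lists.
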
--- a/codai/backends/vulkan.py
+++ b/codai/backends/vulkan.py
@@ -20,6 +20,7 @@
 import os
 import json
 import threading
+import time
 from typing import AsyncIterator, Optional, Union, List, Dict, Any
 from pathlib import Path
@@ -116,6 +117,74 @@ _KV_TYPE_ALIASES = {
 # Sub-8-bit KV types that llama.cpp can only use with flash attention enabled.
 _KV_NEEDS_FLASH = {'q5_0', 'q5_1', 'q5', 'q4_0', 'q4_1', 'q4', 'iq4_nl'}
+_GGUF_META_CACHE: dict = {}
+def _gguf_block_count(path) -> int:
+    """Layer (block) count from a GGUF header (``*.block_count``), 0 if unknown.
+    Reads only the metadata KV section (no tensors). Cached per path."""
+    if not path:
+        return 0
+    if path in _GGUF_META_CACHE:
+        return _GGUF_META_CACHE[path]
+    import struct
+    result = 0
+    try:
+        with open(path, 'rb') as f:
+            if f.read(4) != b'GGUF':
+                _GGUF_META_CACHE[path] = 0
+                return 0
+            struct.unpack('<I', f.read(4))                  # version
+            struct.unpack('<Q', f.read(8))                  # tensor count
+            n_kv = struct.unpack('<Q', f.read(8))[0]
+            def rd_str():
+                ln = struct.unpack('<Q', f.read(8))[0]
+                return f.read(ln).decode('utf-8', 'replace')
+            def rd_val(vt):
+                if vt == 0:  return struct.unpack('<B', f.read(1))[0]
+                if vt == 1:  return struct.unpack('<b', f.read(1))[0]
+                if vt == 2:  return struct.unpack('<H', f.read(2))[0]
+                if vt == 3:  return struct.unpack('<h', f.read(2))[0]
+                if vt == 4:  return struct.unpack('<I', f.read(4))[0]
+                if vt == 5:  return struct.unpack('<i', f.read(4))[0]
+                if vt == 6:  return struct.unpack('<f', f.read(4))[0]
+                if vt == 7:  return struct.unpack('<?', f.read(1))[0]
+                if vt == 8:  return rd_str()
+                if vt == 10: return struct.unpack('<Q', f.read(8))[0]
+                if vt == 11: return struct.unpack('<q', f.read(8))[0]
+                if vt == 12: return struct.unpack('<d', f.read(8))[0]
+                if vt == 9:
+                    et = struct.unpack('<I', f.read(4))[0]
+                    cnt = struct.unpack('<Q', f.read(8))[0]
+                    return [rd_val(et) for _ in range(cnt)]
+                raise ValueError(f"unknown gguf value type {vt}")
+            for _ in range(n_kv):
+                key = rd_str()
+                val = rd_val(struct.unpack('<I', f.read(4))[0])
+                if key.endswith('.block_count'):
+                    try:
+                        result = int(val)
+                    except (TypeError, ValueError):
+                        result = 0
+                    break
+    except Exception:
+        result = 0
+    _GGUF_META_CACHE[path] = result
+    return result
+def _free_vram_gb(device: int = 0) -> float:
+    """Free VRAM (GB) on the given CUDA device, 0.0 if unavailable."""
+    try:
+        import torch
+        free, _total = torch.cuda.mem_get_info(device)
+        return free / (1024 ** 3)
+    except Exception:
+        return 0.0
 def _ggml_kv_type(name):
    """Map a KV-cache quant name to the llama.cpp GGML type int, or None.
@@ -231,6 +300,7 @@ class VulkanBackend(ModelBackend):
        self.main_gpu = 0  # Default to first GPU
        self.chat_template = None  # Detected chat template name
        self.hf_tokenizer = None  # HuggingFace tokenizer for apply_chat_template
+        self.supports_vision = False  # set True when an mmproj projector is loaded
        self.force_cuda = original_backend in ("nvidia", "cuda")  # Force CUDA if original was nvidia
        if self.force_cuda:
            print("DEBUG: GGUF model will use CUDA backend (forced by --backend nvidia)")
@@ -713,6 +783,33 @@ class VulkanBackend(ModelBackend):
            self.n_gpu_layers = -1
        elif n_gpu_layers != -1:
            self.n_gpu_layers = n_gpu_layers
+        else:
+            # Auto (n_gpu_layers == -1): if the whole model won't fit in free VRAM,
+            # place as many layers on GPU as fit and leave the rest on CPU instead
+            # of trying to load everything and OOMing ("failed to create
+            # llama_context"). llama.cpp has no auto-fit, so we size it ourselves.
+            try:
+                _exp = kwargs.get('expected_vram_gb')
+                _nlayers = _gguf_block_count(model_path)
+                _free = _free_vram_gb(self.main_gpu if isinstance(self.main_gpu, int) else 0)
+                if _exp and _exp > 0 and _nlayers and _free > 0 and _exp > _free * 0.95:
+                    # Scale layers on GPU by the VRAM ratio (weights + KV roughly
+                    # scale per-layer). The estimate tends to undercount the KV
+                    # cache at large n_ctx, and a few GB of compute/output buffers
+                    # stay on GPU regardless — so reserve a context-scaled headroom
+                    # and inflate the need, to err toward fitting (CPU layers are
+                    # slow but a failed load is worse).
+                    _headroom = 2.0 + (self.n_ctx or 0) / 12000.0   # ~2 GB + ~1 GB per 12k ctx
+                    _usable = max(0.0, _free - _headroom)
+                    _fit = int(_nlayers * _usable / (_exp * 1.20))
+                    _fit = max(0, min(_nlayers - 1, _fit))
+                    self.n_gpu_layers = _fit
+                    print(f"  Auto-offload: model needs ~{_exp:.1f} GB but only "
+                          f"{_free:.1f} GB free — placing {_fit}/{_nlayers} layers on "
+                          f"GPU, {_nlayers - _fit} on CPU (slower). Lower n_ctx or use a "
+                          f"smaller model to keep it fully on GPU.", flush=True)
+            except Exception as _off_e:
+                print(f"  (auto-offload sizing skipped: {_off_e})", flush=True)
        # Configure context size
        if no_ram:
@@ -783,6 +880,35 @@ class VulkanBackend(ModelBackend):
            print(f"  KV cache: type_k={_ck or 'f16'}  type_v={_cv or 'f16'}"
                  f"{'  (flash_attn on)' if _flash else ''}")
+        # Multimodal projector (mmproj): pairs a CLIP/vision projector GGUF with
+        # this text model so it can accept images — the llama.cpp `--mmproj`
+        # equivalent, which adds vision capability (e.g. gemma). Uses llama.cpp's
+        # unified mtmd handler, which auto-detects the projector type from the file.
+        self.supports_vision = False
+        _mmproj = kwargs.get('mmproj', _raw_cfg.get('mmproj'))
+        if _mmproj:
+            _mmproj_path = os.path.expanduser(str(_mmproj))
+            if not os.path.isfile(_mmproj_path):
+                # Bare filename / moved cache → look beside the model file.
+                _cand = os.path.join(os.path.dirname(model_path),
+                                     os.path.basename(_mmproj_path))
+                if os.path.isfile(_cand):
+                    _mmproj_path = _cand
+            if os.path.isfile(_mmproj_path):
+                try:
+                    from llama_cpp.llama_chat_format import MTMDChatHandler
+                    llama_kwargs['chat_handler'] = MTMDChatHandler(
+                        clip_model_path=_mmproj_path,
+                        verbose=False,
+                        use_gpu=(self.n_gpu_layers != 0),
+                    )
+                    self.supports_vision = True
+                    print(f"  mmproj       : {os.path.basename(_mmproj_path)} (vision enabled)")
+                except Exception as _e:
+                    print(f"  mmproj       : failed to load projector ({_e}); continuing text-only")
+            else:
+                print(f"  mmproj       : configured path not found ({_mmproj}); skipping")
        # Force CUDA if requested
        if self.force_cuda:
            # Set environment variable to force CUDA
@@ -797,12 +923,44 @@ class VulkanBackend(ModelBackend):
            print(f"  GPU offload  : {'supported' if gpu_supported else 'NOT supported by this build'}")
        _log_cb = _install_layer_log_callback()
+        # Progress feedback during the (otherwise silent) tensor load. llama.cpp's
+        # progress_callback isn't exposed by the Llama wrapper, so inject it by
+        # patching the default model-params factory for the duration of construction.
+        # Kept alive in a local for the whole load (avoids a ctypes use-after-free).
+        _prog = {'last': -5, 't0': time.time()}
+        @_llama_cpp.llama_progress_callback
+        def _progress_cb(progress, user_data):
+            try:
+                pct = int(progress * 100)
+                if pct >= _prog['last'] + 5 or pct >= 100:
+                    _prog['last'] = pct
+                    print(f"  Loading model into VRAM/RAM: {pct}%"
+                          f" ({time.time() - _prog['t0']:.0f}s)", flush=True)
+            except Exception:
+                pass
+            return True
+        # Patch the SUBMODULE attribute — llama.py does `import llama_cpp.llama_cpp
+        # as llama_cpp` and builds model_params from it, so the top-level package
+        # attribute is not what it looks up.
+        _params_mod = getattr(_llama_cpp, 'llama_cpp', _llama_cpp)
+        _orig_params = _params_mod.llama_model_default_params
+        def _params_with_progress():
+            p = _orig_params()
+            try:
+                p.progress_callback = _progress_cb
+            except Exception:
+                pass
+            return p
+        _params_mod.llama_model_default_params = _params_with_progress
        try:
            self.model = Llama(**llama_kwargs)
        except Exception as e:
            print(f"Error loading GGUF model: {e}")
            raise
        finally:
+            _params_mod.llama_model_default_params = _orig_params
            # Quiet logging after load — but DO NOT drop to NULL + GC the callback.
            # ggml keeps the log-callback pointer and may still invoke it during
            # generation (e.g. gemma's iSWA hybrid cache logs every step), so a

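Review note on the auto-offload hunk above: the sizing arithmetic restated as a standalone function, to make the constants easier to check. `auto_gpu_layers` is a hypothetical name used only for this sketch; the formula and constants are copied from the diff, and -1 stands for "leave everything on GPU".

```python
def auto_gpu_layers(expected_gb: float, free_gb: float, n_layers: int, n_ctx: int) -> int:
    """Return -1 (offload everything) or the number of layers expected to fit on GPU."""
    if not (expected_gb > 0 and n_layers > 0 and free_gb > 0):
        return -1  # nothing to size against; leave the auto setting alone
    if expected_gb <= free_gb * 0.95:
        return -1  # the whole model is expected to fit
    headroom = 2.0 + n_ctx / 12000.0  # ~2 GB + ~1 GB per 12k tokens of context
    usable = max(0.0, free_gb - headroom)
    fit = int(n_layers * usable / (expected_gb * 1.20))  # inflate the need by 20%
    return max(0, min(n_layers - 1, fit))  # at least one layer stays on CPU


print(auto_gpu_layers(20.0, 12.0, 48, 12000))
```

For a 48-layer model estimated at 20 GB with 12 GB free and a 12k context, the headroom is 3 GB, 9 GB is usable, and 48 × 9 / 24 gives 18 layers on GPU.
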
--- a/codai/frontproxy/app.py
+++ b/codai/frontproxy/app.py
@@ -81,6 +81,8 @@ class FrontProxy:
        requests are dispatched to the right engine through the same router/proxy."""
        cfg = getattr(self.config, "broker", None)
        if cfg is None or not getattr(cfg, "enabled", False):
+            print("[front] AISBF broker not started (broker section missing or "
+                  "broker.enabled is false in config)", flush=True)
            return
        try:
            from codai.broker import build_broker_runtime_config, BrokerConfigError
@@ -142,9 +144,23 @@ class FrontProxy:
        return ("ok", {"object": "list", "data": [seen[i] for i in order]})
    async def broker_execute(self, *, method, path, headers, query, body):
+        _clean_path = path.split("?", 1)[0].rstrip("/")
+        # Brokered capabilities must describe the WHOLE node. Routing this to a
+        # single engine would report only that engine's CUDA-visible card (its
+        # torch hardware summary), so a multi-GPU node looks like it has one card.
+        # Build it here in the (torch-free) front, which enumerates every physical
+        # GPU via nvidia-smi + sysfs.
+        if method.upper() == "GET" and _clean_path == "/coderai/capabilities":
+            from codai.broker.capabilities import (
+                build_capabilities_document, build_hardware_summary)
+            import json as _json
+            doc = build_capabilities_document(hardware=build_hardware_summary())
+            return {"status_code": 200,
+                    "headers": {"content-type": "application/json"},
+                    "body": _json.dumps(doc).encode()}
        # Brokered models.list must reflect the WHOLE node (union across engines),
        # not a single engine's assigned subset.
-        if method.upper() == "GET" and path.split("?", 1)[0].rstrip("/") == "/v1/models":
+        if method.upper() == "GET" and _clean_path == "/v1/models":
            hdrs = {k: v for k, v in (headers or {}).items() if k.lower() not in _DROP_REQ}
            kind, val = await self.collect_models(hdrs)
            if kind == "ok":

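Review note on the broker_execute hunk above: a small sketch of the path normalization that both front-handled routes now share. `clean_path` and `is_front_route` are hypothetical helper names for illustration; the diff inlines this logic as `_clean_path`.

```python
def clean_path(path: str) -> str:
    """Drop the query string and any trailing slash before route matching."""
    return path.split("?", 1)[0].rstrip("/")


def is_front_route(method: str, path: str, route: str) -> bool:
    """True when a GET targets a route the front answers itself."""
    return method.upper() == "GET" and clean_path(path) == route
```

With this, `/coderai/capabilities`, `/coderai/capabilities/` and `/coderai/capabilities?verbose=1` all match the same route, while a POST to that path still falls through to an engine.
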
--- a/codai/main.py
+++ b/codai/main.py
@@ -709,7 +709,7 @@ def main():
            # Also restrict /v1/models (list_models) to the assigned subset, so the
            # per-engine model list matches what it actually serves — config_mgr's
            # full models_data is untouched (the admin model list stays complete).
-            multi_model_manager.set_assigned_models(keep)
+            multi_model_manager.set_assigned_models(_keep)
        except Exception as _e:
            print(f"[engine] assignment filter failed ({_e}); registering all models")

--- a/codai/models/manager.py
+++ b/codai/models/manager.py
@@ -943,7 +943,7 @@ class MultiModelManager:
                # KV-cache quantization (llama.cpp type_k/type_v) — pass through
                # to the backend, with the raw models.json entry as a fallback.
                _raw = config.get('_raw_cfg') if isinstance(config.get('_raw_cfg'), dict) else {}
-                for _kvk in ('cache_type_k', 'cache_type_v'):
+                for _kvk in ('cache_type_k', 'cache_type_v', 'mmproj'):
                    _kvv = config.get(_kvk)
                    if _kvv is None:
                        _kvv = _raw.get(_kvk)
@@ -1062,7 +1062,7 @@ class MultiModelManager:
                # KV-cache quantization (llama.cpp type_k/type_v) — pass through
                # to the backend, with the raw models.json entry as a fallback.
                _raw = config.get('_raw_cfg') if isinstance(config.get('_raw_cfg'), dict) else {}
-                for _kvk in ('cache_type_k', 'cache_type_v'):
+                for _kvk in ('cache_type_k', 'cache_type_v', 'mmproj'):
                    _kvv = config.get(_kvk)
                    if _kvv is None:
                        _kvv = _raw.get(_kvk)

--- a/codai/models/parser.py
+++ b/codai/models/parser.py
@@ -1046,6 +1046,14 @@ def parse_gemma_native_tool_calls(text: str, tool_names=None):
        if tool_names and name not in tool_names:
            continue
        brace = m.end() - 1   # index of '{'
+        # Some models double-wrap the args: call:NAME{{"k":"v"}}. Skip the
+        # redundant outer brace so the real object is parsed instead of being
+        # mangled into a single key like '{"k"'.
+        j = brace + 1
+        while j < len(text) and text[j] in ' \t\r\n':
+            j += 1
+        if j < len(text) and text[j] == '{':
+            brace = j
        try:
            args, _ = _parse_gemma_loose_object(text, brace)
        except Exception:

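Review note on the parser.py hunk above: the brace-skipping step isolated as a function, assuming `brace` is the index of the first `{` after the tool name. `skip_redundant_brace` is a hypothetical name for this sketch; the loop mirrors the added lines.

```python
def skip_redundant_brace(text: str, brace: int) -> int:
    """Return the index of the inner '{' when the args are double-wrapped."""
    j = brace + 1
    while j < len(text) and text[j] in " \t\r\n":
        j += 1
    if j < len(text) and text[j] == "{":
        return j  # call:NAME{{...}}: start parsing at the inner object
    return brace  # ordinary call:NAME{...}


wrapped = 'call:f{{"k":"v"}}'
plain = 'call:f{"k":"v"}'
print(skip_redundant_brace(wrapped, wrapped.index("{")))
print(skip_redundant_brace(plain, plain.index("{")))
```

An object cannot legally start with another `{` as its first key, so a second opening brace right after the first is a reasonable signal of double wrapping.
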
--- a/codai/models/tmp_janitor.py
+++ b/codai/models/tmp_janitor.py
+"""Periodic cleanup of the temporary-working directory.
+Several pipelines write scratch files with ``tempfile.NamedTemporaryFile(delete=
+False)`` / ``mkdtemp()`` (frame extraction, upscaling, interpolation, dubbing,
+voice cloning…). When a generation is interrupted those temp entries are never
+removed, so a dedicated ``tmp_dir`` slowly fills up (it had grown to tens of GB).
+This background janitor age-prunes that directory: every
+``interval_minutes`` it deletes top-level entries whose most-recent mtime is older
+than ``max_age_hours``. Age-based pruning means in-flight work (touched recently)
+is left alone while abandoned scratch is reclaimed.
+Safety: it only ever operates on the *configured* ``tmp_dir`` (a dedicated path).
+It refuses to run against a bare system temp dir (/tmp, /var/tmp, …) so it can
+never delete other processes' files. Mirrors ``codai.models.ram_monitor`` in
+shape: module-level state + ``get_status()``, started once from ``codai.main``.
+"""
+import os
+import shutil
+import threading
+import time
+import logging
+from typing import Optional, Dict, Any
+_log = logging.getLogger(__name__)
+# Paths we must never treat as a prunable dedicated tmp dir.
+_FORBIDDEN = {"/", "/tmp", "/var/tmp", "/usr/tmp", "/dev/shm"}
+_state_lock = threading.Lock()
+_state: Dict[str, Any] = {
+    "enabled": False,
+    "tmp_dir": None,
+    "max_age_hours": None,
+    "interval_minutes": None,
+    "last_run_ts": 0.0,
+    "last_removed": 0,
+    "total_removed": 0,
+    "last_freed_bytes": 0,
+    "runs": 0,
+}
+_thread: Optional[threading.Thread] = None
+_started = False
+def get_status() -> Dict[str, Any]:
+    """Snapshot for the admin status endpoint / dashboard."""
+    with _state_lock:
+        return dict(_state)
+def _entry_newest_mtime(path: str) -> float:
+    """Most-recent mtime under ``path`` (the entry itself, or the newest file in a
+    directory tree). Using the newest mtime avoids deleting a directory whose top
+    folder is old but which still has freshly written files inside."""
+    try:
+        newest = os.lstat(path).st_mtime
+    except OSError:
+        return 0.0
+    if os.path.isdir(path) and not os.path.islink(path):
+        for root, _dirs, files in os.walk(path):
+            for name in files:
+                try:
+                    m = os.lstat(os.path.join(root, name)).st_mtime
+                    if m > newest:
+                        newest = m
+                except OSError:
+                    continue
+    return newest
+def _dir_size(path: str) -> int:
+    total = 0
+    if os.path.isdir(path) and not os.path.islink(path):
+        for root, _dirs, files in os.walk(path):
+            for name in files:
+                try:
+                    total += os.lstat(os.path.join(root, name)).st_size
+                except OSError:
+                    continue
+    else:
+        try:
+            total = os.lstat(path).st_size
+        except OSError:
+            total = 0
+    return total
+def _sweep(tmp_dir: str, max_age_seconds: float) -> tuple[int, int]:
+    """Remove top-level entries older than the cutoff. Returns (removed, freed)."""
+    now = time.time()
+    removed = 0
+    freed = 0
+    try:
+        entries = os.listdir(tmp_dir)
+    except OSError as e:
+        _log.debug("tmp janitor: cannot list %s: %s", tmp_dir, e)
+        return (0, 0)
+    for name in entries:
+        path = os.path.join(tmp_dir, name)
+        try:
+            if now - _entry_newest_mtime(path) < max_age_seconds:
+                continue
+            size = _dir_size(path)
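+            if os.path.isdir(path) and not os.path.islink(path):
+                shutil.rmtree(path, ignore_errors=True)
+                if os.path.lexists(path):
+                    # Only partially removed (e.g. permission error): don't count it.
+                    continue
+            else:
+                os.remove(path)
+            removed += 1
+            freed += size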
+        except OSError as e:
+            _log.debug("tmp janitor: could not remove %s: %s", path, e)
+    return (removed, freed)
+def _run(tmp_dir: str, max_age_hours: float, interval_minutes: float) -> None:
+    max_age_seconds = max(0.0, max_age_hours) * 3600.0
+    interval = max(60.0, interval_minutes * 60.0)
+    while True:
+        try:
+            removed, freed = _sweep(tmp_dir, max_age_seconds)
+            with _state_lock:
+                _state["last_run_ts"] = time.time()
+                _state["last_removed"] = removed
+                _state["total_removed"] += removed
+                _state["last_freed_bytes"] = freed
+                _state["runs"] += 1
+            if removed:
+                _log.info("tmp janitor: removed %d stale entr%s (%.1f MB) from %s",
+                          removed, "y" if removed == 1 else "ies",
+                          freed / (1024 * 1024), tmp_dir)
+        except Exception as e:  # never let the janitor die
+            _log.warning("tmp janitor sweep failed: %s", e)
+        time.sleep(interval)
+def start(tmp_dir: Optional[str], enabled: bool = True,
+          max_age_hours: float = 24.0, interval_minutes: float = 60.0) -> bool:
+    """Start the janitor for ``tmp_dir``. No-op (returns False) when disabled, when
+    no dedicated tmp_dir is configured, or when tmp_dir is a shared system dir."""
+    global _thread, _started
+    if _started:
+        return True
+    if not enabled or not tmp_dir:
+        return False
+    real = os.path.abspath(os.path.expanduser(tmp_dir)).rstrip("/") or "/"
+    if real in _FORBIDDEN:
+        _log.info("tmp janitor: refusing to prune shared temp dir %s (set a dedicated tmp_dir)", real)
+        return False
+    if not os.path.isdir(real):
+        try:
+            os.makedirs(real, exist_ok=True)
+        except OSError:
+            return False
+    with _state_lock:
+        _state.update({
+            "enabled": True, "tmp_dir": real,
+            "max_age_hours": max_age_hours, "interval_minutes": interval_minutes,
+        })
+    _thread = threading.Thread(target=_run, args=(real, max_age_hours, interval_minutes),
+                               name="tmp-janitor", daemon=True)
+    _thread.start()
+    _started = True
+    _log.info("tmp janitor: pruning %s every %.0f min (entries older than %.1f h)",
+              real, interval_minutes, max_age_hours)
+    return True
+def sweep_once(tmp_dir: str, max_age_hours: float = 24.0) -> tuple[int, int]:
+    """Run a single prune pass and return (removed, freed_bytes). For cron use."""
+    real = os.path.abspath(os.path.expanduser(tmp_dir)).rstrip("/") or "/"
+    if real in _FORBIDDEN or not os.path.isdir(real):
+        raise SystemExit(f"refusing to prune {real!r} (not a dedicated tmp dir)")
+    return _sweep(real, max(0.0, max_age_hours) * 3600.0)
+if __name__ == "__main__":
+    # One-shot CLI for cron/systemd-timer use, e.g.:
+    #   */30 * * * * /path/venv/bin/python -m codai.models.tmp_janitor \
+    #                  --tmp /storage/coderai/tmp --max-age-hours 24
+    import argparse
+    p = argparse.ArgumentParser(description="Prune a dedicated CoderAI temp dir.")
+    p.add_argument("--tmp", required=True, help="the dedicated tmp_dir to prune")
+    p.add_argument("--max-age-hours", type=float, default=24.0,
+                   help="delete entries whose newest file is older than this")
+    a = p.parse_args()
+    n, b = sweep_once(a.tmp, a.max_age_hours)
+    print(f"tmp janitor: removed {n} entr{'y' if n == 1 else 'ies'} "
+          f"({b / (1024 * 1024):.1f} MB) from {a.tmp}")
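Review note on tmp_janitor.py above: a reduced sketch of age-based pruning, assuming top-level regular files only (the module also walks directory trees and uses the newest mtime inside them). `prune_older_than` is a hypothetical name; the demo backdates one file with `os.utime` to stand in for abandoned scratch.

```python
import os
import tempfile
import time


def prune_older_than(directory: str, max_age_seconds: float) -> list:
    """Delete top-level files whose mtime is older than the cutoff; return their names."""
    now = time.time()
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.lstat(path).st_mtime >= max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as d:
    for name in ("stale.bin", "fresh.bin"):
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"x")
    two_days_ago = time.time() - 48 * 3600
    os.utime(os.path.join(d, "stale.bin"), (two_days_ago, two_days_ago))
    removed = prune_older_than(d, 24 * 3600)
    remaining = os.listdir(d)
```

Keying on mtime rather than creation time is what lets in-flight work survive: anything a pipeline is still writing keeps a recent timestamp and is skipped.
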
--- a/packaging/linux/Dockerfile.oci-venv
+++ b/packaging/linux/Dockerfile.oci-venv
@@ -127,9 +127,11 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 # The fully assembled CoderAI tree (Python + venvs + tools), copied once.
 COPY --from=assembler /opt/coderai /opt/coderai
-# Now the standalone interpreter exists, activate it for the app + launchers.
+# Put the standalone interpreter first on PATH. Do NOT set PYTHONHOME globally:
-ENV PYTHONHOME=/opt/coderai/python \
+# supervisord runs on the system python3 (3.12) and a PYTHONHOME pointing at the
-    PATH=/opt/coderai/python/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+# standalone 3.13 stdlib breaks it ("No module named 'encodings'"). The standalone
+# python is relocatable, and the per-service launchers set PYTHONHOME themselves.
+ENV PATH=/opt/coderai/python/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 WORKDIR /opt/coderai/app
 COPY . /opt/coderai/app

--- a/packaging/linux/Dockerfile.update
+++ b/packaging/linux/Dockerfile.update
+# Incremental update of an already-built coderai image.
+#
+# Re-layers ONLY the application code, launcher scripts and service configs on
+# top of an existing base image (the heavy bundle: python, venvs, native libs,
+# lip-sync, ds4, parler). Those base layers are inherited unchanged — there is no
+# 20 GB bundle recopy — so this builds in seconds even with an empty build cache.
+#
+# Driven by packaging/linux/update_oci_image.sh, which keeps an immutable
+# `coderai:base` tag so repeated updates always start from the same bundle and
+# never stack app layers on top of each other.
+ARG BASE_IMAGE=coderai:base
+FROM ${BASE_IMAGE}
+# Refresh the app tree plus the scripts/configs that live outside it. The big
+# /opt/coderai/{python,*-venv,local-libs,Wav2Lip,SadTalker,ds4,py310} trees are
+# left as inherited layers. COPY only adds or overwrites files: a file deleted
+# from the repo survives in the image unless it is under one of the known-stale
+# paths removed by the cleanup RUN below.
+COPY . /opt/coderai/app
+COPY packaging/linux/launcher/coderai-oci /usr/local/bin/coderai
+COPY packaging/linux/launcher/with-env /usr/local/bin/with-env
+COPY packaging/linux/launcher/coderai-entrypoint /usr/local/bin/coderai-entrypoint
+COPY packaging/linux/launcher/wav2lip /usr/local/bin/wav2lip
+COPY packaging/linux/launcher/sadtalker /usr/local/bin/sadtalker
+COPY packaging/linux/nginx.conf /etc/nginx/nginx.conf
+COPY packaging/linux/supervisord.conf /etc/supervisor/supervisord.conf
+COPY packaging/linux/README-RUN.txt /opt/coderai/README-RUN.txt
+RUN set -eux; \
+    chmod +x /usr/local/bin/coderai /usr/local/bin/with-env /usr/local/bin/coderai-entrypoint \
+             /usr/local/bin/wav2lip /usr/local/bin/sadtalker /opt/coderai/app/coderai; \
+    mkdir -p /config /models /cache /opt/coderai/app/models; \
+    rm -rf \
+      /opt/coderai/app/.git \
+      /opt/coderai/app/venv* \
+      /opt/coderai/app/.venv \
+      /opt/coderai/app/township_output \
+      /opt/coderai/app/offload \
+      /opt/coderai/app/dist \
+      /opt/coderai/app/.packaging-cache; \
+    find /opt/coderai/app -type d -name __pycache__ -prune -exec rm -rf '{}' +; \
+    /opt/coderai/python/bin/python3 -c "import importlib.util, sys; m=[n for n in ('fastapi','uvicorn','torch') if importlib.util.find_spec(n) is None]; sys.exit('base image missing: '+', '.join(m) if m else 0)"
+# ENTRYPOINT / EXPOSE / VOLUME / ENV / WORKDIR are inherited from the base image.
--- a/packaging/linux/build_oci_image.sh
+++ b/packaging/linux/build_oci_image.sh
@@ -263,7 +263,10 @@ prepare_venv_bundle() {
    if [[ -e "$dest_path" && "$bin_path" -ef "$dest_path" ]]; then
      continue
    fi
-    cp -a --remove-destination "$bin_path" "$dest_path"
+    # -L: dereference symlinks so the REAL binary is bundled. /usr/local/bin
+    # entries are often symlinks to a build dir (e.g. ~/whisper.cpp/build/bin);
+    # copying the link verbatim leaves a dangling symlink in the image.
+    cp -aL --remove-destination "$bin_path" "$dest_path"
  done
  if [[ ${#LOCAL_BINARIES[@]} -gt 0 ]]; then

--- a/packaging/linux/launcher/coderai-entrypoint
+++ b/packaging/linux/launcher/coderai-entrypoint
+#!/usr/bin/env sh
+# Top-level entrypoint for the CoderAI distributable image.
+# Prepares shared state directories and hands off to supervisord, which runs
+# nginx + the main server + the bundled tool web UIs on the single published port.
+set -eu
+: "${CODERAI_CONFIG_DIR:=/config}"
+: "${CODERAI_MODELS_DIR:=/models}"
+: "${CODERAI_CACHE_DIR:=/cache}"
+# Default parler model id; referenced by supervisord even when the parler program
+# is disabled, so it must always be defined.
+: "${CODERAI_PARLER_MODEL:=parler-tts/parler-tts-mini-multilingual}"
+# Dedicated temp dir on the cache volume, shared by the server and the tool
+# processes (so scratch from upscaling/lip-sync/ffmpeg lands in one place). The
+# server's built-in janitor age-prunes it; see CODERAI_TMP below.
+: "${CODERAI_TMP:=$CODERAI_CACHE_DIR/coderai-tmp}"
+export TMPDIR="$CODERAI_TMP" TMP="$CODERAI_TMP" TEMP="$CODERAI_TMP"
+# Don't write .pyc into the read-only /opt/coderai tree (esp. when run as --user).
+export PYTHONDONTWRITEBYTECODE=1
+export CODERAI_CONFIG_DIR CODERAI_MODELS_DIR CODERAI_CACHE_DIR CODERAI_PARLER_MODEL CODERAI_TMP
+mkdir -p \
+  "$CODERAI_CONFIG_DIR/coderai" \
+  "$CODERAI_MODELS_DIR/coderai" \
+  "$CODERAI_CACHE_DIR/coderai" \
+  "$CODERAI_CACHE_DIR/township_output" \
+  "$CODERAI_CACHE_DIR/videogen_output" \
+  "$CODERAI_TMP" \
+  /tmp/nginx-client-body /tmp/nginx-proxy /tmp/nginx-fastcgi \
+  /tmp/nginx-uwsgi /tmp/nginx-scgi
+# Seed the ds4 working dir on the cache volume from the bundled binary + scripts
+# (DeepSeek-V4 weights download here at runtime, so it must be writable/persistent).
+if [ -d /opt/coderai/ds4 ] && [ ! -e "$CODERAI_CACHE_DIR/ds4/ds4-server" ]; then
+  mkdir -p "$CODERAI_CACHE_DIR/ds4"
+  cp -an /opt/coderai/ds4/. "$CODERAI_CACHE_DIR/ds4/" 2>/dev/null || true
+fi
+# If invoked with arguments, run them directly (debugging / one-off commands)
+# instead of the supervised stack.
+if [ "$#" -gt 0 ]; then
+  exec "$@"
+fi
+# supervisord runs on the system python3; a leaked PYTHONHOME (pointing at the
+# standalone 3.13) would break it. The per-service launchers set their own.
+unset PYTHONHOME
+exec /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
--- a/packaging/linux/launcher/coderai-oci
+++ b/packaging/linux/launcher/coderai-oci
@@ -72,8 +72,47 @@ if changed:
 PY
 fi
-# Point the server at the shared dedicated temp dir so its janitor prunes it.
+# Optional debug logging. CODERAI_DEBUG selects coderai's --debug* flags:
-if [ -n "${CODERAI_TMP:-}" ]; then
+#   all              -> every debug flag
-  exec /opt/coderai/python/bin/python3 /opt/coderai/app/coderai --config "$CONFIG_DIR" --tmp "$CODERAI_TMP" "$@"
+#   1|true|yes|on    -> just --debug
+#   "engine,ws,..."  -> --debug-engine --debug-ws ...  (bare names get --debug- prefixed;
+#                       full "--debug-foo" tokens are passed through; comma OR space separated)
+DEBUG_ARGS=""
+case "${CODERAI_DEBUG:-}" in
+  "") : ;;
+  all|ALL|All)
+    DEBUG_ARGS="--debug --debug-ws --debug-web --debug-thermal --debug-lora --debug-requests --debug-engine --debug-engine-web" ;;
+  1|true|TRUE|yes|YES|on|ON)
+    DEBUG_ARGS="--debug" ;;
+  *)
+    for _f in $(echo "$CODERAI_DEBUG" | tr ',' ' '); do
+      case "$_f" in
+        --*)  DEBUG_ARGS="$DEBUG_ARGS $_f" ;;
+        debug) DEBUG_ARGS="$DEBUG_ARGS --debug" ;;
+        *)    DEBUG_ARGS="$DEBUG_ARGS --debug-$_f" ;;
+      esac
+    done ;;
+esac
+# --debug-* flags need --debug present to take effect; add it if the user picked
+# only sub-flags.
+case " $DEBUG_ARGS " in *" --debug "*) : ;; *[!\ ]*) DEBUG_ARGS="--debug$DEBUG_ARGS" ;; esac
+# Assemble the server argv: --config, optional --tmp, debug flags, then passthrough.
+set -- --config "$CONFIG_DIR" "$@"
+[ -n "${CODERAI_TMP:-}" ] && set -- "$@" --tmp "$CODERAI_TMP"
+CODERAI_BIN="/opt/coderai/python/bin/python3 /opt/coderai/app/coderai"
+# Optional host-tailable file log. CODERAI_LOG_FILE should point under a mounted
+# volume (e.g. /cache/logs/coderai.log) so it's visible + tailable on the host.
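+# We tee so output still reaches `docker logs` too. (supervisord runs this script
+# with killasgroup, so the coderai front + its engine subprocesses + tee are all
+# torn down together on stop.) Note: inside a pipeline, `exec` only replaces the
+# pipeline's subshell, so this launcher stays alive until the pipeline ends and,
+# unless pipefail is set, exits with tee's status rather than coderai's.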
+if [ -n "${CODERAI_LOG_FILE:-}" ]; then
+  mkdir -p "$(dirname "$CODERAI_LOG_FILE")" 2>/dev/null || true
+  echo "[coderai-oci] debug='${CODERAI_DEBUG:-off}' → logging to $CODERAI_LOG_FILE" >&2
+  # shellcheck disable=SC2086
+  exec $CODERAI_BIN "$@" $DEBUG_ARGS 2>&1 | tee -a "$CODERAI_LOG_FILE"
 fi
-exec /opt/coderai/python/bin/python3 /opt/coderai/app/coderai --config "$CONFIG_DIR" "$@"
+# shellcheck disable=SC2086
+exec $CODERAI_BIN "$@" $DEBUG_ARGS
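Note on the CODERAI_DEBUG handling in coderai-oci above: the spec-to-flags mapping is easier to check in isolation. The following is a minimal sketch, assuming the inline logic were wrapped in a function. `debug_args` is a hypothetical name used only for illustration; the launcher does this inline.

```shell
#!/bin/sh
# debug_args SPEC -> prints the flag list the launcher would pass to coderai.
# Hypothetical helper: mirrors the inline case/for logic of coderai-oci.
debug_args() {
  spec=$1
  out=""
  case "$spec" in
    "") ;;
    all|ALL|All)
      out="--debug --debug-ws --debug-web --debug-thermal --debug-lora --debug-requests --debug-engine --debug-engine-web" ;;
    1|true|TRUE|yes|YES|on|ON)
      out="--debug" ;;
    *)
      # Comma OR space separated; word splitting of the substitution is intended.
      for f in $(printf '%s' "$spec" | tr ',' ' '); do
        case "$f" in
          --*)   out="$out $f" ;;          # full flag: pass through
          debug) out="$out --debug" ;;     # bare "debug" is the master flag
          *)     out="$out --debug-$f" ;;  # bare name: add the --debug- prefix
        esac
      done ;;
  esac
  # Sub-flags need a bare --debug; prepend it when only sub-flags were chosen.
  case " $out " in
    *" --debug "*) ;;
    *[!\ ]*) out="--debug$out" ;;
  esac
  # Drop the single leading space left by the loop, if any.
  printf '%s\n' "${out# }"
}
```

For example, `debug_args engine,ws` prints `--debug --debug-engine --debug-ws`, and an empty spec prints an empty line, which leaves the server argv unchanged.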
--- a/packaging/linux/launcher/sadtalker
+++ b/packaging/linux/launcher/sadtalker
+#!/usr/bin/env bash
+# CLI shim for SadTalker talking-head generation, run in the shared lip-sync venv.
+# codai/api/video.py invokes:
+#   sadtalker --driven_audio AUDIO --source_video VIDEO --result_dir DIR
+# SadTalker animates a still image, so a source video is reduced to its first frame.
+#
+# Checkpoints are NOT baked into the image: on first use they download into the
+# writable working dir (a /cache volume in the container) and persist there.
+set -euo pipefail
+VENV="${CODERAI_LIPSYNC_VENV:-$HOME/.coderai/lipsync_venv}"
+SRC="${CODERAI_SADTALKER_SRC:-$HOME/.coderai/SadTalker}"   # baked read-only repo code
+DIR="${CODERAI_SADTALKER_DIR:-$SRC}"                       # writable working copy
+if [ ! -x "$VENV/bin/python" ]; then
+  echo "sadtalker: lip-sync venv not found at $VENV" >&2
+  exit 127
+fi
+if [ ! -f "$DIR/inference.py" ]; then
+  mkdir -p "$DIR"
+  rsync -a --exclude 'checkpoints/*' --exclude 'gfpgan/weights/*' "$SRC/" "$DIR/"
+fi
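+# Download checkpoints on first use (idempotent: skips non-empty files). Fetch to
+# a .part file and rename on success, so an interrupted download is never mistaken
+# for a complete checkpoint on the next run.
+mkdir -p "$DIR/checkpoints" "$DIR/gfpgan/weights"
+_dl(){ if [ ! -s "$2" ]; then echo "sadtalker: downloading $(basename "$2") …" >&2;
+  curl -fSL --retry 3 -o "$2.part" "$1" && mv -f "$2.part" "$2" \
+    || { rm -f "$2.part"; echo "sadtalker: download failed: $1" >&2; exit 1; }; fi; }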
+_b="https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2-rc"
+_dl "$_b/mapping_00109-model.pth.tar"          "$DIR/checkpoints/mapping_00109-model.pth.tar"
+_dl "$_b/mapping_00229-model.pth.tar"          "$DIR/checkpoints/mapping_00229-model.pth.tar"
+_dl "$_b/SadTalker_V0.0.2_256.safetensors"     "$DIR/checkpoints/SadTalker_V0.0.2_256.safetensors"
+_dl "$_b/SadTalker_V0.0.2_512.safetensors"     "$DIR/checkpoints/SadTalker_V0.0.2_512.safetensors"
+_dl "https://github.com/xinntao/facexlib/releases/download/v0.1.0/alignment_WFLW_4HG.pth"      "$DIR/gfpgan/weights/alignment_WFLW_4HG.pth"
+_dl "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" "$DIR/gfpgan/weights/detection_Resnet50_Final.pth"
+_dl "https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth"              "$DIR/gfpgan/weights/GFPGANv1.4.pth"
+_dl "https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth"        "$DIR/gfpgan/weights/parsing_parsenet.pth"
+driven=""; result=""; source_img=""; source_video=""
+extra=()
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --driven_audio) driven="$2"; shift 2;;
+    --source_video) source_video="$2"; shift 2;;
+    --source_image) source_img="$2"; shift 2;;
+    --result_dir)   result="$2"; shift 2;;
+    *) extra+=("$1"); shift;;
+  esac
+done
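+result="${result:-./results}"
+mkdir -p "$result"
+# Resolve caller-supplied paths to absolute ones: inference runs from a scratch
+# dir (see the cd below), so relative paths would otherwise point inside it.
+result="$(cd "$result" && pwd)"
+[ -n "$driven" ] && driven="$(realpath "$driven")"
+[ -n "$source_img" ] && source_img="$(realpath "$source_img")"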
+cleanup_img=""
+if [ -z "$source_img" ] && [ -n "$source_video" ]; then
+  source_img="$(mktemp --suffix=.png)"
+  cleanup_img="$source_img"
+  ffmpeg -y -i "$source_video" -frames:v 1 "$source_img" -loglevel error
+fi
+work="$(mktemp -d)"
+trap 'rm -rf "$work"' EXIT
+cd "$work"
+export PYTHONPATH="$DIR${PYTHONPATH:+:$PYTHONPATH}"
+set +e
+"$VENV/bin/python" "$DIR/inference.py" \
+  --driven_audio "$driven" \
+  --source_image "$source_img" \
+  --result_dir "$result" \
+  --checkpoint_dir "$DIR/checkpoints" \
+  ${extra[@]+"${extra[@]}"}
+rc=$?
+set -e
+[ -n "$cleanup_img" ] && rm -f "$cleanup_img" || true
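+# A failed pipeline stage here (pipefail + set -e) must not replace inference's
+# exit code, so fall back to an empty result instead of aborting.
+newest="$(find "$result" -type f -name '*.mp4' -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | cut -d' ' -f2-)" || newest=""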
+if [ -n "$newest" ] && [ "$(dirname "$newest")" != "$result" ]; then
+  cp -f "$newest" "$result/"
+fi
+exit $rc
--- a/packaging/linux/launcher/wav2lip
+++ b/packaging/linux/launcher/wav2lip
+#!/usr/bin/env bash
+# CLI shim for Wav2Lip lip-sync, run inside the shared lip-sync venv.
+# codai/api/video.py invokes:  wav2lip --face VIDEO --audio AUDIO --outfile OUT
+#
+# Checkpoints are NOT baked into the image: on first use they download into the
+# writable working dir (a /cache volume in the container) and persist there.
+set -euo pipefail
+VENV="${CODERAI_LIPSYNC_VENV:-$HOME/.coderai/lipsync_venv}"
+SRC="${CODERAI_WAV2LIP_SRC:-$HOME/.coderai/Wav2Lip}"   # baked read-only repo code
+DIR="${CODERAI_WAV2LIP_DIR:-$SRC}"                     # writable working copy
+if [ ! -x "$VENV/bin/python" ]; then
+  echo "wav2lip: lip-sync venv not found at $VENV" >&2
+  exit 127
+fi
+# Seed a writable copy of the repo code if the working dir isn't populated
+# (the image ships the code read-only under /opt; weights are excluded).
+if [ ! -f "$DIR/inference.py" ]; then
+  mkdir -p "$DIR"
+  rsync -a --exclude 'checkpoints/' --exclude 'face_detection/detection/sfd/*.pth' "$SRC/" "$DIR/"
+fi
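+# Download checkpoints on first use (idempotent: skips non-empty files). Fetch to
+# a .part file and rename on success, so an interrupted download is never mistaken
+# for a complete checkpoint on the next run.
+mkdir -p "$DIR/checkpoints" "$DIR/face_detection/detection/sfd"
+_dl(){ if [ ! -s "$2" ]; then echo "wav2lip: downloading $(basename "$2") …" >&2;
+  curl -fSL --retry 3 -o "$2.part" "$1" && mv -f "$2.part" "$2" \
+    || { rm -f "$2.part"; echo "wav2lip: download failed: $1" >&2; exit 1; }; fi; }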
+_dl "https://huggingface.co/camenduru/Wav2Lip/resolve/main/checkpoints/wav2lip_gan.pth" \
+    "$DIR/checkpoints/wav2lip_gan.pth"
+_dl "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" \
+    "$DIR/face_detection/detection/sfd/s3fd.pth"
+CKPT="${CODERAI_WAV2LIP_CKPT:-$DIR/checkpoints/wav2lip_gan.pth}"
+# Run from a writable scratch dir (inference.py writes ./temp/*), repo on PYTHONPATH.
+# Because of the cd, --face/--audio/--outfile must be given as absolute paths.
+work="$(mktemp -d)"
+trap 'rm -rf "$work"' EXIT
+cd "$work"
+mkdir -p temp
+export PYTHONPATH="$DIR${PYTHONPATH:+:$PYTHONPATH}"
+"$VENV/bin/python" "$DIR/inference.py" --checkpoint_path "$CKPT" "$@"
--- a/packaging/linux/launcher/with-env
+++ b/packaging/linux/launcher/with-env
+#!/usr/bin/env sh
+# Set the CoderAI runtime environment (standalone Python, bundled native libs,
+# nvidia wheel libs) then exec the given command. Used by supervisord to launch
+# the bundled tool web UIs with the same library environment as the main server.
+set -eu
+export PYTHONHOME=/opt/coderai/python
+export PATH="/opt/coderai/python/bin:$PATH"
+NV="/opt/coderai/python/lib/python3.13/site-packages/nvidia"
+LIBS="/opt/coderai/python/lib:/opt/coderai/local-libs"
+if [ -d "$NV" ]; then
+  for d in "$NV"/*/lib; do
+    [ -d "$d" ] && LIBS="$LIBS:$d"
+  done
+fi
+export LD_LIBRARY_PATH="$LIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+exec "$@"
--- a/packaging/linux/nginx.conf
+++ b/packaging/linux/nginx.conf
+# CoderAI single-port reverse proxy (in-container).
+# Fronts the main server and the bundled tool web UIs on one published port.
+# nginx runs in the foreground under supervisord (daemon off).
+# No `user` directive: when the container runs as root, the master stays root and
+# spawns workers as nobody; when run with `--user UID`, nginx runs entirely as that
+# UID. All writable state below lives under /tmp so non-root runs work unchanged.
+worker_processes  auto;
+daemon off;
+pid /tmp/nginx.pid;
+error_log /dev/stderr info;
+events {
+    worker_connections  1024;
+}
+http {
+    include       /etc/nginx/mime.types;
+    default_type  application/octet-stream;
+    sendfile      on;
+    server_tokens off;
+    access_log /dev/stdout;
+    # Writable temp paths under /tmp so the running user (root or --user UID) can
+    # always create them; the defaults under /var/lib/nginx are root-only.
+    client_body_temp_path /tmp/nginx-client-body;
+    proxy_temp_path       /tmp/nginx-proxy;
+    fastcgi_temp_path     /tmp/nginx-fastcgi;
+    uwsgi_temp_path       /tmp/nginx-uwsgi;
+    scgi_temp_path        /tmp/nginx-scgi;
+    # AI workloads: large uploads (images/audio/video) and long generations.
+    client_max_body_size 4096m;
+    proxy_read_timeout   3600s;
+    proxy_send_timeout   3600s;
+    proxy_connect_timeout 75s;
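+    # Shared proxy headers. CoderAI builds public URLs from these
+    # (codai/api/urlutils.py); the tools honour X-Forwarded-Prefix for sub-paths.
+    # Note: nginx inherits these only into locations that declare no
+    # proxy_set_header of their own; any location that sets one must repeat them.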
+    proxy_http_version 1.1;
+    proxy_set_header Host              $host;
+    proxy_set_header X-Real-IP         $remote_addr;
+    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+    proxy_set_header X-Forwarded-Host  $host;
+    upstream coderai   { server 127.0.0.1:18776; }
+    upstream editor    { server 127.0.0.1:8420;  }
+    upstream videogen  { server 127.0.0.1:7790;  }
+    upstream township  { server 127.0.0.1:7788;  }
+    server {
+        listen 8776 default_server;
+        listen [::]:8776 default_server;
+        server_name _;
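+        # --- Video editor: http://host:8776/editor/ --------------------------
+        location /editor/ {
+            proxy_pass http://editor/;            # trailing slash strips the prefix
+            # Declaring any proxy_set_header here disables inheritance of the
+            # shared headers from the http block, so they are repeated.
+            proxy_set_header Host              $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_set_header X-Forwarded-Host  $host;
+            proxy_set_header X-Forwarded-Prefix /editor;
+            proxy_request_buffering off;          # stream large uploads through
+            proxy_buffering off;                  # SSE progress
+        }
+        # --- Videogen studio: http://host:8776/videogen/ --------------------
+        location /videogen/ {
+            proxy_pass http://videogen/;
+            # Shared headers repeated (no inheritance once one is set here).
+            proxy_set_header Host              $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_set_header X-Forwarded-Host  $host;
+            proxy_set_header X-Forwarded-Prefix /videogen;
+            proxy_request_buffering off;
+            proxy_buffering off;
+        }
+        # --- Township fighters: http://host:8776/township/ -----------------
+        location /township/ {
+            proxy_pass http://township/;
+            # Shared headers repeated (no inheritance once one is set here).
+            proxy_set_header Host              $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_set_header X-Forwarded-Host  $host;
+            proxy_set_header X-Forwarded-Prefix /township;
+            proxy_request_buffering off;
+            proxy_buffering off;
+        }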
+        # --- CoderAI server + OpenAI API at the root ------------------------
+        location / {
+            proxy_pass http://coderai;
+            proxy_buffering off;                  # SSE: chat stream + task progress
+        }
+    }
+}
--- a/packaging/linux/run_oci.sh
+++ b/packaging/linux/run_oci.sh
@@ -16,6 +16,17 @@ DATA_ROOT="$PWD/coderai-runtime"
 DETACH=0
 NAME="coderai"
 EXTRA_ARGS=()
+# Optional: map an EXISTING local config dir + real data dirs so the image runs
+# against your live config/models without a rebuild (an image is immutable; this
+# is purely run-time bind-mounts). See --config-dir / --local / --map below.
+CONFIG_DIR_SRC=""
+INPLACE_CONFIG=0
+MAPS=()
+# Optional debug logging: CODERAI_DEBUG selects coderai's --debug* flags inside
+# the container; LOG_FILE_CONT is the in-container log path (under a mounted
+# volume so it's tailable on the host).
+DEBUG_SPEC=""
+LOG_FILE_CONT=""
 usage() {
  cat <<'EOF'
@@ -32,8 +43,26 @@ Options:
  --data-dir PATH     Directory for config/models/cache (default: ./coderai-runtime).
  --name NAME         Container name (default: coderai).
  -d, --detach        Run in background.
+  --config-dir PATH   Use an EXISTING config dir (with config.json/models.json),
+                      mounted at /config/coderai. Copied to a temp dir by default
+                      so the image's host/port rewrite leaves your dir untouched.
+  --local             Shortcut for --config-dir ~/.coderai.
+  --inplace-config    Mount --config-dir in place (the image WILL edit host/port).
+  --map HOST[:CONT]   Bind-mount a host dir at the SAME path (or HOST:CONT) inside
+                      the container, so absolute paths in models.json resolve
+                      (e.g. --map /AI/guffcache). Repeatable.
+  --debug[=SPEC]      Run coderai with debug flags. SPEC (default 'all'):
+                        all | engine,requests,ws,web,thermal,lora,engine-web
+                      Also writes a host-tailable file log (see --log-file).
+  --log-file PATH     In-container log path (default /cache/logs/coderai.log,
+                      visible on the host under the cache mount). Implies a file
+                      log even without --debug. tee'd, so `docker logs` still works.
  -- ARGS             Extra args passed to the container engine before the image name.
  -h, --help          Show this help.
+Test against your live config + data (no rebuild):
+  packaging/linux/run_oci.sh --nvidia --local \
+    --map /AI/guffcache --map /AI/huggingface --map /AI/offloads
 EOF
 }
@@ -53,6 +82,19 @@ while [[ $# -gt 0 ]]; do
    --name)
      [[ $# -ge 2 ]] || { echo "Error: --name requires a value" >&2; exit 2; }
      NAME="$2"; shift 2 ;;
+    --config-dir)
+      [[ $# -ge 2 ]] || { echo "Error: --config-dir requires a path" >&2; exit 2; }
+      CONFIG_DIR_SRC="$2"; shift 2 ;;
+    --local) CONFIG_DIR_SRC="$HOME/.coderai"; shift ;;
+    --inplace-config) INPLACE_CONFIG=1; shift ;;
+    --map)
+      [[ $# -ge 2 ]] || { echo "Error: --map requires HOST[:CONT]" >&2; exit 2; }
+      MAPS+=("$2"); shift 2 ;;
+    --debug) DEBUG_SPEC="all"; shift ;;
+    --debug=*) DEBUG_SPEC="${1#*=}"; shift ;;
+    --log-file)
+      [[ $# -ge 2 ]] || { echo "Error: --log-file requires a path" >&2; exit 2; }
+      LOG_FILE_CONT="$2"; shift 2 ;;
    -d|--detach) DETACH=1; shift ;;
    --)
      shift
@@ -90,7 +132,61 @@ volume_suffix=""
 if [[ "$ENGINE" == "podman" ]]; then
  volume_suffix=":Z"
 fi
-args+=(-v "$DATA_ROOT/config:/config$volume_suffix" -v "$DATA_ROOT/models:/models$volume_suffix" -v "$DATA_ROOT/cache:/cache$volume_suffix")
+# Config mount: either the fresh scratch dir, or an EXISTING local config dir
+# mounted at /config/coderai (where the image launcher reads config.json).
+CONFIG_NOTE="$DATA_ROOT/config (fresh)"
+if [[ -n "$CONFIG_DIR_SRC" ]]; then
+  [[ -d "$CONFIG_DIR_SRC" ]] || { echo "Error: --config-dir '$CONFIG_DIR_SRC' not found" >&2; exit 2; }
+  CONFIG_DIR_SRC="$(cd "$CONFIG_DIR_SRC" && pwd)"
+  if [[ "$INPLACE_CONFIG" == "1" ]]; then
+    CFG_MOUNT="$CONFIG_DIR_SRC"
+    CONFIG_NOTE="$CONFIG_DIR_SRC (in place — image rewrites host/port!)"
+  else
+    # Copy ONLY the json config files to a throwaway dir so the image's host/port
+    # rewrite never touches your real config, and we don't copy big subdirs
+    # (e.g. ~/.coderai/ds4 weights).
+    CFG_PARENT="$(mktemp -d "${TMPDIR:-/tmp}/coderai-cfg.XXXXXX")"
+    CFG_MOUNT="$CFG_PARENT/coderai"
+    mkdir -p "$CFG_MOUNT"
+    cp -a "$CONFIG_DIR_SRC"/*.json "$CFG_MOUNT/" 2>/dev/null || true
+    [[ -f "$CFG_MOUNT/config.json" ]] || { echo "Error: no config.json in '$CONFIG_DIR_SRC'" >&2; exit 2; }
+    CONFIG_NOTE="$CONFIG_DIR_SRC → $CFG_MOUNT (copy; original untouched)"
+  fi
+  args+=(-v "$CFG_MOUNT:/config/coderai$volume_suffix" \
+         -v "$DATA_ROOT/models:/models$volume_suffix" -v "$DATA_ROOT/cache:/cache$volume_suffix")
+else
+  args+=(-v "$DATA_ROOT/config:/config$volume_suffix" -v "$DATA_ROOT/models:/models$volume_suffix" -v "$DATA_ROOT/cache:/cache$volume_suffix")
+fi
+# 1:1 (or HOST:CONT) data mounts so absolute paths in models.json resolve.
+for m in "${MAPS[@]:-}"; do
+  [[ -n "$m" ]] || continue
+  host="${m%%:*}"; cont="${m#*:}"; [[ "$m" == *:* ]] || cont="$host"
+  if [[ -d "$host" ]]; then
+    args+=(-v "$host:$cont$volume_suffix")
+  else
+    echo "Warning: --map source '$host' not found; skipping" >&2
+  fi
+done
+# Debug flags + host-tailable file log. A file log is enabled by --debug or
+# --log-file; default path lives under /cache so it lands on the host mount.
+LOG_HOST_NOTE="(none)"
+if [[ -n "$DEBUG_SPEC" || -n "$LOG_FILE_CONT" ]]; then
+  : "${LOG_FILE_CONT:=/cache/logs/coderai.log}"
+  [[ -n "$DEBUG_SPEC" ]] && args+=(-e "CODERAI_DEBUG=$DEBUG_SPEC")
+  args+=(-e "CODERAI_LOG_FILE=$LOG_FILE_CONT")
+  # Translate the in-container path to the host path for the banner, for the
+  # standard /config|/models|/cache mounts.
+  case "$LOG_FILE_CONT" in
+    /cache/*)  LOG_HOST_NOTE="$DATA_ROOT/cache/${LOG_FILE_CONT#/cache/}" ;;
+    /models/*) LOG_HOST_NOTE="$DATA_ROOT/models/${LOG_FILE_CONT#/models/}" ;;
+    /config/*) LOG_HOST_NOTE="$DATA_ROOT/config/${LOG_FILE_CONT#/config/}" ;;
+    *)         LOG_HOST_NOTE="$LOG_FILE_CONT (in-container; mount it to see it on the host)" ;;
+  esac
+fi
 args+=("${EXTRA_ARGS[@]}" "$IMAGE_TAG")
 cat <<EOF
@@ -100,6 +196,13 @@ Starting CoderAI OCI container
  mode:    $MODE
  url:     http://127.0.0.1:$PORT/admin
  data:    $DATA_ROOT
+  config:  $CONFIG_NOTE
+  debug:   ${DEBUG_SPEC:-off}
+  log:     $LOG_HOST_NOTE
 EOF
+if [[ "$LOG_HOST_NOTE" != "(none)" ]]; then
+  echo "  tail it:  tail -F '$LOG_HOST_NOTE'"
+fi
 exec "$ENGINE" "${args[@]}"
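Note on `--map HOST[:CONT]` in run_oci.sh above: the split between host and container path is a plain parameter expansion. A minimal sketch, assuming the logic were factored into a function; `split_map` is a hypothetical name, and the script itself does this inline inside the `MAPS` loop.

```shell
#!/bin/sh
# split_map SPEC -> prints "HOST:CONT" as it would be handed to `-v`.
# Hypothetical helper mirroring the inline expansion in run_oci.sh.
split_map() {
  m=$1
  host=${m%%:*}             # everything before the first ':'
  case "$m" in
    *:*) cont=${m#*:} ;;    # explicit container path after the first ':'
    *)   cont=$host ;;      # no ':' given: mount at the same path (1:1)
  esac
  printf '%s:%s\n' "$host" "$cont"
}
```

With this, `split_map /AI/guffcache` prints `/AI/guffcache:/AI/guffcache`, which is the 1:1 case that lets absolute paths in models.json resolve inside the container, while `split_map /data:/models/x` prints `/data:/models/x`.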
--- a/packaging/linux/smoke_test_services.sh
+++ b/packaging/linux/smoke_test_services.sh
+#!/usr/bin/env bash
+# Smoke test for the all-in-one CoderAI image: brings the container up and checks
+# that nginx + the bundled services answer, and that every external binary/worker
+# we rely on is present and runnable. Does NOT load models (no weights needed).
+#
+# Usage: [DOCKER="sudo docker"] [GPU=--gpus=all] ./smoke_test_services.sh [IMAGE]
+set -uo pipefail
+DOCKER_BIN="${DOCKER:-docker}"
+read -r -a DK <<< "$DOCKER_BIN"
+IMAGE="${1:-coderai:dist}"
+PORT="${PORT:-18080}"
+NAME="coderai-smoke-$$"
+GPU="${GPU:-}"
+TMP="$(mktemp -d)"
+fails=0
+note(){ printf '%-52s %s\n' "$1" "$2"; }
+ok(){ note "$1" "OK"; }
+bad(){ note "$1" "FAIL — $2"; fails=$((fails+1)); }
+cleanup(){ "${DK[@]}" rm -f "$NAME" >/dev/null 2>&1 || true; rm -rf "$TMP"; }
+trap cleanup EXIT
+echo "== starting $IMAGE as $NAME (port $PORT) =="
+mkdir -p "$TMP/config" "$TMP/models" "$TMP/cache"
+# shellcheck disable=SC2086
+"${DK[@]}" run -d --name "$NAME" $GPU --ipc=host \
+  --user "$(id -u):$(id -g)" \
+  -p "$PORT:8776" \
+  -v "$TMP/config:/config" -v "$TMP/models:/models" -v "$TMP/cache:/cache" \
+  "$IMAGE" >/dev/null || { echo "container failed to start"; exit 1; }
+echo "== waiting for the front to answer =="
+up=0
+for _ in $(seq 1 60); do
+  code="$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:$PORT/" || true)"
+  # Any non-5xx HTTP code means the front + coderai are up (the root path itself
+  # 404s — the UI lives at /admin); 502/503 means the upstream isn't ready yet.
+  case "$code" in 200|301|302|307|401|403|404) up=1; break;; esac
+  if ! "${DK[@]}" ps -q --filter "name=$NAME" | grep -q .; then
+    echo "container exited early; logs:"; "${DK[@]}" logs "$NAME" 2>&1 | tail -40; exit 1
+  fi
+  sleep 3
+done
+[ "$up" = 1 ] && ok "front http://…:$PORT/ responds" || bad "front /" "no response"
+echo "== sub-path mounts =="
+for p in editor videogen township; do
+  code="$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:$PORT/$p/" || true)"
+  case "$code" in 200|301|302|307) ok "/$p/ ($code)";; *) bad "/$p/" "http $code";; esac
+done
+echo "== bundled binaries on PATH =="
+for b in ffmpeg ffprobe vulkaninfo nginx supervisord whisper-server ds4-server wav2lip sadtalker lspci; do
+  if "${DK[@]}" exec "$NAME" sh -lc "command -v $b >/dev/null 2>&1"; then ok "bin: $b"; else bad "bin: $b" "missing"; fi
+done
+echo "== ds4 seeded on the cache volume =="
+if "${DK[@]}" exec "$NAME" sh -lc "test -x /cache/ds4/ds4-server"; then ok "/cache/ds4/ds4-server"; else bad "/cache/ds4/ds4-server" "missing"; fi
+echo "== shared lip-sync venv (py3.10 + torch) =="
+if "${DK[@]}" exec "$NAME" /opt/coderai/lipsync_venv/bin/python -c "import torch,sys; print(sys.version.split()[0], torch.__version__)" >/dev/null 2>&1; then
+  ok "lipsync venv imports torch"
+else
+  bad "lipsync venv" "python/torch import failed"
+fi
+# Repo code is bundled; weights are NOT (download on first lip-sync use).
+if "${DK[@]}" exec "$NAME" sh -lc "test -f /opt/coderai/Wav2Lip/inference.py && test -f /opt/coderai/SadTalker/inference.py"; then
+  ok "lip-sync repo code present"
+else
+  bad "lip-sync repo code" "missing"
+fi
+echo "== parler overlay present =="
+if "${DK[@]}" exec "$NAME" sh -lc "test -d /opt/coderai/parler-venv/site-packages"; then ok "parler overlay"; else bad "parler overlay" "missing"; fi
+echo
+if [ "$fails" = 0 ]; then echo "SMOKE TEST PASSED"; else echo "SMOKE TEST: $fails failure(s)"; "${DK[@]}" logs "$NAME" 2>&1 | tail -30; fi
+exit "$fails"
--- a/packaging/linux/supervisord.conf
+++ b/packaging/linux/supervisord.conf
+; Process supervisor for the CoderAI distributable image.
+; Starts nginx (public :8776) plus the main server and the bundled tool web UIs,
+; all bound to localhost behind nginx. Logs go to stdout/stderr so `docker logs`
+; shows everything.
+[supervisord]
+nodaemon=true
+logfile=/dev/null
+logfile_maxbytes=0
+; pid + control socket under /tmp so the container runs as root OR `--user UID`.
+pidfile=/tmp/supervisord.pid
+[unix_http_server]
+file=/tmp/supervisor.sock
+[supervisorctl]
+serverurl=unix:///tmp/supervisor.sock
+[rpcinterface:supervisor]
+supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
+[program:coderai]
+; The OCI launcher seeds /config and binds the main server to localhost:18776.
+command=/usr/local/bin/coderai
+environment=CODERAI_HOST="127.0.0.1",CODERAI_PORT="18776"
+autostart=true
+autorestart=true
+startsecs=5
+stopwaitsecs=30
+priority=10
+; Signal the whole process group so the front's engine subprocesses (and the
+; optional `tee` used for file logging) stop/kill together with the launcher.
+stopasgroup=true
+killasgroup=true
+stdout_logfile=/dev/fd/1
+stdout_logfile_maxbytes=0
+redirect_stderr=true
+[program:nginx]
+command=/usr/sbin/nginx -c /etc/nginx/nginx.conf
+autostart=true
+autorestart=true
+startsecs=3
+priority=20
+stdout_logfile=/dev/fd/1
+stdout_logfile_maxbytes=0
+redirect_stderr=true
+[program:video_editor]
+command=/usr/local/bin/with-env /opt/coderai/python/bin/python3 /opt/coderai/app/tools/video_editor.py
+    --no-browser --host 127.0.0.1 --port 8420
+    --base-url http://127.0.0.1:18776
+directory=/opt/coderai/app
+autostart=true
+autorestart=true
+startsecs=5
+startretries=5
+priority=30
+stdout_logfile=/dev/fd/1
+stdout_logfile_maxbytes=0
+redirect_stderr=true
+[program:videogen]
+command=/usr/local/bin/with-env /opt/coderai/python/bin/python3 /opt/coderai/app/tools/videogen.py
+    --host 127.0.0.1 --web-port 7790
+    --base-url http://127.0.0.1:18776
+    --out-dir /cache/videogen_output
+directory=/opt/coderai/app
+autostart=true
+autorestart=true
+startsecs=5
+startretries=5
+priority=30
+stdout_logfile=/dev/fd/1
+stdout_logfile_maxbytes=0
+redirect_stderr=true
+[program:township]
+command=/usr/local/bin/with-env /opt/coderai/python/bin/python3 /opt/coderai/app/tools/gen_township_fighters.py
+    --web-port 7788
+    --base-url http://127.0.0.1:18776
+    --out-dir /cache/township_output
+directory=/opt/coderai/app
+autostart=true
+autorestart=true
+startsecs=5
+startretries=5
+priority=30
+stdout_logfile=/dev/fd/1
+stdout_logfile_maxbytes=0
+redirect_stderr=true
+; Parler-TTS runs in its OWN bundled venv (transformers 4.46, pinned). Its
+; site-packages is prepended to PYTHONPATH so it shadows the main stack; torch and
+; the rest resolve from the standalone Python's site-packages underneath — exactly
+; the local --system-site-packages layering. Internal-only (not proxied by nginx);
+; coderai reaches it via a TTS model config { "service_url": "http://127.0.0.1:8123" }.
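+; Disabled by default: set autostart=true (or start it from supervisorctl) once a
+; parler model is configured. Won't be fatal if the model isn't present.
+; CODERAI_PARLER_MODEL must still be defined in the container environment:
+; supervisord expands %(ENV_...)s when it parses this file, even for programs
+; that are not autostarted, and refuses to start if the variable is missing.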
+[program:parler]
+command=/usr/local/bin/with-env /opt/coderai/python/bin/python3 /opt/coderai/app/tools/parler_tts_service.py
+    --model %(ENV_CODERAI_PARLER_MODEL)s --port 8123
+environment=PYTHONPATH="/opt/coderai/parler-venv/site-packages"
+directory=/opt/coderai/app
+autostart=false
+autorestart=true
+startsecs=10
+startretries=3
+priority=40
+stdout_logfile=/dev/fd/1
+stdout_logfile_maxbytes=0
+redirect_stderr=true
--- a/packaging/linux/update_oci_image.sh
+++ b/packaging/linux/update_oci_image.sh
+#!/usr/bin/env bash
+# Fast incremental image update: re-layer ONLY the coderai app code + launcher
+# scripts + service configs on top of an already-built image. No 20 GB bundle
+# recopy — seconds, not the ~15 min of a full build_oci_image.sh run.
+#
+# It keeps an immutable `coderai:base` tag (the heavy bundle) and rebuilds the
+# shipped `coderai:dist` as base + a thin app layer. Because every update starts
+# from the SAME base, app layers never stack up over repeated updates.
+#
+# Usage:
+#   [DOCKER="sudo docker"] ./update_oci_image.sh
+#   BASE_IMAGE=coderai:base TAG=coderai:dist DOCKER="sudo docker" ./update_oci_image.sh
+#
+# First run seeds coderai:base from the current coderai:dist. To re-baseline the
+# bundle (new venv/libs/tools), run build_oci_image.sh and then:
+#   docker rmi coderai:base   # drop the stale base; next update re-seeds it
+set -euo pipefail
+HERE="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$HERE/../.." && pwd)"
+DOCKER_BIN="${DOCKER:-docker}"
+read -r -a DK <<< "$DOCKER_BIN"
+BASE_IMAGE="${BASE_IMAGE:-coderai:base}"
+TAG="${TAG:-coderai:dist}"
+SEED_FROM="${SEED_FROM:-coderai:dist}"
+img_exists(){ "${DK[@]}" image inspect "$1" >/dev/null 2>&1; }
+# Seed the immutable base from a previously built full image if it doesn't exist.
+if ! img_exists "$BASE_IMAGE"; then
+  if img_exists "$SEED_FROM"; then
+    echo "== seeding immutable base '$BASE_IMAGE' from '$SEED_FROM' =="
+    "${DK[@]}" tag "$SEED_FROM" "$BASE_IMAGE"
+  else
+    echo "Base '$BASE_IMAGE' and seed '$SEED_FROM' both missing." >&2
+    echo "Run packaging/linux/build_oci_image.sh for a full build first." >&2
+    exit 1
+  fi
+fi
+echo "== updating '$TAG' from base '$BASE_IMAGE' (app code only) =="
+t0=$(date +%s)
+"${DK[@]}" build \
+  -f "$HERE/Dockerfile.update" \
+  --build-arg BASE_IMAGE="$BASE_IMAGE" \
+  -t "$TAG" "$REPO_ROOT"
+echo "== done in $(( $(date +%s) - t0 ))s: '$TAG' (base '$BASE_IMAGE' unchanged) =="
+echo "   Tip: '${DK[*]} image prune -f' to drop the now-dangling previous '$TAG' image."