Commit 766fef3c authored by Stefy Lanza (nextime / spora)

Merge feat/township-match-upload: to-download list, mmproj vision, styled modals, broker + packaging
parents 56291911 cbf7f147
...@@ -21,6 +21,18 @@ township_output ...@@ -21,6 +21,18 @@ township_output
dist dist
dist-package dist-package
*.log *.log
tmp
debug.log
CoderAI.gif
# Produced artifacts and tool session/output dirs (mounted as volumes at runtime,
# never baked into the image)
video_editor/sessions
video_editor.config.json
tools/videogen_output
tools/township_output
tools/coderai_media
samples
# Build outputs # Build outputs
build build
......
...@@ -17,6 +17,15 @@ __pycache__/ ...@@ -17,6 +17,15 @@ __pycache__/
# Debug logs # Debug logs
debug.log debug.log
/logs/
# Runtime model cache (downloads, self-quantized checkpoints, job state).
# Root-anchored so it never shadows the tracked codai/models/ source package.
/models/
# Third-party source clone of the GPTQ quantizer — installed into the venv from
# source; the working tree is not part of this repo (it has its own .git).
/GPTQModel/
# Test files # Test files
test_*.py test_*.py
...@@ -33,3 +42,11 @@ township_output/ ...@@ -33,3 +42,11 @@ township_output/
# Packaging build cache + runtime temp (large artifacts) # Packaging build cache + runtime temp (large artifacts)
.packaging-cache/ .packaging-cache/
tmp/ tmp/
# Exported image tarballs + local OCI run-state (large artifacts)
dist/
coderai-runtime/
# Video editor sessions + generated media (runtime artifacts)
video_editor/sessions/
tools/coderai_media/
...@@ -286,3 +286,67 @@ safe. ...@@ -286,3 +286,67 @@ safe.
14. Thermal protection is config-driven and model-agnostic (config.json 14. Thermal protection is config-driven and model-agnostic (config.json
`thermal`). Don't special-case it per model/backend; it only reads temps and `thermal`). Don't special-case it per model/backend; it only reads temps and
sleeps. Honour the enable flags and high/resume hysteresis. sleeps. Honour the enable flags and high/resume hysteresis.
================================================================================
## Distributable Docker image (packaging/linux)
================================================================================
All-in-one image: coderai + tools (editor/videogen/township) behind nginx on a
single port (8776), built from the LOCAL install's venv + binaries.
Multi-stage `Dockerfile.oci-venv`:
- assembler stage stages the local bundle into /opt/coderai (python-build-
standalone interpreter + venv site-packages + ldd'd native libs + parler
overlay + lip-sync venv/repos + py310 + ds4). The ~20 GB bundle COPY lives
ONLY here; the runtime stage COPYs the assembled tree ONCE (no double-store).
- runtime stage: apt (nginx/supervisor/vulkan-tools/ffmpeg/...), COPY the
assembled /opt/coderai, then COPY app code → /opt/coderai/app, launchers →
/usr/local/bin, nginx/supervisor confs. Entry = coderai-entrypoint →
supervisord (nginx + main server + tool UIs).
- Do NOT set PYTHONHOME globally (breaks the system-python supervisord); set
PATH only. Bundle dereferences host symlinks (cp -aL) so binaries like
whisper-server are real files in the image, not dangling links.
Full build (slow, ~15 min — rebuilds the bundle):
packaging/linux/build_oci_image.sh # tags coderai:dist
Smoke test (no weights, checks services + every bundled binary):
DOCKER="sudo docker" GPU="--gpus all" PORT=18082 \
packaging/linux/smoke_test_services.sh coderai:dist
Run against your LIVE local config + data (no rebuild — pure bind-mounts):
packaging/linux/run_oci.sh --nvidia --local \
--map /AI/guffcache --map /AI/huggingface --map /AI/offloads
- The image launcher reads config from /config/coderai and runs
`coderai --config /config/coderai`, rewriting server.host/port in config.json.
- `--local` (= --config-dir ~/.coderai) copies ONLY the *.json config files to
a temp dir and mounts it at /config/coderai, so your real config is untouched
(use --inplace-config to edit it directly).
- `--map HOST[:CONT]` bind-mounts a host dir at the SAME path inside the
container so the ABSOLUTE paths in models.json/config.json (gguf/hf caches,
offloads) resolve unchanged. Without these maps the models won't be found.
- `--debug[=SPEC]` runs coderai with --debug* flags (SPEC default 'all';
e.g. `--debug=engine,requests,ws` → --debug-engine/--debug-requests/--debug-ws,
`--debug` always auto-added) and writes a host-tailable file log. `--log-file
PATH` sets the in-container log path (default /cache/logs/coderai.log, which
lands on the host under the cache mount). Driven by env CODERAI_DEBUG +
CODERAI_LOG_FILE, read by the coderai-oci launcher, which tees output so
`docker logs` still works.
supervisord [program:coderai] uses stopasgroup/killasgroup so the front's
engine subprocesses + the tee are torn down together. NOTE: the launcher +
supervisord.conf are baked in, so changes need a (fast) update_oci_image.sh.
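The `--map HOST[:CONT]` rule described above can be sketched in a few lines. This is an illustration of the mapping only, not the logic inside run_oci.sh; the function name `map_to_volume_args` is hypothetical.

```python
def map_to_volume_args(spec: str) -> list[str]:
    """Turn a HOST[:CONT] spec into docker `-v` arguments (illustrative only)."""
    host, sep, cont = spec.partition(":")
    # With no explicit container path, reuse the host path so the absolute
    # paths stored in models.json/config.json resolve unchanged in the container.
    target = cont if sep and cont else host
    return ["-v", f"{host}:{target}"]
```

For example, `map_to_volume_args("/AI/huggingface")` yields a bind-mount of `/AI/huggingface` onto the same path inside the container.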
Incremental update (FAST, ~30 s — code-only changes, NO bundle recopy):
DOCKER="sudo docker" packaging/linux/update_oci_image.sh
- `Dockerfile.update` is `FROM coderai:base` and re-layers ONLY the app code +
launchers + service confs. The heavy bundle layers are inherited unchanged.
- Keeps an immutable `coderai:base` (the bundle) and rebuilds `coderai:dist`
as base + a thin app layer. Every update starts from the SAME base, so app
layers never stack across updates. dist and base SHARE the bundle layers —
keeping both costs only the app layer (a few MB), not a second 23 GB.
- First run seeds coderai:base from the current coderai:dist (docker tag).
- Re-baseline the bundle (new venv/libs/tools): run build_oci_image.sh, then
`docker rmi coderai:base` so the next update re-seeds it from the new dist.
- Use this whenever ONLY codai/ app code (or launchers/confs) changed — a full
build_oci_image.sh is wasteful for that.
- CAUTION: COPY adds/overwrites but does NOT delete files removed from the
repo; the cleanup RUN prunes only known-stale paths (.git/venv*/dist/...). A
source file deleted from codai/ lingers in the overlay until a full rebuild.
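The caution above (COPY overwrites but never deletes) can be checked mechanically. A minimal sketch, assuming the repo's app tree and an extracted copy of the image's app tree are both available on disk; `relative_files` and `find_stale_files` are hypothetical helpers, not part of the packaging scripts.

```python
import os


def relative_files(root: str) -> set[str]:
    """Collect every file under root as a path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def find_stale_files(repo_root: str, image_root: str) -> list[str]:
    """Files present in the image tree but no longer present in the repo tree."""
    return sorted(relative_files(image_root) - relative_files(repo_root))
```

A non-empty result is a hint that a full build_oci_image.sh run may be due.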
...@@ -4,7 +4,7 @@ ...@@ -4,7 +4,7 @@
![CoderAI](CoderAI.gif) ![CoderAI](CoderAI.gif)
An OpenAI-compatible API server to run models on your local GPU with web administration dashboard, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and full multi-modal support. A multimodal and multi-backend local model orchestrator with an OpenAI-compatible API server to run models on local GPUs, supporting multiple GPU backends: NVIDIA (CUDA), AMD (Vulkan), and Intel (Vulkan). Configuration-driven architecture with per-model settings and full multi-modal support.
## Features ## Features
......
...@@ -35,6 +35,7 @@ BACKEND="${1:-all}" ...@@ -35,6 +35,7 @@ BACKEND="${1:-all}"
FLASH=false FLASH=false
CUSTOM_VENV="" CUSTOM_VENV=""
PACKAGE=false PACKAGE=false
DS4=false
# Parse arguments # Parse arguments
i=1 i=1
...@@ -50,6 +51,9 @@ for arg in "$@"; do ...@@ -50,6 +51,9 @@ for arg in "$@"; do
--package) --package)
PACKAGE=true PACKAGE=true
;; ;;
--ds4)
DS4=true
;;
esac esac
i=$((i + 1)) i=$((i + 1))
done done
...@@ -68,6 +72,7 @@ if [[ "$BACKEND" != "nvidia" && "$BACKEND" != "vulkan" && "$BACKEND" != "vulkan- ...@@ -68,6 +72,7 @@ if [[ "$BACKEND" != "nvidia" && "$BACKEND" != "vulkan" && "$BACKEND" != "vulkan-
echo "" echo ""
echo "Options:" echo "Options:"
echo " --flash - Install Flash Attention 2 for faster inference (NVIDIA only)" echo " --flash - Install Flash Attention 2 for faster inference (NVIDIA only)"
echo " --ds4 - Clone + build the ds4 (DeepSeek V4) native engine"
exit 1 exit 1
fi fi
...@@ -755,6 +760,35 @@ package_app() { ...@@ -755,6 +760,35 @@ package_app() {
echo -e "${YELLOW}Note: The target machine must still provide compatible system GPU/runtime libraries.${NC}" echo -e "${YELLOW}Note: The target machine must still provide compatible system GPU/runtime libraries.${NC}"
} }
# Optionally clone + build ds4 (DeepSeek V4 native engine). Opt-in via --ds4.
# coderai can also auto-build this at runtime on first use, but doing it here lets
# the OCI/Docker packaging bundle the prebuilt ds4-server binary.
build_ds4() {
local DS4_DIR="${CODERAI_DS4_DIR:-$HOME/.coderai/ds4}"
echo -e "${YELLOW}Building ds4 (DeepSeek V4 engine) → $DS4_DIR ...${NC}"
if [ ! -e "$DS4_DIR/Makefile" ]; then
mkdir -p "$(dirname "$DS4_DIR")"
git clone --depth 1 https://github.com/antirez/ds4 "$DS4_DIR" || {
echo -e "${YELLOW}Warning: could not clone ds4; skipping.${NC}"; return 0; }
fi
local TARGET="cpu"
if command -v nvcc &> /dev/null || [ -d "/usr/local/cuda" ]; then
TARGET="cuda-generic"
elif [ "$(uname -s)" = "Darwin" ]; then
TARGET="" # bare `make` builds the macOS Metal backend
fi
( cd "$DS4_DIR" && make $TARGET ) || {
echo -e "${YELLOW}Warning: ds4 build failed; it can still be built at runtime.${NC}"; return 0; }
if [ -x "$DS4_DIR/ds4-server" ]; then
echo -e "${GREEN}✓ ds4-server built at $DS4_DIR/ds4-server${NC}"
echo -e "${YELLOW}Note: DeepSeek V4 weights are downloaded on first use (multi-GB).${NC}"
fi
}
if [ "$DS4" = true ]; then
build_ds4
fi
# Create .backend file to track which backend was used # Create .backend file to track which backend was used
echo "$BACKEND" > .backend echo "$BACKEND" > .backend
......
...@@ -335,7 +335,7 @@ async function deleteEntry() { ...@@ -335,7 +335,7 @@ async function deleteEntry() {
closeDetail(); closeDetail();
loadArchive(); loadArchive();
} catch(e) { } catch(e) {
alert('Delete failed: ' + e.message); showAlert('Delete failed: ' + e.message);
} }
} }
......
...@@ -104,6 +104,81 @@ function donateCopy(id, btn) { ...@@ -104,6 +104,81 @@ function donateCopy(id, btn) {
</main> </main>
{% endif %} {% endif %}
<!-- Shared confirm / notice modal (replaces window.confirm / window.alert) -->
<div id="confirm-modal" class="modal" onclick="if(event.target===this)document.getElementById('confirm-modal-cancel').click()">
<div class="modal-box" style="max-width:420px">
<div class="modal-head">
<span class="modal-title" id="confirm-modal-title">Confirm</span>
<button class="modal-close" id="confirm-modal-x">&times;</button>
</div>
<div class="modal-body">
<p id="confirm-modal-msg" style="margin:0 0 1.25rem;white-space:pre-wrap"></p>
<div style="display:flex;gap:.5rem;justify-content:flex-end">
<button class="btn btn-ghost" id="confirm-modal-cancel">Cancel</button>
<button class="btn btn-danger" id="confirm-modal-ok">Confirm</button>
</div>
</div>
</div>
</div>
<script>
// Global modal helpers, shared by every admin page. Defined here so templates
// can call showAlert()/showConfirm() instead of window.alert()/window.confirm().
if(typeof window.openModal!=='function') window.openModal=function(id){document.getElementById(id).classList.add('show')};
if(typeof window.closeModal!=='function') window.closeModal=function(id){document.getElementById(id).classList.remove('show')};
window.showConfirm=function(title, msg, okLabel){
return new Promise(resolve => {
document.getElementById('confirm-modal-title').textContent = title;
document.getElementById('confirm-modal-msg').textContent = msg;
const okBtn = document.getElementById('confirm-modal-ok');
const cancelBtn= document.getElementById('confirm-modal-cancel');
const xBtn = document.getElementById('confirm-modal-x');
okBtn.className = 'btn btn-danger';
okBtn.textContent = okLabel || 'Confirm';
cancelBtn.style.display = '';
openModal('confirm-modal');
function cleanup(result){
closeModal('confirm-modal');
okBtn.removeEventListener('click', onOk);
cancelBtn.removeEventListener('click', onCancel);
xBtn.removeEventListener('click', onCancel);
resolve(result);
}
function onOk(){ cleanup(true); }
function onCancel(){ cleanup(false); }
okBtn.addEventListener('click', onOk);
cancelBtn.addEventListener('click', onCancel);
xBtn.addEventListener('click', onCancel);
});
};
// Styled replacement for window.alert(): a single-button notice modal.
window.showAlert=function(msg, title, kind){
return new Promise(resolve => {
if(!title && !kind && /^\s*(error|failed|cannot|could not)\b/i.test(String(msg||''))) kind = 'error';
document.getElementById('confirm-modal-title').textContent =
title || (kind === 'error' ? 'Error' : 'Notice');
document.getElementById('confirm-modal-msg').textContent = msg;
const okBtn = document.getElementById('confirm-modal-ok');
const cancelBtn = document.getElementById('confirm-modal-cancel');
const xBtn = document.getElementById('confirm-modal-x');
okBtn.className = 'btn btn-primary';
okBtn.textContent = 'OK';
cancelBtn.style.display = 'none';
openModal('confirm-modal');
function cleanup(){
closeModal('confirm-modal');
cancelBtn.style.display = '';
okBtn.removeEventListener('click', onOk);
xBtn.removeEventListener('click', onOk);
resolve();
}
function onOk(){ cleanup(); }
okBtn.addEventListener('click', onOk);
xBtn.addEventListener('click', onOk);
});
};
</script>
{% block scripts %}{% endblock %} {% block scripts %}{% endblock %}
</body> </body>
</html> </html>
...@@ -2372,7 +2372,7 @@ const STUDIO_CAPABILITIES = { ...@@ -2372,7 +2372,7 @@ const STUDIO_CAPABILITIES = {
optional:[], optional:[],
notes:[ notes:[
'Requires <code>insightface</code> and <code>onnxruntime</code>: <code>pip install insightface onnxruntime</code>.', 'Requires <code>insightface</code> and <code>onnxruntime</code>: <code>pip install insightface onnxruntime</code>.',
'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).', 'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="' + (window.ROOT_PATH||'') + '/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).',
'No AI model selection needed — this feature uses its own dedicated backend.', 'No AI model selection needed — this feature uses its own dedicated backend.',
], ],
backendPath: ROOT_PATH + '/v1/images/faceswap', backendPath: ROOT_PATH + '/v1/images/faceswap',
...@@ -2386,7 +2386,7 @@ const STUDIO_CAPABILITIES = { ...@@ -2386,7 +2386,7 @@ const STUDIO_CAPABILITIES = {
optional:[], optional:[],
notes:[ notes:[
'Requires <code>insightface</code> and <code>onnxruntime</code>: <code>pip install insightface onnxruntime</code>.', 'Requires <code>insightface</code> and <code>onnxruntime</code>: <code>pip install insightface onnxruntime</code>.',
'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).', 'The <b>inswapper_128.onnx</b> model is <b>auto-downloaded</b> from HuggingFace on first use (<a href="' + (window.ROOT_PATH||'') + '/admin/models?tab=search&q=inswapper&pipeline=&gguf=no-gguf" class="cap-find-link">deepinsight/inswapper<span class="cap-find-icon">↗</span></a>).',
'No AI model selection needed — this feature uses its own dedicated backend.', 'No AI model selection needed — this feature uses its own dedicated backend.',
], ],
backendPath: ROOT_PATH + '/v1/images/faceswap', backendPath: ROOT_PATH + '/v1/images/faceswap',
...@@ -2461,14 +2461,14 @@ function capSearchUrl(cap) { ...@@ -2461,14 +2461,14 @@ function capSearchUrl(cap) {
const s = CAP_TO_HF_SEARCH[cap]; const s = CAP_TO_HF_SEARCH[cap];
if (!s) return null; if (!s) return null;
const p = new URLSearchParams({ tab:'search', q: s.q, pipeline: s.pipeline, gguf: s.gguf }); const p = new URLSearchParams({ tab:'search', q: s.q, pipeline: s.pipeline, gguf: s.gguf });
return '/admin/models?' + p.toString(); return (window.ROOT_PATH || '') + '/admin/models?' + p.toString();
} }
function capMissingHtml(caps, label) { function capMissingHtml(caps, label) {
if (!caps.length) return ''; if (!caps.length) return '';
const links = caps.map(cap => { const links = caps.map(cap => {
const chip = `<span class="cap-chip dim">${cap.replace(/_/g,' ')}</span>`; const chip = `<span class="cap-chip dim">${cap.replace(/_/g,' ')}</span>`;
if (_localCapSet.has(cap)) { if (_localCapSet.has(cap)) {
const url = `/admin/models?local_cap=${encodeURIComponent(cap)}`; const url = `${window.ROOT_PATH || ''}/admin/models?local_cap=${encodeURIComponent(cap)}`;
return `<a href="${url}" class="cap-find-link" title="You have a local model with ${cap.replace(/_/g,' ')} — click to configure it">${chip}<span class="cap-find-icon" style="color:#6ecf7e">↑ configure</span></a>`; return `<a href="${url}" class="cap-find-link" title="You have a local model with ${cap.replace(/_/g,' ')} — click to configure it">${chip}<span class="cap-find-icon" style="color:#6ecf7e">↑ configure</span></a>`;
} }
const url = capSearchUrl(cap); const url = capSearchUrl(cap);
...@@ -4229,12 +4229,12 @@ async function loadCharProfileIntoSlot(prefix, idx, name) { ...@@ -4229,12 +4229,12 @@ async function loadCharProfileIntoSlot(prefix, idx, name) {
charSlots[prefix][idx].name = charSlots[prefix][idx].name || d.name; charSlots[prefix][idx].name = charSlots[prefix][idx].name || d.name;
charSlots[prefix][idx].images = (d.images||[]).map(img => img.data); charSlots[prefix][idx].images = (d.images||[]).map(img => img.data);
renderCharSlots(prefix); renderCharSlots(prefix);
} catch(e) { alert('Failed to load profile: '+e.message); } } catch(e) { showAlert('Failed to load profile: '+e.message); }
} }
async function saveCharSlotAsProfile(prefix, idx) { async function saveCharSlotAsProfile(prefix, idx) {
const slot = charSlots[prefix]?.[idx]; const slot = charSlots[prefix]?.[idx];
if (!slot || !slot.images.length) { alert('Add at least one image first.'); return; } if (!slot || !slot.images.length) { showAlert('Add at least one image first.'); return; }
const name = slot.name || prompt('Profile name:'); const name = slot.name || prompt('Profile name:');
if (!name) return; if (!name) return;
try { try {
...@@ -4246,8 +4246,8 @@ async function saveCharSlotAsProfile(prefix, idx) { ...@@ -4246,8 +4246,8 @@ async function saveCharSlotAsProfile(prefix, idx) {
charSlots[prefix][idx].name = name; charSlots[prefix][idx].name = name;
await loadCharProfileList(); await loadCharProfileList();
renderCharSlots(prefix); renderCharSlots(prefix);
alert(`Saved profile "${name}"`); showAlert(`Saved profile "${name}"`);
} catch(e) { alert('Save failed: '+e.message); } } catch(e) { showAlert('Save failed: '+e.message); }
} }
// ───────────────────────────────────────────────────────────────── // ─────────────────────────────────────────────────────────────────
...@@ -6051,14 +6051,14 @@ async function profCharView(name) { ...@@ -6051,14 +6051,14 @@ async function profCharView(name) {
try { try {
const d = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name)).then(r=>r.json()); const d = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name)).then(r=>r.json());
_openProfModal(`Character: ${d.name}`, d.description||'', d.images||[]); _openProfModal(`Character: ${d.name}`, d.description||'', d.images||[]);
} catch(e) { alert('Failed to load character: ' + e.message); } } catch(e) { showAlert('Failed to load character: ' + e.message); }
} }
async function profCharDelete(name) { async function profCharDelete(name) {
if (!confirm(`Delete character profile "${name}"?`)) return; if (!confirm(`Delete character profile "${name}"?`)) return;
const r = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name), {method:'DELETE'}); const r = await fetch(ROOT_PATH + '/admin/api/characters/'+encodeURIComponent(name), {method:'DELETE'});
if (r.ok) await profCharLoad(); if (r.ok) await profCharLoad();
else alert('Delete failed: ' + await r.text()); else showAlert('Delete failed: ' + await r.text());
} }
...@@ -6139,7 +6139,7 @@ async function profVoiceDelete(name) { ...@@ -6139,7 +6139,7 @@ async function profVoiceDelete(name) {
if (!confirm(`Delete voice profile "${name}"?`)) return; if (!confirm(`Delete voice profile "${name}"?`)) return;
const r = await fetch(ROOT_PATH + '/admin/api/voices/'+encodeURIComponent(name), {method:'DELETE'}); const r = await fetch(ROOT_PATH + '/admin/api/voices/'+encodeURIComponent(name), {method:'DELETE'});
if (r.ok) await profVoiceLoad(); if (r.ok) await profVoiceLoad();
else alert('Delete failed: ' + await r.text()); else showAlert('Delete failed: ' + await r.text());
} }
// ───────────────────────────────────────────────────────────────── // ─────────────────────────────────────────────────────────────────
...@@ -6296,14 +6296,14 @@ async function profEnvView(name) { ...@@ -6296,14 +6296,14 @@ async function profEnvView(name) {
try { try {
const d = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name)).then(r=>r.json()); const d = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name)).then(r=>r.json());
_openProfModal(`Environment: ${d.name}`, d.description||'', d.images||[]); _openProfModal(`Environment: ${d.name}`, d.description||'', d.images||[]);
} catch(e) { alert('Failed to load environment: ' + e.message); } } catch(e) { showAlert('Failed to load environment: ' + e.message); }
} }
async function profEnvDelete(name) { async function profEnvDelete(name) {
if (!confirm(`Delete environment profile "${name}"?`)) return; if (!confirm(`Delete environment profile "${name}"?`)) return;
const r = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name), {method:'DELETE'}); const r = await fetch(ROOT_PATH + '/admin/api/environments/'+encodeURIComponent(name), {method:'DELETE'});
if (r.ok) await profEnvLoad(); if (r.ok) await profEnvLoad();
else alert('Delete failed: ' + await r.text()); else showAlert('Delete failed: ' + await r.text());
} }
// ───────────────────────────────────────────────────────────────── // ─────────────────────────────────────────────────────────────────
...@@ -6528,7 +6528,7 @@ async function deleteCustomPipeline(id) { ...@@ -6528,7 +6528,7 @@ async function deleteCustomPipeline(id) {
_customPipelines = _customPipelines.filter(p => p.id !== id); _customPipelines = _customPipelines.filter(p => p.id !== id);
if (_editingPipelineId === id) { _editingPipelineId = null; _pbSteps = []; renderBuilderSteps(); } if (_editingPipelineId === id) { _editingPipelineId = null; _pbSteps = []; renderBuilderSteps(); }
renderCustomPipelineCards(); renderCustomPipelineCards();
} catch(e) { alert('Delete failed: '+e.message); } } catch(e) { showAlert('Delete failed: '+e.message); }
} }
function _renderPipelineResult(outId, progId, d) { function _renderPipelineResult(outId, progId, d) {
...@@ -6683,7 +6683,7 @@ async function archiveDelete(filename) { ...@@ -6683,7 +6683,7 @@ async function archiveDelete(filename) {
_archiveFiles = _archiveFiles.filter(f => f.filename !== filename); _archiveFiles = _archiveFiles.filter(f => f.filename !== filename);
renderArchive(); renderArchive();
} catch(e) { } catch(e) {
alert('Delete failed: ' + e.message); showAlert('Delete failed: ' + e.message);
} }
} }
......
...@@ -126,15 +126,15 @@ async function createToken() { ...@@ -126,15 +126,15 @@ async function createToken() {
openModal('show-modal'); openModal('show-modal');
loadTokens(); loadTokens();
} else { } else {
const e = await r.json(); alert(e.detail || 'Failed'); const e = await r.json(); showAlert(e.detail || 'Failed');
} }
} catch (e) { alert(e.message); } } catch (e) { showAlert(e.message); }
} }
async function delToken(id) { async function delToken(id) {
if (!confirm('Delete this token? Clients using it will lose access immediately.')) return; if (!confirm('Delete this token? Clients using it will lose access immediately.')) return;
const r = await fetch(ROOT_PATH + '/admin/api/tokens/'+id, {method:'DELETE'}); const r = await fetch(ROOT_PATH + '/admin/api/tokens/'+id, {method:'DELETE'});
if (r.ok) loadTokens(); else alert('Failed to delete'); if (r.ok) loadTokens(); else showAlert('Failed to delete');
} }
loadTokens(); loadTokens();
......
...@@ -105,7 +105,7 @@ async function delUser(id, name) { ...@@ -105,7 +105,7 @@ async function delUser(id, name) {
if (!confirm('Delete user "' + name + '"?')) return; if (!confirm('Delete user "' + name + '"?')) return;
const r = await fetch(ROOT_PATH + '/admin/api/users/'+id, {method:'DELETE'}); const r = await fetch(ROOT_PATH + '/admin/api/users/'+id, {method:'DELETE'});
if (r.ok) location.reload(); if (r.ok) location.reload();
else { const e = await r.json(); alert(e.detail || 'Failed'); } else { const e = await r.json(); showAlert(e.detail || 'Failed'); }
} }
</script> </script>
{% endblock %} {% endblock %}
...@@ -160,6 +160,32 @@ except ImportError: ...@@ -160,6 +160,32 @@ except ImportError:
pass pass
class _InternalAuthMiddleware:
"""Reject any HTTP request that doesn't carry the front's internal token.
Active only when CODERAI_INTERNAL_TOKEN is set (i.e. this process is an engine
spawned by the front). It binds 127.0.0.1, but this also blocks anything else on
localhost from talking to the engine directly and bypassing the front. In
single-process mode the token is unset and this is a no-op."""
def __init__(self, app):
self._app = app
self._token = os.environ.get("CODERAI_INTERNAL_TOKEN")
async def __call__(self, scope, receive, send):
if self._token and scope.get("type") == "http":
headers = dict(scope.get("headers", []))
got = headers.get(b"x-coderai-internal", b"").decode("latin-1")
if got != self._token:
await send({"type": "http.response.start", "status": 403,
"headers": [(b"content-type", b"application/json")]})
await send({"type": "http.response.body",
"body": b'{"error":"forbidden: engines are reachable only '
b'through the front proxy"}'})
return
await self._app(scope, receive, send)
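The gate's behaviour can be exercised without a server by driving an ASGI callable by hand. The sketch below is a standalone variation, not the committed class: `TokenGate`, `_ok_app` and `status_for` are hypothetical names, the token is passed in rather than read from the environment, and the comparison uses `hmac.compare_digest` (a swapped-in constant-time check) instead of `!=`.

```python
import asyncio
import hmac


class TokenGate:
    """Minimal ASGI middleware: reject HTTP requests lacking the expected token."""

    def __init__(self, app, token):
        self._app = app
        self._token = token

    async def __call__(self, scope, receive, send):
        if self._token and scope.get("type") == "http":
            headers = dict(scope.get("headers", []))
            got = headers.get(b"x-coderai-internal", b"")
            # compare_digest avoids leaking how many leading bytes matched.
            if not hmac.compare_digest(got, self._token.encode("latin-1")):
                await send({"type": "http.response.start", "status": 403,
                            "headers": [(b"content-type", b"application/json")]})
                await send({"type": "http.response.body",
                            "body": b'{"error":"forbidden"}'})
                return
        await self._app(scope, receive, send)


async def _ok_app(scope, receive, send):
    """Stand-in downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(headers, token="s3cret"):
    """Drive the gate with a fake request and return the response status."""
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    scope = {"type": "http", "headers": headers}
    asyncio.run(TokenGate(_ok_app, token)(scope, receive, send))
    return sent[0]["status"]
```

With no token configured (`token=None`) the gate passes everything through, which mirrors the single-process no-op described in the docstring above.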
class _ForwardedPrefixMiddleware: class _ForwardedPrefixMiddleware:
"""Populate ASGI root_path from X-Forwarded-Prefix / X-Script-Name headers.""" """Populate ASGI root_path from X-Forwarded-Prefix / X-Script-Name headers."""
...@@ -180,6 +206,9 @@ class _ForwardedPrefixMiddleware: ...@@ -180,6 +206,9 @@ class _ForwardedPrefixMiddleware:
app.add_middleware(_ForwardedPrefixMiddleware) app.add_middleware(_ForwardedPrefixMiddleware)
# Added last → outermost: the internal-token gate runs before anything else, so a
# request without the front's token never reaches a route.
app.add_middleware(_InternalAuthMiddleware)
# Mount static files for admin dashboard # Mount static files for admin dashboard
from fastapi.staticfiles import StaticFiles from fastapi.staticfiles import StaticFiles
...@@ -193,6 +222,77 @@ from fastapi.responses import FileResponse, Response as _FaviconResponse ...@@ -193,6 +222,77 @@ from fastapi.responses import FileResponse, Response as _FaviconResponse
_favicon_path = admin_static_dir / "favicon.ico" _favicon_path = admin_static_dir / "favicon.ico"
@app.get("/healthz", include_in_schema=False)
async def healthz():
"""Cheap liveness probe that touches no torch/model state.
The front proxy's engine supervisor polls this to distinguish a *slow* engine
(busy loading a model — the event loop may be blocked, so this can be late but
will eventually answer) from a *dead* one (connection refused). It must stay
trivial and dependency-free so it returns the instant the loop is free."""
import os as _os
return {"ok": True, "pid": _os.getpid()}
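The slow-versus-dead distinction in the docstring above could be made by a poller along these lines. This is a sketch under assumptions: `classify_probe_error` and `probe_healthz` are hypothetical names, the policy (refused means dead, timeout means slow) is inferred from the docstring, and the real supervisor would presumably also send the internal token header, which is omitted here.

```python
import socket


def classify_probe_error(exc: BaseException) -> str:
    """Map a probe failure to 'dead' or 'slow' (illustrative policy)."""
    if isinstance(exc, ConnectionRefusedError):
        return "dead"   # nothing is listening: the process is gone
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "slow"   # the peer did not answer in time; it may be busy loading
    return "dead"       # any other socket error is treated as a failure


def probe_healthz(host: str, port: int, timeout: float = 2.0) -> str:
    """Return 'alive', 'slow', or 'dead' for an engine's /healthz endpoint."""
    request = f"GET /healthz HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            reply = conn.recv(64)
    except OSError as exc:  # covers refused connections and timeouts
        return classify_probe_error(exc)
    # Any HTTP status line counts as a sign of life in this sketch.
    return "alive" if reply.startswith(b"HTTP/") else "dead"
```

Treating a timeout as "slow" rather than "dead" is what keeps a supervisor from restarting an engine that is merely blocked on a model load.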
@app.get("/internal/engine-state", include_in_schema=False)
async def internal_engine_state():
"""Auth-free engine introspection for the front proxy's router/aggregator.
Engines bind 127.0.0.1 only, so this is not publicly reachable. Returns which
models are resident (for model→engine routing) and this engine's GPU/VRAM (for
cross-engine status aggregation). Kept cheap so it answers even mid-generation.
"""
import os as _os
try:
loaded = list(multi_model_manager.models.keys())
except Exception:
loaded = []
vram = None
try:
import torch
if torch.cuda.is_available():
# Sum across every CUDA device this engine can see — an engine may own
# more than one GPU (e.g. two NVIDIA cards sharding one large model), so
# reporting only device 0 would under-count its VRAM.
n = torch.cuda.device_count()
used = free = total = 0
devs = []
for i in range(n):
f, t = torch.cuda.mem_get_info(i)
used += (t - f); free += f; total += t
devs.append({"index": i, "name": torch.cuda.get_device_name(i),
"free": round(f / 1e9, 2), "total": round(t / 1e9, 2)})
label = (torch.cuda.get_device_name(0) if n == 1
else f"{n}× CUDA")
vram = {"used": round(used / 1e9, 2), "free": round(free / 1e9, 2),
"total": round(total / 1e9, 2), "gpu": label,
"devices": devs, "device_count": n}
except Exception:
vram = None
# Running tasks so the front can show cross-engine activity without needing a
# session on this engine (sessions live only on the primary).
tasks = []
try:
from codai.tasks import task_registry
tasks = [t for t in task_registry.list()
if t.get("status") in ("running", "queued", "paused")]
except Exception:
tasks = []
# This engine's thermal cooldown state, so the front can show WHICH engine is
# cooling (each engine pauses on its own GPUs; CPU pauses everything).
cooling = None
try:
from codai.models import thermal
cs = thermal.get_cooldown_state()
if cs.get("active"):
cooling = {"gpu": cs.get("gpu"), "cpu": cs.get("cpu"),
"message": cs.get("message")}
except Exception:
cooling = None
return {"ok": True, "pid": _os.getpid(), "loaded_models": loaded,
"vram": vram, "tasks": tasks, "cooling": cooling}
@app.get("/favicon.ico", include_in_schema=False) @app.get("/favicon.ico", include_in_schema=False)
async def favicon(): async def favicon():
if _favicon_path.exists(): if _favicon_path.exists():
......
...@@ -106,6 +106,27 @@ async def create_embeddings(request: EmbeddingsRequest, http_request: Request = ...@@ -106,6 +106,27 @@ async def create_embeddings(request: EmbeddingsRequest, http_request: Request =
""" """
OpenAI-compatible embeddings endpoint. OpenAI-compatible embeddings endpoint.
""" """
# Register a task so embeddings appear in the unified task list, like every
# other model type. Finished on success or error below.
from codai.tasks import task_registry
_title = request.input if isinstance(request.input, str) else "embeddings"
_tid = task_registry.register(
"embedding", title=str(_title)[:80], model=(request.model or "embedding"))
task_registry.start(_tid)
try:
_resp = await _run_embeddings(request, http_request)
task_registry.finish(_tid, "done")
return _resp
except HTTPException:
task_registry.finish(_tid, "error")
raise
except Exception as e:
task_registry.finish(_tid, "error", str(e)[:200])
raise
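The register/start/finish pattern above generalises to a small wrapper. The sketch below uses a hypothetical in-memory registry (`InMemoryTaskRegistry`) as a stand-in for `codai.tasks.task_registry`, whose real implementation is not shown in this diff; only the method names and call shapes are taken from the code above.

```python
import asyncio
import itertools


class InMemoryTaskRegistry:
    """Hypothetical stand-in exposing register/start/finish/list."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}

    def register(self, kind, title="", model=""):
        tid = next(self._ids)
        self._tasks[tid] = {"id": tid, "kind": kind, "title": title,
                            "model": model, "status": "queued", "error": None}
        return tid

    def start(self, tid):
        self._tasks[tid]["status"] = "running"

    def finish(self, tid, status, error=None):
        self._tasks[tid].update(status=status, error=error)

    def list(self):
        return [dict(task) for task in self._tasks.values()]


async def run_tracked(registry, kind, title, model, work):
    """Run the coroutine function `work` and record its outcome in the registry."""
    tid = registry.register(kind, title=title[:80], model=model)
    registry.start(tid)
    try:
        result = await work()
    except Exception as exc:
        # Record a truncated message, then let the caller see the original error.
        registry.finish(tid, "error", str(exc)[:200])
        raise
    registry.finish(tid, "done")
    return result
```

Finishing the task in both branches is the point of the pattern: a failed request still leaves the list with a terminal status instead of a task stuck in "running".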
async def _run_embeddings(request: EmbeddingsRequest, http_request: Request = None):
"""Core embeddings logic; registered as a task by create_embeddings()."""
model_info = await asyncio.to_thread( model_info = await asyncio.to_thread(
multi_model_manager.request_model, request.model, model_type="embedding") multi_model_manager.request_model, request.model, model_type="embedding")
model_name = model_info.get('model_name') model_name = model_info.get('model_name')
......
# CoderAI - OpenAI-compatible API server
# Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

"""Fully-managed Parler-TTS worker.

parler-tts pins an old transformers/tokenizers/huggingface-hub that conflict with
the coderai server's stack, so it can't share this venv. Instead coderai owns the
whole lifecycle here: on first use it bootstraps a dedicated venv (installing
parler-tts), launches ``tools/parler_tts_service.py`` in it as a local HTTP
service, health-checks it, and hands back the URL. The matching
``_RemoteParlerBackend.cleanup()`` calls :func:`stop_service`, so the model
manager's normal eviction tears the process down, with no manual setup or config.
"""
import os
import socket
import subprocess
import sys
import threading
import time
from pathlib import Path

_REPO_ROOT = Path(__file__).resolve().parents[2]
_SERVICE_SCRIPT = _REPO_ROOT / "tools" / "parler_tts_service.py"

# Dedicated venv for the (incompatible) parler-tts stack. It is created fully
# isolated (no system site-packages), so parler's pinned transformers never mixes
# with the server's stack and torch is installed into the venv as well; see
# _bootstrap_venv().
_VENV_DIR = Path(os.environ.get("CODERAI_PARLER_VENV")
                 or os.path.expanduser("~/.coderai/parler_venv"))

_lock = threading.RLock()
_services: dict[str, dict] = {}   # model_name -> {"proc","port","url"}
_bootstrapped = False


def _venv_python() -> Path:
    return _VENV_DIR / ("Scripts" if os.name == "nt" else "bin") / (
        "python.exe" if os.name == "nt" else "python")


def _pip_ok(py: Path) -> bool:
    try:
        return subprocess.run([str(py), "-c", "import parler_tts, soundfile"],
                              capture_output=True).returncode == 0
    except Exception:
        return False


def _venv_is_system_site() -> bool:
    """True if the venv was built with --system-site-packages (can't isolate)."""
    try:
        return "include-system-site-packages = true" in \
            (_VENV_DIR / "pyvenv.cfg").read_text().lower()
    except Exception:
        return False


def _bootstrap_venv() -> Path:
    """Create a fully-isolated venv and install parler-tts (idempotent).

    Isolation is the whole point: parler-tts pins an old transformers/tokenizers
    that must NOT be shared with, or shadowed by, the server's stack, so the
    venv gets its own copy of everything (torch included). Returns its python."""
    global _bootstrapped
    py = _venv_python()
    if _bootstrapped and py.exists():
        return py
    # A previously-created shared-site venv leaks the server's transformers in;
    # rebuild it isolated.
    if py.exists() and _venv_is_system_site():
        import shutil
        print("[parler] rebuilding venv as fully isolated …", flush=True)
        shutil.rmtree(_VENV_DIR, ignore_errors=True)
    if not _venv_python().exists():
        print(f"[parler] creating isolated venv at {_VENV_DIR} …", flush=True)
        _VENV_DIR.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run([sys.executable, "-m", "venv", str(_VENV_DIR)], check=True)
    py = _venv_python()
    if not _pip_ok(py):
        print("[parler] installing parler-tts + torch into the isolated venv "
              "(first run, downloads several GB, this can take a while) …", flush=True)
        subprocess.run([str(py), "-m", "pip", "install",
                        "git+https://github.com/huggingface/parler-tts.git",
                        "soundfile"], check=True)
        if not _pip_ok(py):
            raise RuntimeError("parler-tts install did not yield an importable package")
    _bootstrapped = True
    return py


def _free_port() -> int:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


def _pump_logs(proc: subprocess.Popen, tail):
    for line in proc.stdout:
        line = line.rstrip()
        if line:
            tail.append(line)
            print(f"[parler] {line}", flush=True)


def _health_ok(url: str) -> bool:
    import requests
    try:
        r = requests.get(url + "/health", timeout=3)
        return r.ok and bool(r.json().get("ok"))
    except Exception:
        return False


def ensure_service(model_name: str, ready_timeout: float = 1800.0) -> str:
    """Start (or reuse) the worker for ``model_name`` and return its base URL.

    First call bootstraps the venv and downloads the model, so the timeout is
    generous. Raises RuntimeError if the service never comes up."""
    with _lock:
        svc = _services.get(model_name)
        if svc and svc["proc"].poll() is None and _health_ok(svc["url"]):
            return svc["url"]
        if svc and svc["proc"].poll() is not None:
            _services.pop(model_name, None)  # died; restart below
        py = _bootstrap_venv()
        port = _free_port()
        url = f"http://127.0.0.1:{port}"
        env = dict(os.environ)
        # The worker must use the model already pulled via coderai's HF download
        # interface; it never downloads anything itself. Point it at coderai's
        # cache and force offline mode, so a missing model fails fast instead of
        # silently fetching.
        try:
            from codai.models.cache import get_hf_hub_cache_dir
            hub = get_hf_hub_cache_dir()
            env["HF_HUB_CACHE"] = hub
            env["HUGGINGFACE_HUB_CACHE"] = hub
        except Exception:
            pass
        env["HF_HUB_OFFLINE"] = "1"
        env["TRANSFORMERS_OFFLINE"] = "1"
        proc = subprocess.Popen(
            [str(py), str(_SERVICE_SCRIPT), "--model", model_name,
             "--host", "127.0.0.1", "--port", str(port)],
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
            bufsize=1, env=env, cwd=str(_REPO_ROOT),
        )
        import collections
        tail = collections.deque(maxlen=15)
        threading.Thread(target=_pump_logs, args=(proc, tail), daemon=True).start()
        _services[model_name] = {"proc": proc, "port": port, "url": url}

    def _tail_msg():
        joined = " | ".join(list(tail)[-5:]).strip()
        # Parentheses make the existing precedence explicit: `and` binds before `or`.
        if "offline" in joined.lower() or (
                "not" in joined.lower() and "found" in joined.lower()):
            return (f". The model isn't in coderai's cache — download "
                    f"'{model_name}' from the model interface first. ({joined})")
        return f". Last output: {joined}" if joined else ""

    # Wait (outside the lock) for the service to load the model and answer.
    deadline = time.time() + ready_timeout
    while time.time() < deadline:
        if proc.poll() is not None:
            raise RuntimeError(
                f"Parler worker exited (code {proc.returncode}) before becoming ready"
                + _tail_msg())
        if _health_ok(url):
            print(f"[parler] service ready for {model_name} at {url}", flush=True)
            return url
        time.sleep(2)
    stop_service(model_name)
    raise RuntimeError(f"Parler worker for {model_name} did not become ready in time"
                       + _tail_msg())


def stop_service(model_name: str) -> None:
    with _lock:
        svc = _services.pop(model_name, None)
    if not svc:
        return
    proc = svc["proc"]
    if proc.poll() is None:
        try:
            proc.terminate()
            proc.wait(timeout=10)
        except Exception:
            pass
    if proc.poll() is None:
        try:
            proc.kill()
        except Exception:
            pass
    print(f"[parler] service for {model_name} stopped", flush=True)


def stop_all() -> None:
    for name in list(_services.keys()):
        stop_service(name)


import atexit as _atexit
_atexit.register(stop_all)
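The worker file above combines two small techniques: asking the OS for a free port, and polling a health probe until a deadline while failing fast if the process dies. A minimal standalone sketch of both follows. `free_port` and `wait_until_ready` are hypothetical helpers, not functions from the commit, and the sketch uses `time.monotonic()` where the file uses `time.time()`.

```python
import socket
import time


def free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port.

    The port is released before returning, so another process could still
    grab it before the worker binds; the same race exists in any
    bind-to-zero-then-close scheme.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def wait_until_ready(probe, alive, timeout, interval=2.0, sleep=time.sleep):
    """Poll probe() until it returns True or the deadline passes.

    alive() is checked on every round so a crashed worker is reported
    immediately instead of after the full timeout. Returns True when the
    probe succeeded and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not alive():
            raise RuntimeError("worker exited before becoming ready")
        if probe():
            return True
        sleep(interval)
    return False


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_probe():
        # Pretend the service needs three polls before it answers.
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(free_port(), wait_until_ready(fake_probe, lambda: True, timeout=5, interval=0))
```

Injecting `probe`, `alive` and `sleep` keeps the loop testable without spawning a real subprocess.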
@@ -45,6 +45,31 @@ global_args = None
 global_file_path = None


+def _spatial_task(title: str):
+    """Decorator: register a spatial/3D endpoint in the unified task list so
+    every model type is visible there. Finishes done/error around the call."""
+    import functools
+
+    def deco(fn):
+        @functools.wraps(fn)
+        async def wrap(*args, **kwargs):
+            from codai.tasks import task_registry
+            tid = task_registry.register("spatial", title=title, model="spatial")
+            task_registry.start(tid)
+            try:
+                result = await fn(*args, **kwargs)
+                task_registry.finish(tid, "done")
+                return result
+            except HTTPException:
+                task_registry.finish(tid, "error")
+                raise
+            except Exception as e:
+                task_registry.finish(tid, "error", str(e)[:200])
+                raise
+        return wrap
+    return deco
+
+
 def set_global_args(args):
     global global_args
     global_args = args
@@ -500,6 +525,7 @@ class ImageTo3DRequest(BaseModel):
 @router.post("/v1/images/to3d", summary="Image to 3D model")
+@_spatial_task("Image → 3D")
 async def image_to_3d(request: ImageTo3DRequest, http_request: Request = None):
     """Convert a 2D image to a 3D representation.
@@ -568,6 +594,7 @@ class ImageFrom3DRequest(BaseModel):
 @router.post("/v1/images/from3d", summary="Render a 3D model to an image")
+@_spatial_task("3D → image")
 async def image_from_3d(request: ImageFrom3DRequest, http_request: Request = None):
     """Render a 3D model (GLB/OBJ) to a 2D PNG image from a specified camera angle."""
     raw = _decode_b64(request.model_data)
@@ -601,6 +628,7 @@ class VideoTo3DRequest(BaseModel):
 @router.post("/v1/video/to3d", summary="Video to 3D model")
+@_spatial_task("Video → 3D")
 async def video_to_3d(request: VideoTo3DRequest, http_request: Request = None):
     """Convert a 2D video to a 3D video frame-by-frame.
@@ -642,6 +670,7 @@ class VideoFrom3DRequest(BaseModel):
 @router.post("/v1/video/from3d", summary="Render a 3D model to a video")
+@_spatial_task("3D → video")
 async def video_from_3d(request: VideoFrom3DRequest, http_request: Request = None):
     """Render a 3D model as a 360° turntable video."""
     raw = _decode_b64(request.model_data)
@@ -675,6 +704,7 @@ class Generate3DRequest(BaseModel):
 @router.post("/v1/3d/generate", summary="Generate a 3D model from a prompt")
+@_spatial_task("Generate 3D")
 async def generate_3d(request: Generate3DRequest, http_request: Request = None):
     """Generate a 3D model (GLB) from a text prompt and/or an image.
......
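The spatial endpoints stack a task decorator under `@router.post`, so the router registers the already-wrapped coroutine. That only works cleanly if the wrapper keeps the handler's signature, which is what `functools.wraps` provides: `inspect.signature` follows the `__wrapped__` attribute it sets. A minimal sketch under stated assumptions: `traced` and `render` are hypothetical names, and a plain list stands in for the task registry.

```python
import asyncio
import functools
import inspect


def traced(title, log):
    """Decorator factory: record start/done/error around an async handler."""
    def deco(fn):
        @functools.wraps(fn)  # copies __name__/__doc__ and sets __wrapped__
        async def wrap(*args, **kwargs):
            log.append(("start", title))
            try:
                result = await fn(*args, **kwargs)
                log.append(("done", title))
                return result
            except Exception as exc:
                log.append(("error", title, str(exc)[:200]))
                raise
        return wrap
    return deco


events = []


@traced("3D to image", events)
async def render(request: dict, http_request=None):
    """Hypothetical handler used only for this illustration."""
    if request.get("fail"):
        raise ValueError("bad mesh")
    return "png"


if __name__ == "__main__":
    # The wrapper reports the original parameters, not (*args, **kwargs).
    print(list(inspect.signature(render).parameters))
    print(asyncio.run(render({})), events)
```

Without `functools.wraps`, a framework that derives request parsing from the handler signature would only see `*args, **kwargs`.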
This diff is collapsed.
@@ -135,6 +135,32 @@ async def create_transcription(
     if len(file_content) > _MAX_AUDIO_BYTES:
         raise HTTPException(status_code=413, detail="Audio file too large (max 100 MB)")

+    # Register a task so transcription appears in the unified task list, like
+    # every other model type. Finished on success or error below.
+    from codai.tasks import task_registry
+    _tid = task_registry.register(
+        "transcription",
+        title=(file.filename or "audio")[:80],
+        model=model or "",
+    )
+    task_registry.start(_tid)
+    try:
+        _resp = await _run_transcription(
+            file_content, model, language, prompt, response_format, temperature, file)
+        task_registry.finish(_tid, "done")
+        return _resp
+    except HTTPException:
+        task_registry.finish(_tid, "error")
+        raise
+    except Exception as e:
+        task_registry.finish(_tid, "error", str(e)[:200])
+        raise
+
+
+async def _run_transcription(
+    file_content: bytes, model: str, language, prompt, response_format, temperature, file
+):
+    """Core transcription logic; registered as a task by create_transcription()."""
     # Check if the requested model maps to a configured whisper-server instance first.
     # Try alias round-robin resolution before direct ID lookup.
     whisper_model_id = multi_model_manager.resolve_whisper_alias_model_id(model)
......
@@ -28,6 +28,7 @@ from pydantic import BaseModel, ConfigDict
 # Import from codai modules
 from codai.models.manager import multi_model_manager
+from codai.api import tts_backends

 # Global reference to be set by coderai
@@ -40,6 +41,20 @@ def set_global_args(args):
     global_args = args


+# Substrings that mark a model as a text/classifier/embedding model wrongly routed
+# to TTS (e.g. an emotion classifier exposed under a stray ``tts:`` alias).
+_NON_TTS_HINTS = (
+    "go_emotions", "roberta", "bert", "embedding", "e5-", "minilm",
+    "classifier", "toxic", "reranker", "sentence-transformers",
+)
+
+
+def _family_is_text_model(model_name: str) -> bool:
+    """Heuristic guard: True when the model is clearly not a speech synthesizer."""
+    n = (model_name or "").lower()
+    return any(h in n for h in _NON_TTS_HINTS)
+
+
 # =============================================================================
 # Router and Endpoints
 # =============================================================================
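The guard added in this hunk is a case-insensitive substring match, so it is worth seeing what it accepts and rejects. The sketch below copies the hint tuple from the diff under slightly different names; the model names passed in are illustrative strings chosen for this example, not names taken from the commit.

```python
# Copied from the hunk above for a standalone demonstration.
NON_TTS_HINTS = (
    "go_emotions", "roberta", "bert", "embedding", "e5-", "minilm",
    "classifier", "toxic", "reranker", "sentence-transformers",
)


def family_is_text_model(model_name):
    """True when any hint occurs anywhere in the lower-cased model name."""
    n = (model_name or "").lower()
    return any(h in n for h in NON_TTS_HINTS)


if __name__ == "__main__":
    for name in (
        "SamLowe/roberta-base-go_emotions",  # classifier: rejected as intended
        "hexgrad/Kokoro-82M",                # speech model: passes the guard
        "my-org/bert-vits2",                 # speech model whose name contains "bert"
        None,                                # missing name: treated as not text
    ):
        print(name, family_is_text_model(name))
```

Because the match is on substrings, a speech model whose name happens to contain one of the hints (the third case) is also rejected. An allow-list or a config override would be one way to handle such names, if they matter in practice.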
@@ -72,6 +87,16 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
     Supports:
     - Kokoro TTS models (when --tts-model is specified)
     """
+    # Register a task so TTS shows up in the unified task list / dashboard,
+    # like every other model type. Finished on success or error below.
+    from codai.tasks import task_registry, loading_task
+    _tid = task_registry.register(
+        "tts",
+        title=(request.input or "")[:80],
+        model=(request.model or request.voice_profile or "tts"),
+    )
+    task_registry.start(_tid)
+    try:
         # If a voice profile is requested, delegate to voice cloning (F5-TTS)
         if request.voice_profile:
             from codai.api.voice_clone import _load_voice, _f5tts_clone
@@ -96,6 +121,7 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
             except Exception as e:
                 raise HTTPException(status_code=500, detail=f"Voice cloning failed: {e}")
             audio_base64 = base64.b64encode(audio_bytes).decode('utf-8')
+            task_registry.finish(_tid, "done")
             return {"audio": audio_base64}

         # Use the manager to resolve the model and manage VRAM
@@ -111,7 +137,7 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
         model_name = model_info['model_name']
         model_key = model_info['model_key']
-        kokoro_model = model_info['model_object']
+        tts_backend = model_info['model_object']

         # If no TTS model configured, return an error
         if not model_name:
@@ -120,35 +146,42 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
                 detail="TTS not configured. Use --tts-model to specify a model."
             )

-        # Try to use kokoro if available
-        try:
-            from kokoro import Kokoro
+        # Reject text/classifier models that aren't actually speech synthesizers.
+        if _family_is_text_model(model_name):
+            raise HTTPException(
+                status_code=404,
+                detail=(f"Model '{model_name}' is a text model and cannot be used for "
+                        "tts generation. Use a TTS model (e.g. a kokoro/XTTS/Bark model).")
+            )

-            if kokoro_model is None:
-                print(f"Loading Kokoro TTS model: {model_name}")
-                # Check if model_name is a URL - download it (with caching)
-                model_path = None
-                if model_name.startswith('http://') or model_name.startswith('https://'):
-                    print(f"Loading model from URL: {model_name}")
-                    from codai.models.cache import load_model
-                    model_path = load_model(model_name)
-                    if not model_path:
-                        raise Exception(f"Failed to load model from {model_name}")
-                else:
-                    # Use local path or model name
+        try:
+            from codai.api import tts_backends
+            if tts_backend is None:
+                print(f"Loading TTS model: {model_name}")
                 model_path = model_name
-                # Load the Kokoro model
-                kokoro_model = Kokoro(model_path if model_path else model_name)
-                multi_model_manager.add_model(model_key, kokoro_model)
+                if model_name.startswith(('http://', 'https://')):
+                    from codai.models.cache import load_model
+                    model_path = load_model(model_name) or model_name
+                cfg = multi_model_manager.config.get(model_key) or \
+                    multi_model_manager.config.get(f"tts:{model_name}") or {}
+                with loading_task(model_name, model_type="tts"):
+                    tts_backend = await asyncio.to_thread(
+                        tts_backends.load_backend, model_name, model_path, cfg)
+                multi_model_manager.add_model(model_key, tts_backend)
                 multi_model_manager.current_model_key = model_key

-            # Generate speech
-            voice = request.voice or "af_sarah"
+            voice = request.voice or getattr(tts_backend, "default_voice", "")
             speed = request.speed or 1.0
+            lang = getattr(request, "language", None) or "en-us"
+            emotion = getattr(request, "emotion", None) or ""
+            style = getattr(request, "style", None) or ""
+            fmt = request.response_format or "wav"

-            audio_bytes = kokoro_model.generate(request.input, voice=voice, speed=speed)
+            samples, sample_rate = await asyncio.to_thread(
+                tts_backend.synthesize, request.input, voice, speed, lang, emotion, style)
+            audio_bytes, out_fmt = await asyncio.to_thread(
+                tts_backends.encode_audio, samples, sample_rate, fmt)

             try:
                 from codai.api.archive import archive_manager
@@ -157,27 +190,29 @@ async def create_speech(request: TTSRequest, http_request: Request = None):
                     "tts", "/v1/audio/speech",
                     model_name,
                     request.input,
-                    {"voice": voice, "speed": speed, "response_format": request.response_format},
-                    [(audio_bytes, request.response_format or "mp3")],
+                    {"voice": voice, "speed": speed, "response_format": out_fmt},
+                    [(audio_bytes, out_fmt)],
                 ))
             except Exception:
                 pass

-            # Convert to base64
             audio_base64 = base64.b64encode(audio_bytes).decode('utf-8')
+            task_registry.finish(_tid, "done")
+            return {"audio": audio_base64}

-            return {
-                "audio": audio_base64
-            }
-        except ImportError as e:
-            # kokoro not installed
-            raise HTTPException(
-                status_code=501,
-                detail=f"TTS not available. Install kokoro: pip install kokoro. Error: {str(e)}"
-            )
+        except HTTPException:
+            raise
+        except tts_backends.MissingEngineError as e:
+            # Missing optional engine (e.g. coqui-tts) → actionable 501.
+            raise HTTPException(status_code=501, detail=str(e))
         except Exception as e:
             print(f"TTS error: {e}")
             import traceback
             traceback.print_exc()
             raise HTTPException(status_code=500, detail=f"TTS error: {str(e)}")
+    except HTTPException:
+        task_registry.finish(_tid, "error")
+        raise
+    except Exception as e:
+        task_registry.finish(_tid, "error", str(e)[:200])
+        raise
\ No newline at end of file
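The endpoint now talks to a backend object through `synthesize(text, voice, speed, lang, emotion, style)` returning `(samples, sample_rate)`, and then to an encoder returning `(bytes, format)`. The `tts_backends` module itself is collapsed in this view, so the following is only a guess at the shape of that contract. `SineBackend` and `encode_wav` are hypothetical, and the sample format (floats in [-1, 1]) is an assumption.

```python
import io
import math
import struct
import wave


class SineBackend:
    """Hypothetical backend: 'speaks' a 440 Hz tone, 10 ms per input character."""

    default_voice = "sine"

    def synthesize(self, text, voice, speed, lang, emotion, style):
        sample_rate = 16000
        # 160 samples = 10 ms at 16 kHz; a higher speed shortens the output.
        n = int(160 * max(1, len(text)) / speed)
        samples = [0.3 * math.sin(2 * math.pi * 440 * i / sample_rate)
                   for i in range(n)]
        return samples, sample_rate


def encode_wav(samples, sample_rate):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue(), "wav"


if __name__ == "__main__":
    backend = SineBackend()
    samples, rate = backend.synthesize("hello", backend.default_voice, 1.0, "en-us", "", "")
    audio, fmt = encode_wav(samples, rate)
    print(len(samples), rate, fmt, len(audio))
```

Returning the format that was actually produced, as the diff does with `out_fmt`, lets the caller archive the file under the right extension even when the requested format is unavailable.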
This diff is collapsed.
@@ -49,7 +49,13 @@ def build_hardware_summary() -> Dict[str, Any]:
     total_vram_mb = 0
     available_vram_mb = 0

+    # Only use torch if it's ALREADY loaded (i.e. we're in an engine). Never import
+    # it here: the front is torch-free and must stay that way (importing torch in
+    # the front is heavy and would initialise CUDA in the wrong process).
+    import sys as _sys
     try:
+        if "torch" not in _sys.modules:
+            raise ImportError("torch not loaded (front) — using torch-free path")
         import torch

         if torch.cuda.is_available():
@@ -76,6 +82,23 @@ def build_hardware_summary() -> Dict[str, Any]:
     except Exception:
         pass

+    # Torch-free path (e.g. the front, which imports no torch): enumerate every
+    # physical card via nvidia-smi + sysfs so VRAM is reported for the whole node.
+    if not gpus:
+        try:
+            from codai.frontproxy.gpu_detect import gpu_stats
+            for c in gpu_stats():
+                total_mb = int(round((c.get("mem_total") or 0) * 1024))
+                used_mb = int(round((c.get("mem_used") or 0) * 1024))
+                if total_mb <= 0:
+                    continue
+                gpus.append({"name": c.get("name") or c.get("vendor"),
+                             "total_vram_mb": total_mb})
+                total_vram_mb += total_mb
+                available_vram_mb += max(0, total_mb - used_mb)
+        except Exception:
+            pass
+
     if not gpus:
         for total_path in sorted(glob.glob("/sys/class/drm/card*/device/mem_info_vram_total")):
             used_path = total_path.replace("vram_total", "vram_used")
......
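Two ideas in the hardware hunk can be shown in isolation: using a heavy module only when some other code has already imported it, and folding per-card stats into node totals. The sketch below is a standalone illustration. `loaded_module` and `summarise_cards` are hypothetical names, and reading `mem_total`/`mem_used` as GiB is an assumption inferred from the `* 1024` in the diff.

```python
import sys


def loaded_module(name):
    """Return the module if it is already imported; never trigger an import."""
    return sys.modules.get(name)


def summarise_cards(cards):
    """Fold per-card dicts (memory assumed to be in GiB) into MiB totals."""
    gpus, total_mb, available_mb = [], 0, 0
    for c in cards:
        card_total = int(round((c.get("mem_total") or 0) * 1024))
        card_used = int(round((c.get("mem_used") or 0) * 1024))
        if card_total <= 0:
            continue  # skip cards that report no memory
        gpus.append({"name": c.get("name") or c.get("vendor"),
                     "total_vram_mb": card_total})
        total_mb += card_total
        available_mb += max(0, card_total - card_used)
    return {"gpus": gpus, "total_vram_mb": total_mb,
            "available_vram_mb": available_mb}


if __name__ == "__main__":
    torch = loaded_module("torch")  # None unless something imported torch earlier
    print("torch loaded:", torch is not None)
    print(summarise_cards([
        {"name": "GPU A", "mem_total": 24, "mem_used": 4},
        {"vendor": "amd", "mem_total": 0},
    ]))
```

Checking `sys.modules` rather than wrapping `import torch` in `try`/`except` is what keeps the probing process from paying the import cost at all.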
@@ -60,8 +60,13 @@ def _is_text_response(content_type: str | None) -> bool:
     )


-async def execute_broker_request(app, envelope):
-    """Validate and execute a broker request envelope."""
+async def execute_broker_request(app, envelope, executor=None):
+    """Validate and execute a broker request envelope.
+
+    ``executor`` is an ``async (method, path, headers, query, body) -> {status_code,
+    headers, body}`` callable. When omitted the request is run in-process against
+    ``app`` via the ASGI bridge (engine / single-process mode). The front passes its
+    own executor that proxies to the right engine over HTTP."""
     logger.debug(
         "broker dispatch → op=%s request_id=%s path=%r method=%r stream=%s",
@@ -136,6 +141,12 @@ async def execute_broker_request(app, envelope):
         headers["content-type"] = envelope.content_type

     started_at = perf_counter()
+    if executor is not None:
+        response = await executor(
+            method=envelope.method, path=envelope.path, headers=headers,
+            query=envelope.query, body=body,
+        )
+    else:
         response = await execute_internal_request(
             app,
             method=envelope.method,
......
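The new `executor` parameter is a pluggable async callable with a fixed keyword signature and a dict result. A minimal sketch of that contract follows, assuming the result keys named in the docstring (`status_code`, `headers`, `body`). `in_process`, `make_proxy_executor` and `dispatch` are hypothetical, and the proxy only simulates the HTTP hop instead of making a real request.

```python
import asyncio


async def in_process(method, path, headers, query, body):
    """Default path: pretend to run the request against the local app."""
    return {"status_code": 200, "headers": {}, "body": b"local:" + path.encode()}


def make_proxy_executor(base_url):
    """Build an executor bound to one upstream engine (HTTP call simulated)."""
    async def executor(method, path, headers, query, body):
        # A real front would forward the request to base_url + path here.
        return {"status_code": 200,
                "headers": {"x-upstream": base_url},
                "body": body}
    return executor


async def dispatch(method, path, headers, query, body, executor=None):
    """Use the injected executor when given, else the in-process default."""
    run = executor if executor is not None else in_process
    return await run(method=method, path=path, headers=headers,
                     query=query, body=body)


if __name__ == "__main__":
    print(asyncio.run(dispatch("GET", "/v1/models", {}, "", b"")))
    proxy = make_proxy_executor("http://127.0.0.1:9000")
    print(asyncio.run(dispatch("POST", "/v1/embeddings", {}, "", b"{}", executor=proxy)))
```

Passing the executor as a parameter keeps validation and logging in one place while letting the front and the engine differ only in how the request is finally run.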
This diff is collapsed.
@@ -54,6 +54,7 @@ class Task:
     status: str = "queued"  # queued | running | done | error | cancelled
     step: int = 0
     total: int = 0
+    rate: float = 0.0  # throughput (tokens/s) for text generation
     message: str = ""
     job_id: Optional[str] = None  # link to a durable loras training job, if any
     created_at: float = field(default_factory=time.time)
......
python tools/video_editor.py --no-browser --host 0.0.0.0 --media-dir tools/coderai_media --session
tools/gen_township_fighters.py -c township_output/township_config.json
This diff is collapsed.