codai/admin/templates/settings.html · ef106ba16d01884ae65e503144acd77f97b0f720 · nexlab / coderai

ds4: auto-downloaded weights land in coderai GGUF cache + show on models page · ef106ba1

Stefy Lanza (nextime / spora) authored Jun 19, 2026

When ds4.auto_download is enabled and a deepseek4 request resolves to no local
GGUF, the downloaded weight variant is now relocated into coderai's GGUF cache
(get_model_cache_dir; moved when on the same filesystem, symlinked across
devices) and registered in models.json as a text_models entry that copies the
requested ("failed") model's config: backend auto, loaded on request, enabled
and visible (removed from unloaded/to_download). model_name is threaded through
ds4 backend → ensure_service → ensure_model so the registration mirrors the
right entry.
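A minimal sketch of the relocate-then-register flow described above, assuming a plain directory in place of get_model_cache_dir() and a dict-shaped models.json. The helper names (relocate_into_cache, register_text_model) and every models.json key used here (text_models, unloaded, to_download, backend, load, enabled, visible, path) are hypothetical illustrations, not coderai's actual code or schema.

```python
import copy
import os
from pathlib import Path


def relocate_into_cache(src: Path, cache_dir: Path) -> Path:
    """Move a downloaded GGUF into the cache dir, or symlink it across devices.

    Hypothetical helper: cache_dir stands in for get_model_cache_dir().
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / src.name
    if dest.exists() or dest.is_symlink():
        return dest  # already relocated by an earlier request
    if os.stat(src).st_dev == os.stat(cache_dir).st_dev:
        # Same filesystem: a rename is atomic and leaves a single copy.
        os.replace(src, dest)
    else:
        # Different device: copying ~100 GB would be slow, so link instead.
        dest.symlink_to(src.resolve())
    return dest


def register_text_model(models: dict, requested: str, name: str, gguf: Path) -> dict:
    """Clone the requested model's entry into a new, visible text_models entry.

    All key names below are assumptions about models.json, not its real schema.
    """
    entry = copy.deepcopy(models.get("text_models", {}).get(requested, {}))
    entry.update(
        backend="auto",      # let coderai pick the backend
        load="on-request",   # load lazily, like the requested model
        enabled=True,
        visible=True,
        path=str(gguf),
    )
    models.setdefault("text_models", {})[name] = entry
    # Drop the new entry from the "not yet available" buckets so it shows up.
    for bucket in ("unloaded", "to_download"):
        if name in models.get(bucket, []):
            models[bucket].remove(name)
    return models
```

Comparing st_dev is one common way to tell whether a rename is possible; shutil.move would also fall back to a copy across devices, which is exactly what the symlink branch avoids for multi-hundred-GB weights.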

Also: the settings "Extra ds4-server args" hint/placeholder is updated to reflect
the automatic --kv-disk-dir and the SSD-streaming expert-cache sizing
(--ssd-streaming-cache-experts), and notes that Q2_K can fail ds4's CUDA prefill.

Diagnosis (no code change): ds4-server's "cuda prefill failed" error on the 93GB
Q2_K variant is a quant-specific ds4 CUDA bug. The 154GB Q4_K variant completes
prefill fine (verified: "prompt done 434s" vs. an immediate Q2_K failure), with
15.8GB of VRAM free in both cases, so the cause is not OOM, the cache budget, or
coderai.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
