codai · ef106ba16d01884ae65e503144acd77f97b0f720 · nexlab / coderai

ds4: auto-downloaded weights land in coderai GGUF cache + show on models page · ef106ba1

Stefy Lanza (nextime / spora) authored Jun 19, 2026

When ds4.auto_download is enabled and a deepseek4 request resolves no local
GGUF, the downloaded weight variant is now relocated into coderai's GGUF cache
(get_model_cache_dir; move on same FS, symlink across devices) and registered
in models.json as a text_models entry that mimics the requested ("failed")
model's config — backend auto, on-request, enabled and visible (removed from
unloaded/to_download). model_name is threaded ds4 backend → ensure_service →
ensure_model so the registration mirrors the right entry.
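
The relocate-then-register step described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code from
this commit: relocate_into_cache, build_entry and register are hypothetical
helpers, and the models.json keys used here (name, path, load_mode, enabled,
visible, unloaded, to_download) are guesses at the schema. Only the behaviour
(rename on the same filesystem, symlink across devices, text_models entry
copied from the requested model) comes from the message.

```python
import errno
import os
from pathlib import Path


def relocate_into_cache(downloaded: Path, cache_dir: Path) -> Path:
    """Put `downloaded` under `cache_dir`: rename when both are on the same
    filesystem, symlink when the cache lives on another device."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / downloaded.name
    if target.exists() or target.is_symlink():
        return target  # something is already there, leave it untouched
    try:
        os.rename(downloaded, target)  # only works within one filesystem
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Cross-device: link instead of copying a very large weight file.
        os.symlink(downloaded.resolve(), target)
    return target


def build_entry(template: dict, gguf_path: Path) -> dict:
    """Copy the requested model's config and override the fields the commit
    message lists. Key names are assumptions, not coderai's real schema."""
    entry = dict(template)
    entry.update(
        path=str(gguf_path),
        backend="auto",
        load_mode="on-request",
        enabled=True,
        visible=True,
    )
    return entry


def register(models: dict, entry: dict) -> dict:
    """Append the entry to text_models and drop its name from the pending
    lists, if those lists exist in the (assumed) models.json structure."""
    models.setdefault("text_models", []).append(entry)
    for key in ("unloaded", "to_download"):
        if key in models:
            models[key] = [n for n in models[key] if n != entry.get("name")]
    return models
```

The symlink branch keeps the download where it was fetched, so the cache entry
stays valid only while the original file remains in place.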

Also: settings "Extra ds4-server args" hint/placeholder updated to reflect the
auto --kv-disk-dir and SSD-streaming expert-cache sizing
(--ssd-streaming-cache-experts), noting Q2_K can fail ds4's CUDA prefill.

Diagnosis (no code change): ds4-server's "cuda prefill failed" on the 93GB
Q2_K variant is a quant-specific ds4 CUDA bug — the 154GB Q4_K completes
prefill fine (verified: "prompt done 434s" vs Q2_K instant failure), with
15.8GB VRAM free either way (not OOM, not cache budget, not coderai).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py