    ds4: auto-downloaded weights land in coderai GGUF cache + show on models page · ef106ba1
    Stefy Lanza (nextime / spora ) authored
    When ds4.auto_download is enabled and a deepseek4 request resolves no local
    GGUF, the downloaded weight variant is now relocated into coderai's GGUF cache
    (get_model_cache_dir; move on same FS, symlink across devices) and registered
    in models.json as a text_models entry that mimics the requested ("failed")
    model's config — backend auto, on-request, enabled and visible (removed from
    unloaded/to_download). model_name is threaded ds4 backend → ensure_service →
    ensure_model so the registration mirrors the right entry.
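
The relocate-and-register flow described above can be sketched roughly as follows. This is a minimal illustration, not coderai's actual code: the function names `relocate_into_cache` and `register_text_model`, the entry keys (`name`, `path`, `load_mode`), and the assumption that `unloaded` and `to_download` are plain lists of model names in models.json are all hypothetical. Only the behaviour (move on the same filesystem, symlink across devices, copy the requested model's config, backend auto, on-request, enabled) comes from the commit message.

```python
import errno
import json
import os
from pathlib import Path


def relocate_into_cache(src: Path, cache_dir: Path) -> Path:
    """Move src into cache_dir; fall back to a symlink across devices."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / src.name
    if dest.exists() or dest.is_symlink():
        return dest  # already cached; leave the existing file alone
    try:
        # rename is cheap but only works within a single filesystem
        os.rename(src, dest)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Cross-device: link to the download instead of copying the weights
        os.symlink(src.resolve(), dest)
    return dest


def register_text_model(models_json: Path, requested_name: str,
                        new_name: str, gguf_path: Path) -> dict:
    """Add a text_models entry that mirrors the requested model's config.

    The models.json layout used here is an assumed schema for illustration.
    """
    data = json.loads(models_json.read_text()) if models_json.exists() else {}
    entries = data.setdefault("text_models", [])
    # Start from the requested ("failed") model's config, if it has one
    template = next((e for e in entries if e.get("name") == requested_name), {})
    entry = {**template, "name": new_name, "path": str(gguf_path),
             "backend": "auto", "load_mode": "on-request", "enabled": True}
    # Replace any earlier entry with the same name, then append the new one
    entries[:] = [e for e in entries if e.get("name") != new_name] + [entry]
    # Make the model visible: drop it from the hidden/pending lists
    for key in ("unloaded", "to_download"):
        if new_name in data.get(key, []):
            data[key].remove(new_name)
    models_json.write_text(json.dumps(data, indent=2))
    return entry
```

Using `os.rename` rather than `shutil.move` is deliberate in this sketch: `shutil.move` would silently copy the file when source and destination are on different devices, which is what the symlink fallback is meant to avoid for weights of this size.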
    
    Also: settings "Extra ds4-server args" hint/placeholder updated to reflect the
    auto --kv-disk-dir and SSD-streaming expert-cache sizing
    (--ssd-streaming-cache-experts), noting Q2_K can fail ds4's CUDA prefill.
    
    Diagnosis (no code change): ds4-server's "cuda prefill failed" on the 93GB
    Q2_K variant is a quant-specific ds4 CUDA bug — the 154GB Q4_K completes
    prefill fine (verified: "prompt done 434s" vs Q2_K instant failure), with
    15.8GB VRAM free either way (not OOM, not cache budget, not coderai).
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py