    ds4: auto-downloaded weights land in coderai GGUF cache + show on models page · ef106ba1
    Stefy Lanza (nextime / spora ) authored
    When ds4.auto_download is enabled and a deepseek4 request resolves no local
    GGUF, the downloaded weight variant is now relocated into coderai's GGUF cache
    (get_model_cache_dir; move on same FS, symlink across devices) and registered
    in models.json as a text_models entry that mimics the requested ("failed")
    model's config — backend auto, on-request, enabled and visible (removed from
    unloaded/to_download). model_name is threaded ds4 backend → ensure_service →
    ensure_model so the registration mirrors the right entry.
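
A minimal sketch of the relocation step described above (move on the same filesystem, symlink across devices), assuming a POSIX system where a cross-device rename fails with EXDEV. The helper name relocate_into_cache and its parameters are hypothetical and are not taken from coderai; in the real code the cache directory would come from get_model_cache_dir, and the models.json registration is not shown here.

```python
import errno
import os
from pathlib import Path


def relocate_into_cache(downloaded: Path, cache_dir: Path) -> Path:
    """Place `downloaded` under `cache_dir` and return the cached path.

    Hypothetical helper: renames the file when source and cache share a
    filesystem, and falls back to a symlink when they are on different devices.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / downloaded.name

    # Already relocated (or linked) by an earlier request: nothing to do.
    if target.exists() or target.is_symlink():
        return target

    try:
        # Same filesystem: an atomic rename, no data is copied.
        os.rename(downloaded, target)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Different device: leave the weights where they are and link to them,
        # which avoids copying a file that can be 100GB or more.
        os.symlink(downloaded.resolve(), target)
    return target
```

The symlink fallback keeps the download in place, so the original file must not be deleted afterwards or the cache entry would dangle.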
    
    Also: settings "Extra ds4-server args" hint/placeholder updated to reflect the
    auto --kv-disk-dir and SSD-streaming expert-cache sizing
    (--ssd-streaming-cache-experts), noting Q2_K can fail ds4's CUDA prefill.
    
    Diagnosis (no code change): ds4-server's "cuda prefill failed" on the 93GB
    Q2_K variant is a quant-specific ds4 CUDA bug — the 154GB Q4_K completes
    prefill fine (verified: "prompt done 434s" vs Q2_K instant failure), with
    15.8GB VRAM free either way (not OOM, not cache budget, not coderai).
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
__init__.py
_film_net.py
_rife_ifnet.py
app.py
archive.py
audio_backends.py
audio_clean.py
audio_gen.py
audio_stems.py
characters.py
custom_pipelines.py
ds4_worker.py
embeddings.py
environments.py
faceswap.py
images.py
log.py
loras.py
parler_worker.py
pipelines.py
prompt_cache.py
ratelimit.py
spatial.py
state.py
text.py
transcriptions.py
tts.py
tts_backends.py
urlutils.py
video.py
voice_clone.py
voice_convert.py