feat(ds4): auto-route deepseek4 GGUFs by architecture; serve the requested file

- Route to ds4 by GGUF ARCHITECTURE (general.architecture == "deepseek4"), read
from the file header and cached, not by filename. Mainline deepseek/2/3/32 GGUFs
stay on llama.cpp; the model_id alias still routes for the download case.
- ds4-server now serves the REQUESTED GGUF: Ds4Backend resolves the model to a
local .gguf and launches `ds4-server -m <file>` (resolve_service_key keys the
managed service per file). No fixed-variant assumption.
- Honour the model's per-entry n_ctx for ds4-server --ctx; it takes precedence
over the global ctx.
- New config.ds4 options + settings UI: ssd_streaming (--ssd-streaming, stream
MoE experts from SSD/disk), model_path (explicit -m override), and
auto_download (OFF by default: only serve GGUFs already present and error clearly
instead of silently pulling tens of GB; opt in to fetch model_variant).
- AI.PROMPT: document that DeepSeek-V4 support is pending upstream llama.cpp PRs
(it needs new ggml ops), so it goes to ds4 for now; also document the ds4
routing/offload/text-only specifics.
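
For reference, the architecture-based routing in the first bullet can be sketched as below. This is a minimal illustration, not the code in this commit: it assumes a little-endian GGUF v2+ file (64-bit counts), and the names gguf_architecture and pick_backend are hypothetical. The cache is keyed by path only, so it would not notice a file replaced in place.

```python
import struct
from functools import lru_cache

GGUF_MAGIC = b"GGUF"
# Byte sizes of the fixed-width GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_string(f):
    # A GGUF string is a uint64 length followed by UTF-8 bytes.
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")


def _skip_value(f, vtype):
    # Advance past one metadata value without decoding it.
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], 1)
    elif vtype == _STRING:
        (n,) = struct.unpack("<Q", f.read(8))
        f.seek(n, 1)
    elif vtype == _ARRAY:
        etype, count = struct.unpack("<IQ", f.read(12))
        if etype in _FIXED:
            f.seek(_FIXED[etype] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, etype)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


@lru_cache(maxsize=None)
def gguf_architecture(path):
    """Return general.architecture from the GGUF header, or None."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            return None
        # Header: uint32 version, uint64 tensor count, uint64 metadata KV count.
        _version, _tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key = _read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "general.architecture" and vtype == _STRING:
                return _read_string(f)
            _skip_value(f, vtype)
    return None


def pick_backend(path):
    # Hypothetical router: only deepseek4 goes to ds4; everything else
    # (including non-GGUF files) stays on llama.cpp.
    return "ds4" if gguf_architecture(path) == "deepseek4" else "llama.cpp"
```

Only the metadata header is read, so the check stays cheap even for very large model files.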

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>