Stefy Lanza (nextime / spora) authored
- Route to ds4 by GGUF ARCHITECTURE (general.architecture == "deepseek4"),
  read from the file header (cached), not by filename. Mainline
  deepseek/2/3/32 GGUFs stay on llama.cpp; the model_id alias still routes
  for the download case.
- ds4-server now serves the REQUESTED GGUF: Ds4Backend resolves the model to
  a local .gguf and launches `ds4-server -m <file>` (resolve_service_key keys
  the managed service per file). No fixed-variant assumption.
- Honour the model's per-entry n_ctx for ds4-server --ctx (over the global
  ctx).
- New config.ds4 options + settings UI:
  - ssd_streaming (--ssd-streaming, stream MoE experts from SSD/disk)
  - model_path (explicit -m override)
  - auto_download (OFF by default: only serve GGUFs already present; error
    clearly instead of silently pulling tens of GB; opt in to fetch
    model_variant)
- AI.PROMPT: document DeepSeek-V4 = pending upstream llama.cpp PRs (needs new
  ggml ops) → ds4 for now; and ds4 routing/offload/text-only specifics.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
6a153c58
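
The architecture-based routing in the first bullet can be illustrated with a short sketch. This is not the project's code: `gguf_architecture` and `pick_backend` are hypothetical names, and the parser assumes the little-endian GGUF v2+ header layout (magic, `uint32` version, `uint64` tensor count, `uint64` metadata count, then key/value pairs). It reads only the metadata needed to find `general.architecture` and caches the result per path.

```python
import struct
from functools import lru_cache

GGUF_MAGIC = b"GGUF"
# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_string(f):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    """Advance the file position past one metadata value of type vtype."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], 1)
    elif vtype == _STRING:
        (n,) = struct.unpack("<Q", f.read(8))
        f.seek(n, 1)
    elif vtype == _ARRAY:
        etype, count = struct.unpack("<IQ", f.read(12))
        if etype in _FIXED:
            f.seek(_FIXED[etype] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, etype)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


@lru_cache(maxsize=None)  # hypothetical cache: one header read per path
def gguf_architecture(path):
    """Return general.architecture from a GGUF header, or None."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            return None
        version, _tensors, kv_count = struct.unpack("<IQQ", f.read(20))
        if version < 2:  # assumption: v1 used a different count width
            return None
        for _ in range(kv_count):
            key = _read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "general.architecture" and vtype == _STRING:
                return _read_string(f)
            _skip_value(f, vtype)
    return None


def pick_backend(path):
    """Route deepseek4 GGUFs to ds4; everything else stays on llama.cpp."""
    return "ds4" if gguf_architecture(path) == "deepseek4" else "llama.cpp"
```

A cache keyed only on the path does not notice a file being replaced in place; keying on the path plus its size and modification time would be one way to handle that, if it matters for the deployment.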