codai/admin/routes.py · 6a153c581b008192f6bc53aedbb77736de8d5a4a · nexlab / coderai

feat(ds4): auto-route deepseek4 GGUFs by architecture; serve the requested file · 6a153c58

Stefy Lanza (nextime / spora ) authored Jun 19, 2026

- Route to ds4 by GGUF ARCHITECTURE (general.architecture == "deepseek4"), read
  from the file header (cached) — not by filename. Mainline deepseek/2/3/32 GGUFs
  stay on llama.cpp; the model_id alias still routes for the download case.
- ds4-server now serves the REQUESTED GGUF: Ds4Backend resolves the model to a
  local .gguf and launches `ds4-server -m <file>` (resolve_service_key keys the
  managed service per file). No fixed-variant assumption.
- Honour the model's per-entry n_ctx for ds4-server --ctx (over the global ctx).
- New config.ds4 options + settings UI: ssd_streaming (--ssd-streaming, stream
  MoE experts from SSD/disk), model_path (explicit -m override), and
  auto_download (OFF by default — only serve GGUFs already present; error clearly
  instead of silently pulling tens of GB; opt in to fetch model_variant).
- AI.PROMPT: document that DeepSeek-V4 support in llama.cpp is pending upstream
  PRs (it needs new ggml ops), so it is served by ds4 for now; also document the
  ds4 routing, offload, and text-only specifics.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
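
For illustration only, the architecture-based routing described above could be sketched as follows. This is not the code in routes.py: `gguf_architecture` and `pick_backend` are hypothetical names, the backend labels are placeholders, and the reader assumes a little-endian GGUF file of version 2 or later. The cache is keyed by path only, so it would not notice a file replaced in place.

```python
import struct
from functools import lru_cache

# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian scalar, failing loudly on a short read."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _skip_value(f, vtype):
    """Advance past a metadata value we are not interested in."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], 1)
    elif vtype == _STRING:
        f.seek(_read(f, "<Q"), 1)
    elif vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in _FIXED:
            f.seek(_FIXED[elem_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


@lru_cache(maxsize=None)
def gguf_architecture(path):
    """Return general.architecture from the GGUF header, or None if absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        if _read(f, "<I") < 2:  # v1 used 32-bit counts; not handled here
            raise ValueError("unsupported GGUF version")
        _read(f, "<Q")  # tensor count, unused
        for _ in range(_read(f, "<Q")):  # metadata key/value count
            key = _read_string(f)
            vtype = _read(f, "<I")
            if key == "general.architecture" and vtype == _STRING:
                return _read_string(f)
            _skip_value(f, vtype)
    return None


def pick_backend(path):
    """Route by header architecture, never by filename (labels are placeholders)."""
    return "ds4" if gguf_architecture(path) == "deepseek4" else "llama.cpp"
```

Only the metadata section at the start of the file is read, so the check stays cheap even for GGUFs of tens of GB.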


routes.py 173 KB
