    feat(ds4): auto-route deepseek4 GGUFs by architecture; serve the requested file · 6a153c58
    Stefy Lanza (nextime / spora ) authored
    - Route to ds4 by GGUF ARCHITECTURE (general.architecture == "deepseek4"), read
      from the file header (cached) — not by filename. Mainline deepseek/2/3/32 GGUFs
      stay on llama.cpp; the model_id alias still routes for the download case.
    - ds4-server now serves the REQUESTED GGUF: Ds4Backend resolves the model to a
      local .gguf and launches `ds4-server -m <file>` (resolve_service_key keys the
      managed service per file). No fixed-variant assumption.
    - Honour the model's per-entry n_ctx for ds4-server --ctx (over the global ctx).
    - New config.ds4 options + settings UI: ssd_streaming (--ssd-streaming, stream
      MoE experts from SSD/disk), model_path (explicit -m override), and
      auto_download (OFF by default — only serve GGUFs already present; error clearly
      instead of silently pulling tens of GB; opt in to fetch model_variant).
    - AI.PROMPT: document that DeepSeek-V4 support in llama.cpp is pending upstream
      PRs (it needs new ggml ops), so it goes to ds4 for now; also document the ds4
      routing/offload/text-only specifics.
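
A minimal sketch of the architecture-based routing described in the first bullet, assuming the standard GGUF header layout (magic, version, tensor count, metadata key/value count, then key/value pairs). The helper names `gguf_architecture` and `pick_backend` are hypothetical and are not taken from manager.py; the cache here is a plain `lru_cache` keyed on the path, which is only one way the header read could be cached.

```python
import struct
from functools import lru_cache

# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
                 10: 8, 11: 8, 12: 8}
_TYPE_STRING, _TYPE_ARRAY = 8, 9


def _read_u32(f):
    return struct.unpack("<I", f.read(4))[0]


def _read_u64(f):
    return struct.unpack("<Q", f.read(8))[0]


def _read_string(f):
    # GGUF strings are a uint64 length followed by that many UTF-8 bytes.
    length = _read_u64(f)
    return f.read(length).decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    """Advance past one metadata value without decoding it."""
    if vtype in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[vtype], 1)
    elif vtype == _TYPE_STRING:
        f.seek(_read_u64(f), 1)
    elif vtype == _TYPE_ARRAY:
        elem_type = _read_u32(f)
        count = _read_u64(f)
        if elem_type in _SCALAR_SIZES:
            f.seek(_SCALAR_SIZES[elem_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


@lru_cache(maxsize=256)
def gguf_architecture(path):
    """Return general.architecture from a GGUF header, or None if unreadable.

    Hypothetical helper: results are cached per path, so a file replaced
    in place under the same name would need gguf_architecture.cache_clear().
    """
    try:
        with open(path, "rb") as f:
            if f.read(4) != b"GGUF":
                return None
            if _read_u32(f) < 2:      # v1 used 32-bit counts; not handled here
                return None
            _read_u64(f)              # tensor count (unused)
            kv_count = _read_u64(f)
            for _ in range(kv_count):
                key = _read_string(f)
                vtype = _read_u32(f)
                if key == "general.architecture" and vtype == _TYPE_STRING:
                    return _read_string(f)
                _skip_value(f, vtype)
    except (OSError, struct.error, ValueError):
        return None
    return None


def pick_backend(path):
    # Route on the header's architecture, never on the filename.
    return "ds4" if gguf_architecture(path) == "deepseek4" else "llama.cpp"
```

With this shape, a file named `deepseek4-something.gguf` whose header reports another architecture would still go to llama.cpp, which is the behaviour the first bullet asks for.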
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
manager.py 182 KB