    feat(ds4): auto-route deepseek4 GGUFs by architecture; serve the requested file · 6a153c58
    Stefy Lanza (nextime / spora ) authored
    - Route to ds4 by GGUF ARCHITECTURE (general.architecture == "deepseek4", read
      from the file header and cached), not by filename. Mainline deepseek/2/3/32 GGUFs
      stay on llama.cpp; the model_id alias still routes to ds4 for the download case.
    - ds4-server now serves the REQUESTED GGUF: Ds4Backend resolves the model to a
      local .gguf and launches `ds4-server -m <file>` (resolve_service_key keys the
      managed service per file). No fixed-variant assumption.
    - Honour the model's per-entry n_ctx for ds4-server --ctx, taking precedence over the global ctx.
    - New config.ds4 options + settings UI: ssd_streaming (--ssd-streaming, stream
      MoE experts from SSD/disk), model_path (explicit -m override), and
      auto_download (OFF by default — only serve GGUFs already present; error clearly
      instead of silently pulling tens of GB; opt in to fetch model_variant).
    - AI.PROMPT: document that DeepSeek-V4 support in llama.cpp depends on pending upstream
      PRs (it needs new ggml ops), so it is served via ds4 for now; also document the ds4
      routing, offload, and text-only specifics.
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
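A minimal Python sketch of the architecture check described in the first bullet, assuming the published GGUF v2/v3 header layout (little-endian: `GGUF` magic, uint32 version, uint64 tensor count, uint64 metadata KV count, then key/value pairs). `gguf_architecture` and `pick_backend` are hypothetical names, not the project's actual Ds4Backend code, and keying the cache on path, mtime, and size is an assumption; the commit only says the header read is cached.

```python
import functools
import os
import struct

# Byte sizes of fixed-width GGUF metadata value types (type code -> size).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f, n):
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated GGUF header")
    return data


def _read_u32(f):
    return struct.unpack("<I", _read(f, 4))[0]


def _read_u64(f):
    return struct.unpack("<Q", _read(f, 8))[0]


def _read_str(f):
    # GGUF strings: uint64 byte length followed by UTF-8 bytes (no terminator).
    return _read(f, _read_u64(f)).decode("utf-8")


def _skip_value(f, vtype):
    # Advance past one metadata value without decoding it.
    if vtype == _STRING:
        f.seek(_read_u64(f), os.SEEK_CUR)
    elif vtype == _ARRAY:
        elem_type, count = _read_u32(f), _read_u64(f)
        if elem_type in _SCALAR_SIZES:
            f.seek(count * _SCALAR_SIZES[elem_type], os.SEEK_CUR)
        else:  # arrays of strings or nested arrays are skipped element by element
            for _ in range(count):
                _skip_value(f, elem_type)
    elif vtype in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[vtype], os.SEEK_CUR)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


@functools.lru_cache(maxsize=256)
def _arch_cached(path, mtime_ns, size):
    # mtime/size are part of the cache key (an assumption) so a replaced file is re-read.
    with open(path, "rb") as f:
        if _read(f, 4) != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file")
        if _read_u32(f) < 2:
            raise ValueError("GGUF v1 (32-bit counts) is not handled in this sketch")
        _read_u64(f)  # tensor count, unused here
        for _ in range(_read_u64(f)):  # metadata KV count
            key, vtype = _read_str(f), _read_u32(f)
            if key == "general.architecture" and vtype == _STRING:
                return _read_str(f)
            _skip_value(f, vtype)
    return None


def gguf_architecture(path):
    """Hypothetical helper: return general.architecture from a GGUF header, or None."""
    st = os.stat(path)
    return _arch_cached(os.path.abspath(path), st.st_mtime_ns, st.st_size)


def pick_backend(path):
    """Hypothetical router: deepseek4 files go to ds4, everything else to llama.cpp."""
    return "ds4" if gguf_architecture(path) == "deepseek4" else "llama.cpp"
```

Only the metadata section is scanned and tensor data is never touched, so the check stays cheap even for files tens of GB in size, and a filename such as `deepseek-v4.gguf` has no influence on the decision.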
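A second sketch, under the same caveats, shows how the per-file launch and the new config.ds4 options could fit together. `Ds4Config`, `resolve_gguf`, `service_key`, and `ds4_argv` are hypothetical; the `-m`, `--ctx`, and `--ssd-streaming` flags are taken from the commit message, but passing `--ctx` its value as a separate argument and the shape of the downloader callback are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Ds4Config:
    # Hypothetical mirror of the config.ds4 options named in the commit.
    ssd_streaming: bool = False       # adds --ssd-streaming
    model_path: Optional[str] = None  # explicit -m override
    auto_download: bool = False       # OFF by default
    model_variant: Optional[str] = None


def resolve_gguf(cfg: Ds4Config, requested: str, search_dirs: list,
                 download: Optional[Callable[[str], Path]] = None) -> Path:
    """Resolve the requested model to a local .gguf (hypothetical policy)."""
    if cfg.model_path:
        return Path(cfg.model_path)
    for d in search_dirs:
        candidate = Path(d) / requested
        if candidate.is_file():
            return candidate
    if cfg.auto_download and download is not None:
        return download(cfg.model_variant)  # opt-in fetch only
    # Fail loudly instead of silently pulling a very large download.
    raise FileNotFoundError(
        f"{requested} not found locally and ds4.auto_download is off"
    )


def service_key(gguf: Path) -> str:
    # One managed ds4-server instance per resolved file.
    return f"ds4-server:{gguf.resolve()}"


def ds4_argv(gguf: Path, cfg: Ds4Config, entry_n_ctx: Optional[int], global_ctx: int) -> list:
    ctx = entry_n_ctx if entry_n_ctx else global_ctx  # per-entry n_ctx wins
    argv = ["ds4-server", "-m", str(gguf), "--ctx", str(ctx)]
    if cfg.ssd_streaming:
        argv.append("--ssd-streaming")
    return argv
```

Keying the service on the resolved path means two different deepseek4 GGUFs get two distinct managed servers instead of one process assumed to serve a fixed variant.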
Name
archive.html
base.html
change_password.html
chat.html
dashboard.html
login.html
models.html
settings.html
tasks.html
tokens.html
users.html