    ds4: cache-cleanup safety, in-flight gate, default-model downloader, low-quant tool parse · e23dd2a7
    Stefy Lanza (nextime / spora ) authored
    - ds4 kv janitor: a checkpoint is deleted only when ALL of the following hold:
      its max(mtime, atime) is older than the age limit (so a checkpoint ds4
      merely READS, which bumps atime not mtime, is spared); it is not currently
      open (fd/mmap) by a ds4-server; and ds4 is not serving any request. A new
      in-flight counter on Ds4Backend (any_request_active) gates the sweep.
    - settings: "Download a default DeepSeek V4 model" — select + button backed by
      new /admin/api/ds4/default-models catalog (q2-imatrix / q2-q4 / q4 / mtp from
      antirez/deepseek-v4-gguf). Reuses the normal downloader, which flattens the
      gguf into the cache and surfaces it in the model list; live progress.
    - parser: rescue the degraded plaintext <tool>name arg: value</tool> form that
      heavy quants (ds4 q2-imatrix) emit when they can't reproduce DSML. Scoped to
      DeepSeekParser only (never the shared ToolCallParser, so other families are
      untouched), requires a DECLARED tool name, plaintext-only inner, and the
      block(s) to be the message's trailing action — so a <tool> example inside a
      prose reply is not misread as a call.
    - settings: corrected ds4 perf note (i-quants/Q2_K fail CUDA prefill; use Q4_K+).
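
The janitor rule in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this commit: `InFlightCounter`, `is_stale`, `sweep`, and the `is_open` callback are hypothetical names, and only `any_request_active` is taken from the message. How open fds/mmaps are actually detected is not described in the message, so that check is left as a caller-supplied predicate.

```python
import os
import threading
import time
from pathlib import Path
from typing import Callable, List, Optional


class InFlightCounter:
    """Hypothetical stand-in for the in-flight counter on Ds4Backend."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0

    def __enter__(self) -> "InFlightCounter":
        # Wrap each request in `with counter:` so it counts as in flight.
        with self._lock:
            self._active += 1
        return self

    def __exit__(self, *exc) -> None:
        with self._lock:
            self._active -= 1

    def any_request_active(self) -> bool:
        with self._lock:
            return self._active > 0


def is_stale(path: Path, max_age_s: float, now: Optional[float] = None) -> bool:
    """True when neither a write (mtime) nor a read (atime) is recent."""
    st = path.stat()
    last_touched = max(st.st_mtime, st.st_atime)
    current = time.time() if now is None else now
    return (current - last_touched) > max_age_s


def sweep(
    cache_dir: Path,
    max_age_s: float,
    counter: InFlightCounter,
    is_open: Callable[[Path], bool] = lambda p: False,  # assumption: supplied by caller
    now: Optional[float] = None,
) -> List[Path]:
    """Delete checkpoints only when all three conditions hold."""
    if counter.any_request_active():
        return []  # gate: never sweep while a request is being served
    removed = []
    for path in sorted(cache_dir.iterdir()):
        if not path.is_file() or is_open(path):
            continue
        if not is_stale(path, max_age_s, now):
            continue
        path.unlink()
        removed.append(path)
    return removed
```

One caveat on the design: on filesystems mounted with `noatime`, reads do not update atime at all, and with `relatime` they update it only some of the time, so the atime half of the check is best treated as extra protection rather than a guarantee.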
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
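
The parser bullet's three guards (declared tool name, plaintext-only inner, trailing position) can be illustrated with a small sketch. It is not the DeepSeekParser code from this commit: `rescue_trailing_tool_calls` and `_parse_inner` are hypothetical names, and the reading of `name arg: value` (first token is the tool name, each remaining line is one `key: value` pair) is an assumption about a format the message only shows by example.

```python
from typing import Dict, List, Optional, Set

OPEN, CLOSE = "<tool>", "</tool>"


def _parse_inner(inner: str, declared: Set[str]) -> Optional[Dict]:
    """Parse 'name arg: value'; return None if any guard fails."""
    if "<" in inner or ">" in inner:
        return None  # plaintext-only inner: no nested markup
    parts = inner.strip().split(None, 1)
    if not parts or parts[0] not in declared:
        return None  # the name must be a DECLARED tool
    args: Dict[str, str] = {}
    for line in parts[1].splitlines() if len(parts) > 1 else []:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            return None  # assumption: each argument line is 'key: value'
        args[key.strip()] = value.strip()
    return {"name": parts[0], "arguments": args}


def rescue_trailing_tool_calls(message: str, declared: Set[str]) -> List[Dict]:
    """Return calls only when <tool> blocks are the message's trailing action."""
    rest = message.rstrip()
    calls: List[Dict] = []
    # Peel blocks off the end; a block followed by prose never matches.
    while rest.endswith(CLOSE):
        start = rest.rfind(OPEN)
        if start == -1:
            return []
        call = _parse_inner(rest[start + len(OPEN):-len(CLOSE)], declared)
        if call is None:
            return []  # be conservative: one bad block rescues nothing
        calls.append(call)
        rest = rest[:start].rstrip()
    calls.reverse()  # restore the order the model emitted them in
    return calls
```

For example, a reply ending in `<tool>read_file path: /tmp/a.txt</tool>` with `read_file` declared yields one call, while the same block embedded in the middle of a prose sentence yields none.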
ds4.py 15.4 KB