    ds4: per-model launch overrides, multibyte-safe streaming, UI tweaks · 6a111627
    Stefy Lanza (nextime / spora ) authored
    Per-model ds4 tuning (these vary by quant/size/context, so they belong on the
    model, not globally):
    - Optional `ds4` block on a model entry overrides the global Ds4Config for
      ssd_streaming / expert_cache_reserve_gb / extra_args / extra_env; unset fields
      inherit the global config (the default/template). Ds4Backend looks up its own
      model entry and applies the overrides via dataclasses.replace.
    - admin: api_model_configure accepts + normalizes the per-model `ds4` block,
      dropping it when empty.
    - models page: a "ds4 streaming" section shown only when ds4 is enabled globally
      and the model is a deepseek4; n_ctx stays the context knob.
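The override rule above (unset fields inherit the global config, an empty block is dropped) can be sketched as follows. This is a minimal illustration, not the project's code: only the four field names and the use of `dataclasses.replace` come from the commit message. The field types and defaults, the helper names `normalize_ds4_block` and `effective_config`, the shape of the model entry dict, and the `--example-flag` argument are assumptions made for the example.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Ds4Config:
    # Field names are taken from the commit message; types and defaults are assumed.
    ssd_streaming: bool = False
    expert_cache_reserve_gb: float = 0.0
    extra_args: tuple = ()
    extra_env: dict = field(default_factory=dict)


# The only fields a per-model `ds4` block may override.
OVERRIDABLE = ("ssd_streaming", "expert_cache_reserve_gb", "extra_args", "extra_env")


def normalize_ds4_block(block):
    """Hypothetical helper: keep known keys that are set, or return None if empty."""
    if not isinstance(block, dict):
        return None
    cleaned = {k: block[k] for k in OVERRIDABLE if block.get(k) is not None}
    return cleaned or None


def effective_config(global_cfg, model_entry):
    """Hypothetical helper: apply a model's `ds4` overrides on top of the global config."""
    overrides = normalize_ds4_block(model_entry.get("ds4")) or {}
    if "extra_args" in overrides:
        overrides["extra_args"] = tuple(overrides["extra_args"])
    # replace() copies the global config and swaps in only the overridden fields.
    return replace(global_cfg, **overrides)


global_cfg = Ds4Config(expert_cache_reserve_gb=8.0, extra_args=("--example-flag",))
model_entry = {"name": "example-model", "ds4": {"ssd_streaming": True}}
cfg = effective_config(global_cfg, model_entry)
print(cfg.ssd_streaming, cfg.expert_cache_reserve_gb, cfg.extra_args)
```

Because `None` rather than falsiness marks a field as unset, a model can still override `ssd_streaming` to `False` when the global default is `True`.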
    
    Fix garbled / truncated ds4 replies: the streaming reader used
    iter_lines(decode_unicode=True), which decodes each network chunk independently
    and corrupts a multibyte UTF-8 char split across chunks ('—' -> 'â'); the broken
    JSON then made json.loads fail and the token was silently dropped (truncated
    tails). Parse the SSE byte stream and split on the b"\n" byte (never inside a
    UTF-8 sequence), decoding whole lines; also flush a final newline-less line.
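The byte-level parsing described above can be sketched like this. It is an illustration under stated assumptions, not the actual reader: the function names `iter_sse_lines` and `iter_sse_json` are hypothetical, the `data:` prefix follows the general SSE convention, and skipping a `[DONE]` sentinel is an assumption borrowed from OpenAI-style streams. The input is any iterable of raw byte chunks.

```python
import json


def iter_sse_lines(byte_chunks):
    """Yield decoded lines, splitting on b"\\n" before decoding.

    The newline byte 0x0A never occurs inside a multibyte UTF-8 sequence,
    so every complete line can be decoded safely in one piece.
    """
    buf = b""
    for chunk in byte_chunks:
        buf += chunk
        while True:
            nl = buf.find(b"\n")
            if nl < 0:
                break
            line, buf = buf[:nl], buf[nl + 1:]
            yield line.rstrip(b"\r").decode("utf-8")
    if buf:
        # Flush a final line that arrived without a trailing newline.
        yield buf.rstrip(b"\r").decode("utf-8")


def iter_sse_json(byte_chunks):
    """Yield the JSON payload of each `data:` line (assumed stream shape)."""
    for line in iter_sse_lines(byte_chunks):
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        yield json.loads(payload)


# Build a two-event stream and cut it in the middle of the em dash (U+2014),
# which is the three bytes E2 80 94 in UTF-8.
first = json.dumps({"text": "a \u2014 b"}, ensure_ascii=False)
raw = ("data: " + first + "\n\n" + 'data: {"text": "end"}').encode("utf-8")
cut = raw.index(b"\xe2") + 1
chunks = [raw[:cut], raw[cut:]]

events = list(iter_sse_json(chunks))
print(events)
```

Decoding `chunks[0]` on its own would fail or produce a mangled character, because it ends with the first byte of a three-byte sequence. Buffering bytes until a newline arrives avoids that case entirely.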
    
    UI: slow-reply notice reworded to "Waiting for model reply..." with a trailing
    newline so the real reply starts on its own line.
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
__init__.py
base.py
cuda.py
ds4.py
vulkan.py