codai/backends · ce9c294383bafc0e479168645f9484a902387786 · nexlab / coderai

ds4: per-model launch overrides, multibyte-safe streaming, UI tweaks · 6a111627

Stefy Lanza (nextime / spora) authored Jun 20, 2026

Per-model ds4 tuning (these vary by quant/size/context, so they belong on the
model, not globally):
- Optional `ds4` block on a model entry overrides the global Ds4Config for
  ssd_streaming / expert_cache_reserve_gb / extra_args / extra_env; unset fields
  inherit the global config (the default/template). Ds4Backend looks up its own
  model entry and applies the overrides via dataclasses.replace.
- admin: api_model_configure accepts + normalizes the per-model `ds4` block,
  dropping it when empty.
- models page: a "ds4 streaming" section shown only when ds4 is enabled globally
  and the model is a deepseek4; n_ctx stays the context knob.
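
The override behaviour described above can be sketched as follows. This is a minimal illustration, not the project's code: the `Ds4Config` stand-in, its field types and defaults, and the helper names `normalize_ds4_block` and `effective_ds4_config` are all assumptions; only the four field names and the use of `dataclasses.replace` come from the commit message.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List, Optional


# Hypothetical stand-in for the real Ds4Config; field types and defaults are assumed.
@dataclass(frozen=True)
class Ds4Config:
    ssd_streaming: bool = False
    expert_cache_reserve_gb: float = 0.0
    extra_args: List[str] = field(default_factory=list)
    extra_env: Dict[str, str] = field(default_factory=dict)


# The four fields the commit message lists as overridable per model.
OVERRIDABLE = ("ssd_streaming", "expert_cache_reserve_gb", "extra_args", "extra_env")


def normalize_ds4_block(block: Any) -> Optional[Dict[str, Any]]:
    """Keep only known keys that are actually set; return None when nothing is left."""
    if not isinstance(block, dict):
        return None
    cleaned = {k: block[k] for k in OVERRIDABLE if block.get(k) is not None}
    return cleaned or None


def effective_ds4_config(global_cfg: Ds4Config, model_entry: Dict[str, Any]) -> Ds4Config:
    """Apply a model entry's optional `ds4` block on top of the global config."""
    overrides = normalize_ds4_block(model_entry.get("ds4")) or {}
    # Fields absent from `overrides` keep the value from the global config.
    return replace(global_cfg, **overrides)
```

With a global `expert_cache_reserve_gb` of 8.0 and a model entry carrying `{"ds4": {"ssd_streaming": True}}`, the result has streaming enabled and still reserves 8.0 GB; a model entry without a `ds4` block yields a config equal to the global one.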

Fix garbled / truncated ds4 replies: the streaming reader used
iter_lines(decode_unicode=True), which decodes each network chunk independently,
so a multibyte UTF-8 char split across chunks was corrupted ('—' -> 'â'). The
broken JSON then made json.loads fail and the token was silently dropped
(truncated tails). Parse the SSE byte stream instead: split on the b"\n" byte
(which never occurs inside a UTF-8 sequence) and decode whole lines; also flush
a final newline-less line.
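
The byte-level line splitting can be sketched roughly like this. It is an illustration under stated assumptions, not the code in ds4.py: the function names are hypothetical, the input is assumed to be any iterable of raw byte chunks, and the `data:` prefix handling and the `[DONE]` sentinel assume an OpenAI-style SSE stream.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield decoded lines from raw byte chunks, splitting only on b"\\n"."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            nl = buf.find(b"\n")
            if nl < 0:
                break  # incomplete line: wait for more bytes
            line, buf = buf[:nl], buf[nl + 1:]
            # 0x0A never appears inside a multibyte UTF-8 sequence,
            # so every complete line decodes as whole characters.
            yield line.rstrip(b"\r").decode("utf-8", errors="replace")
    if buf:
        # Flush a final line that arrived without a trailing newline.
        yield buf.rstrip(b"\r").decode("utf-8", errors="replace")


def iter_data_payloads(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Parse the JSON payload of each `data:` line (assumed stream shape)."""
    for line in iter_sse_lines(chunks):
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":  # sentinel is an assumption
            continue
        yield json.loads(payload)
```

Because decoding happens only after a full line has been assembled, a character whose bytes straddle two chunks is joined in the buffer before it is decoded, and a last event without a newline is still delivered.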

UI: slow-reply notice reworded to "Waiting for model reply..." with a trailing
newline so the real reply starts on its own line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Name
__init__.py
base.py
cuda.py
ds4.py
vulkan.py