fix: tool-call streaming/format robustness + clear over-context error
- Streaming tool gate now also withholds the Gemma/Qwen native `<|tool_call>`
marker (and its partial prefixes), not just `<tool_call>`/`call:NAME{`, so the
raw marker no longer leaks to the client mid-stream (Kilo was executing partial
calls).
- Normalize tool-call `function.arguments` from a JSON string to a dict before
applying the chat template, so templates that render `arguments|items` (Qwen)
don't raise "Can only get item pairs from a mapping".
- Context-window overflow now returns a meaningful error: a structured SSE error
event (code `context_length_exceeded`) when streaming, or HTTP 400 with a clear
message when not streaming. Previously "[Generation error: …]" was injected as
assistant content, which polluted chat history.
- Models page: unconfigured GGUF files now expose the "Free disk" button (records
them as "to download" before deleting), matching HF models.
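
Illustrative sketch of the first two items (not the code in this commit). The
names `MARKERS`, `safe_prefix_len` and `normalize_tool_call_arguments` are
hypothetical, the gate covers only the two literal markers (not the
`call:NAME{` form), and messages are assumed to follow the OpenAI-style
`tool_calls[].function.arguments` layout:

```python
import json
from typing import Any

# Hypothetical marker list; the real gate also handles `call:NAME{`.
MARKERS = ("<|tool_call>", "<tool_call>")


def safe_prefix_len(buffer: str) -> int:
    """Return how many leading characters can be streamed to the client.

    Everything from the first position where a marker starts, or where the
    remaining text could still grow into a marker, is withheld.
    """
    for i in range(len(buffer)):
        tail = buffer[i:]
        if any(m.startswith(tail) or tail.startswith(m) for m in MARKERS):
            return i
    return len(buffer)


def normalize_tool_call_arguments(
    messages: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return a copy of `messages` with JSON-string arguments parsed to dicts.

    Arguments that are already dicts, or that do not decode to a JSON object,
    are left unchanged. The input list is not mutated.
    """
    normalized = []
    for message in messages:
        tool_calls = message.get("tool_calls")
        if not tool_calls:
            normalized.append(message)
            continue
        new_calls = []
        for call in tool_calls:
            if "function" not in call:
                new_calls.append(call)
                continue
            function = dict(call["function"])
            args = function.get("arguments")
            if isinstance(args, str):
                try:
                    # Treat an empty string as "no arguments".
                    parsed = json.loads(args) if args.strip() else {}
                except json.JSONDecodeError:
                    parsed = None
                if isinstance(parsed, dict):
                    function["arguments"] = parsed
            new_calls.append({**call, "function": function})
        normalized.append({**message, "tool_calls": new_calls})
    return normalized
```

With this sketch, a buffer of `hello <|to` would stream only `hello ` and hold
back `<|to` until more tokens arrive, while `a < b` streams in full because
`< b` cannot grow into either marker.
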
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>