Stefy Lanza (nextime / spora) authored
- Streaming tool gate now also withholds the gemma/qwen native `<|tool_call>` marker (and partial prefixes of it), not just `<tool_call>`/`call:NAME{`, so the raw marker no longer leaks to the client mid-stream (Kilo was executing partial calls).
- Normalize tool-call `function.arguments` from a JSON string → dict before applying the chat template, so templates that render `arguments|items` (Qwen) don't raise "Can only get item pairs from a mapping".
- Context-window overflow now returns a meaningful error: a structured SSE error event (code `context_length_exceeded`) when streaming, or HTTP 400 with a clear message for non-streaming. Previously "[Generation error: …]" was injected as assistant content, which polluted chat history.
- Models page: unconfigured GGUF files now expose the "Free disk" button (recording them as "to download" before deleting), matching HF models.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
3834ecf5
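The streaming gate in the first item has to hold back more than a complete marker. It must also hold back any trailing text that could still become one, because a chunk can end in the middle of `<|tool_call>`. The Python below is a minimal sketch of that idea under stated assumptions, not the project's code. `ToolCallGate`, `feed` and `flush` are hypothetical names, and the regex-shaped `call:NAME{` form is left out for brevity.

```python
# Literal markers named in the commit; the `call:NAME{` form is omitted (assumption).
TOOL_CALL_MARKERS = ("<tool_call>", "<|tool_call>")


class ToolCallGate:
    """Hypothetical streaming filter: forwards text until a tool-call marker starts."""

    def __init__(self, markers=TOOL_CALL_MARKERS):
        self.markers = markers
        self.pending = ""         # text held back because it may begin a marker
        self.captured = ""        # tool-call text, never sent to the client
        self.in_tool_call = False

    def _held_suffix_len(self, text: str) -> int:
        # Length of the longest suffix of `text` that is a proper prefix of a marker.
        best = 0
        for marker in self.markers:
            for n in range(1, len(marker)):
                if text.endswith(marker[:n]):
                    best = max(best, n)
        return best

    def feed(self, chunk: str) -> str:
        """Accept one streamed chunk; return only the part that is safe to send now."""
        if self.in_tool_call:
            self.captured += chunk
            return ""
        self.pending += chunk
        positions = [self.pending.find(m) for m in self.markers if m in self.pending]
        if positions:
            # A full marker is present: release the text before it, capture the rest.
            start = min(positions)
            safe = self.pending[:start]
            self.captured = self.pending[start:]
            self.pending = ""
            self.in_tool_call = True
            return safe
        # No full marker yet: keep back a possible partial marker at the end.
        keep = self._held_suffix_len(self.pending)
        cut = len(self.pending) - keep
        safe, self.pending = self.pending[:cut], self.pending[cut:]
        return safe

    def flush(self) -> str:
        """End of stream: release held-back text that never became a marker."""
        if self.in_tool_call:
            return ""
        out, self.pending = self.pending, ""
        return out


gate = ToolCallGate()
sent = gate.feed("Sure, calling it now <") + gate.feed('|tool_call>{"name": "ls"}')
print(repr(sent))  # 'Sure, calling it now '
```

Holding back only the longest suffix that matches a marker prefix delays at most `len(marker) - 1` characters, so ordinary text still streams with little lag. A real gate would also detect where the call ends and parse it; this sketch only shows the withholding.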
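The following is a minimal Python sketch of the argument normalization from the second item. It assumes OpenAI-style messages in which `tool_calls[i].function.arguments` arrives as a JSON string. `normalize_tool_call_arguments` is a hypothetical helper. Leaving malformed JSON untouched is one possible policy, not necessarily the one the commit chose.

```python
import json
from typing import Any


def normalize_tool_call_arguments(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: return a copy of `messages` with JSON-string arguments
    decoded to dicts, so a chat template can iterate them as a mapping."""
    normalized = []
    for msg in messages:
        msg = dict(msg)  # shallow copy so the caller's message is not mutated
        calls = msg.get("tool_calls")
        if calls:
            new_calls = []
            for call in calls:
                call = dict(call)
                fn = dict(call.get("function", {}))
                args = fn.get("arguments")
                if isinstance(args, str):
                    try:
                        parsed = json.loads(args) if args.strip() else {}
                    except json.JSONDecodeError:
                        parsed = None  # malformed JSON: leave the string as-is
                    # Only substitute when the string decodes to a JSON object.
                    if isinstance(parsed, dict):
                        fn["arguments"] = parsed
                call["function"] = fn
                new_calls.append(call)
            msg["tool_calls"] = new_calls
        normalized.append(msg)
    return normalized
```

After this step, a Jinja expression such as `arguments|items` receives a mapping instead of a string, which is exactly the case the "Can only get item pairs from a mapping" error complains about.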
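The third item treats the two transports differently. Once a streaming response has started, the HTTP status line has already gone out as 200, so the error can only travel in-band as an SSE event. A non-streaming request can still fail with a real 400. The sketch below illustrates that split. It assumes an OpenAI-style error envelope and an `event: error` SSE frame. Apart from the `context_length_exceeded` code, every name and field is hypothetical, including `ContextLengthExceeded`, `check_context`, `handle` and the `type` value.

```python
import json


class ContextLengthExceeded(Exception):
    """Hypothetical error: the prompt does not fit in the model's context window."""

    def __init__(self, n_prompt: int, n_ctx: int):
        super().__init__(f"Prompt is {n_prompt} tokens but the context window is {n_ctx} tokens.")
        self.n_prompt = n_prompt
        self.n_ctx = n_ctx


def error_payload(exc: ContextLengthExceeded) -> dict:
    # Assumed OpenAI-style envelope; only the code value comes from the commit message.
    return {"error": {"message": str(exc), "type": "invalid_request_error",
                      "code": "context_length_exceeded"}}


def sse_error_frame(exc: ContextLengthExceeded) -> str:
    # Streaming: status 200 is already sent, so report the error as an SSE event.
    return "event: error\ndata: " + json.dumps(error_payload(exc)) + "\n\n"


def check_context(n_prompt: int, n_ctx: int) -> None:
    # Stands in for the real tokenizer/generation check (hypothetical).
    if n_prompt >= n_ctx:
        raise ContextLengthExceeded(n_prompt, n_ctx)


def handle(n_prompt: int, n_ctx: int, stream: bool) -> tuple[int, str]:
    """Return (status, body); the error never becomes assistant message content."""
    try:
        check_context(n_prompt, n_ctx)
    except ContextLengthExceeded as exc:
        if stream:
            return 200, sse_error_frame(exc)
        return 400, json.dumps(error_payload(exc))  # non-streaming: plain HTTP 400
    return 200, "ok"  # placeholder for the normal generation path
```

Keeping the error out of the assistant message means a client that saves the conversation does not store a fake reply in its history.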