• Stefy Lanza (nextime / spora )'s avatar
    fix: tool-call streaming/format robustness + clear over-context error · 3834ecf5
    Stefy Lanza (nextime / spora ) authored
    - Streaming tool gate now withholds the gemma/qwen native `<|tool_call>` marker
      (and partials) too, not just `<tool_call>`/`call:NAME{` — so the raw marker no
      longer leaks to the client mid-stream (Kilo was executing partial calls).
    - Normalize tool-call function.arguments from JSON string → dict before applying
      the chat template, so templates that render `arguments|items` (Qwen) don't
      raise "Can only get item pairs from a mapping".
    - Context-window overflow now returns a meaningful error: a structured SSE error
      event (code context_length_exceeded) when streaming, or HTTP 400 with a clear
      message for non-streaming — instead of injecting "[Generation error: …]" as
      assistant content (which polluted chat history).
    - Models page: unconfigured GGUF files now expose the "Free disk" button (records
      them as "to download" before deleting), matching HF models.
    Co-Authored-By: 's avatarClaude Opus 4.8 <noreply@anthropic.com>
    3834ecf5
models.html 201 KB