• Stefy Lanza (nextime / spora )'s avatar
    text: surface model reasoning as a separate field (think/thinking/thought) · 0a7d343a
    Stefy Lanza (nextime / spora ) authored
    Qwen-style chat templates pre-fill the opening <think> in the prompt, so the
    model emits only the reasoning body + a bare closing </think> — and they think
    by DEFAULT regardless of the API enable_thinking flag. The old paired-tag
    reasoning extractor missed the bare close, leaking the whole thought (and the
    </think>) into content and conversation history.
    
    - extract_reasoning_content: handle a bare </think|/thinking|/thought> with no
      opening tag (treat the prefix as reasoning).
    - streaming: a chunk-safe reasoning gate routes the thought into
      delta.reasoning / reasoning_content until </think>, then flips to content;
      tool extraction runs on the post-</think> answer only.
    - non-streaming: extract reasoning, set message.reasoning(+_content), clean
      content; tools parsed from the answer.
    - activate whenever the model auto-thinks (qwen3/qwq/deepseek-r1/… name) OR
      reasoning is explicitly enabled — not just on the API flag.
    - configurable suppression: per-model `suppress_reasoning`, or per-request via
      the standard reasoning:{exclude:true} / reasoning_effort:"none" /
      suppress_reasoning fields. Emits both `reasoning` and DeepSeek-style
      `reasoning_content` for client compatibility.
    Co-Authored-By: 's avatarClaude Opus 4.8 <noreply@anthropic.com>
    0a7d343a
Name
Last commit
Last update
..
audiogenrequest.py Loading commit data...
embedrequest.py Loading commit data...
imagerequest.py Loading commit data...
textrequest.py Loading commit data...
transcriptionrequest.py Loading commit data...
videorequest.py Loading commit data...