• Stefy Lanza (nextime / spora )'s avatar
    text: surface model reasoning as a separate field (think/thinking/thought) · 0a7d343a
    Stefy Lanza (nextime / spora ) authored
    Qwen-style chat templates pre-fill the opening <think> in the prompt, so the
    model emits only the reasoning body + a bare closing </think> — and they think
    by DEFAULT regardless of the API enable_thinking flag. The old paired-tag
    reasoning extractor missed the bare close, leaking the whole thought (and the
    </think>) into content and conversation history.
    
    - extract_reasoning_content: handle a bare </think|/thinking|/thought> with no
      opening tag (treat the prefix as reasoning).
    - streaming: a chunk-safe reasoning gate routes the thought into
      delta.reasoning / reasoning_content until </think>, then flips to content;
      tool extraction runs on the post-</think> answer only.
    - non-streaming: extract reasoning, set message.reasoning(+_content), clean
      content; tools parsed from the answer.
    - activate whenever the model auto-thinks (qwen3/qwq/deepseek-r1/… name) OR
      reasoning is explicitly enabled — not just on the API flag.
    - configurable suppression: per-model `suppress_reasoning`, or per-request via
      the standard reasoning:{exclude:true} / reasoning_effort:"none" /
      suppress_reasoning fields. Emits both `reasoning` and DeepSeek-style
      `reasoning_content` for client compatibility.
    Co-Authored-By: 's avatarClaude Opus 4.8 <noreply@anthropic.com>
    0a7d343a
Name
Last commit
Last update
..
cache Loading commit data...
__init__.py Loading commit data...
acceleration.py Loading commit data...
capabilities.py Loading commit data...
grammar.py Loading commit data...
hf_loading.py Loading commit data...
manager.py Loading commit data...
parser.py Loading commit data...
pipeline_cache.py Loading commit data...
quant.py Loading commit data...
ram_monitor.py Loading commit data...
templates.py Loading commit data...
thermal.py Loading commit data...
tmp_janitor.py Loading commit data...
tool_call_grammar.gbnf Loading commit data...
turboquant.py Loading commit data...
utils.py Loading commit data...