-
Stefy Lanza (nextime / spora ) authored
Qwen-style chat templates pre-fill the opening <think> in the prompt, so the model emits only the reasoning body + a bare closing </think> — and they think by DEFAULT regardless of the API enable_thinking flag. The old paired-tag reasoning extractor missed the bare close, leaking the whole thought (and the </think>) into content and conversation history. - extract_reasoning_content: handle a bare </think|/thinking|/thought> with no opening tag (treat the prefix as reasoning). - streaming: a chunk-safe reasoning gate routes the thought into delta.reasoning / reasoning_content until </think>, then flips to content; tool extraction runs on the post-</think> answer only. - non-streaming: extract reasoning, set message.reasoning(+_content), clean content; tools parsed from the answer. - activate whenever the model auto-thinks (qwen3/qwq/deepseek-r1/… name) OR reasoning is explicitly enabled — not just on the API flag. - configurable suppression: per-model `suppress_reasoning`, or per-request via the standard reasoning:{exclude:true} / reasoning_effort:"none" / suppress_reasoning fields. Emits both `reasoning` and DeepSeek-style `reasoning_content` for client compatibility. Co-Authored-By:Claude Opus 4.8 <noreply@anthropic.com>
0a7d343a
| Name |
Last commit
|
Last update |
|---|---|---|
| .. | ||
| static | ||
| templates | ||
| __init__.py | ||
| auth.py | ||
| download_worker.py | ||
| routes.py |