    text: stop runaway tool-call loops + honor client repetition penalties · a535c27f
    Stefy Lanza (nextime / spora ) authored
    Some quantized fine-tunes (seen with an "Aggressive" Qwen3.6-35B Q4_K_M) collapse
    into a runaway repetition loop — emitting a malformed parallel tool-call flood of
    1700+ tokens that never terminates — when top_p=1.0 and no repetition penalty are
    in effect (exactly the conditions Qwen's own docs warn cause endless repetitions).
    
    Two fixes:
    
    1. Anti-loop generation stop in stream_chat_response: a model-agnostic detector
       normalises away the variable parts of the tail (quoted strings, filesystem
       paths, whitespace) so a loop whose only per-cycle difference is an arg/path
       still reads as periodic, then breaks generation when a short structural unit
       repeats >=5x back-to-back. Tuned to not trip on prose, repetitive code, or a
       legit handful of distinct tool calls.
    
    2. Honor client-supplied repetition controls. The chat paths previously forwarded
       only temperature/top_p, silently dropping repeat/presence/frequency penalty —
       so a caller (e.g. Kilo) setting them per-model had no effect. Plumb them through
       generate_chat_stream / generate_chat to both backends (cuda already accepts
       them; vulkan now does too) with graceful signature fallbacks. Defaults are
       no-ops, so unset clients are unaffected.
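
The detector described in fix 1 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names, regular expressions, tail length, and unit-size bounds are all assumptions. Only the ">=5 back-to-back repeats" threshold and the three normalised categories (quoted strings, filesystem paths, whitespace) come from the message above.

```python
import re

# Hypothetical patterns; the real detector's normalisation rules are not shown.
_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'')
_PATH = re.compile(r'(?:[A-Za-z]:)?(?:[\\/][\w.\-]+)+[\\/]?')
_SPACE = re.compile(r'\s+')


def normalise_tail(text: str, tail_chars: int = 2000) -> str:
    """Collapse the variable parts of the tail so only structure remains."""
    tail = text[-tail_chars:]
    tail = _QUOTED.sub('"S"', tail)   # every quoted string looks the same
    tail = _PATH.sub('P', tail)       # every unquoted path looks the same
    return _SPACE.sub('', tail)       # whitespace never breaks periodicity


def looks_like_runaway_loop(text: str, min_repeats: int = 5,
                            min_unit: int = 8, max_unit: int = 200) -> bool:
    """True if the normalised tail ends in min_repeats copies of one unit."""
    tail = normalise_tail(text)
    for unit_len in range(min_unit, max_unit + 1):
        span = unit_len * min_repeats
        if span > len(tail):
            break  # longer units cannot fit min_repeats times
        unit = tail[-unit_len:]
        if tail[-span:] == unit * min_repeats:
            return True
    return False
```

In a streaming loop, a check like this would be run on the accumulated output every few tokens, and generation would stop as soon as it returns True. The lower bound on the unit length is what keeps short natural repeats (for example a run of dashes or closing braces) from tripping it. The repeat threshold lets a small number of same-shaped tool calls through.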
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
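
One way to read the "graceful signature fallbacks" in fix 2 is sketched below. It is an assumption about the approach, not the commit's code: the helper names, the request keys, and the two stand-in backend functions are hypothetical. The neutral values used (1.0 for a repeat penalty, 0.0 for presence and frequency penalties) are the usual no-op settings for these controls.

```python
import inspect

# Neutral values: applying them leaves sampling unchanged.
NEUTRAL = {"repeat_penalty": 1.0, "presence_penalty": 0.0, "frequency_penalty": 0.0}


def sampling_kwargs(request: dict) -> dict:
    """Collect sampling controls from a client request (hypothetical keys)."""
    out = {k: request[k] for k in ("temperature", "top_p")
           if request.get(k) is not None}
    for key, neutral in NEUTRAL.items():
        value = request.get(key)
        out[key] = neutral if value is None else float(value)
    return out


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)  # fn takes **kwargs, pass everything
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **accepted)


# Stand-in backends for illustration only.
def new_backend(prompt, temperature=1.0, top_p=1.0, repeat_penalty=1.0,
                presence_penalty=0.0, frequency_penalty=0.0):
    return {"prompt": prompt, "repeat_penalty": repeat_penalty,
            "presence_penalty": presence_penalty}


def old_backend(prompt, temperature=1.0, top_p=1.0):
    return {"prompt": prompt, "top_p": top_p}
```

With this shape, a request that sets `repeat_penalty` reaches a backend that supports it, while an older backend is still called without raising a `TypeError`. A request that sets nothing produces only neutral values.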
Name
cache
__init__.py
acceleration.py
capabilities.py
grammar.py
hf_loading.py
manager.py
parser.py
pipeline_cache.py
quant.py
ram_monitor.py
templates.py
thermal.py
tmp_janitor.py
tool_call_grammar.gbnf
turboquant.py
utils.py