codai/models · 4754beff012b9f52ec3d4635e7e73d1af9c2105a · nexlab / coderai

text: stop runaway tool-call loops + honor client repetition penalties · a535c27f

Stefy Lanza (nextime / spora ) authored Jun 20, 2026

Some quantized fine-tunes (seen with an "Aggressive" Qwen3.6-35B Q4_K_M) collapse
into a runaway repetition loop when top_p=1.0 and no repetition penalty is in
effect (exactly the conditions Qwen's own docs warn cause endless repetitions).
The loop shows up as a malformed flood of parallel tool calls, 1700+ tokens long,
that never terminates.

Two fixes:

1. Anti-loop generation stop in stream_chat_response: a model-agnostic detector
normalises away the variable parts of the tail (quoted strings, filesystem
paths, whitespace) so a loop whose only per-cycle difference is an arg/path
still reads as periodic, then breaks generation when a short structural unit
repeats >=5x back-to-back. Tuned to not trip on prose, repetitive code, or a
legit handful of distinct tool calls.

2. Honor client-supplied repetition controls. The chat paths previously forwarded
only temperature/top_p, silently dropping repeat/presence/frequency penalty —
so a caller (e.g. Kilo) setting them per-model had no effect. Plumb them through
generate_chat_stream / generate_chat to both backends (cuda already accepts
them; vulkan now does too) with graceful signature fallbacks. Defaults are
no-ops, so unset clients are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
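
The detector described in fix 1 could be sketched roughly as below. This is an
illustration only, not the code in stream_chat_response: the helper names
(normalise, has_runaway_loop), the regexes, and every threshold except the
">=5x back-to-back" rule are assumptions, and the real detector's tuning against
prose and repetitive code is not reproduced here.

```python
import re

# Hypothetical patterns for the "variable parts" the commit message lists.
_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')   # double-quoted strings
_PATH = re.compile(r'(?:[\\/][\w.\-]+)+')     # rough filesystem-path shape
_SPACE = re.compile(r'\s+')


def normalise(text: str) -> str:
    """Replace parts that may differ per cycle with fixed placeholders."""
    text = _QUOTED.sub('"S"', text)
    text = _PATH.sub('P', text)
    return _SPACE.sub('', text)


def has_runaway_loop(text: str, min_repeats: int = 5,
                     min_unit: int = 8, max_unit: int = 200,
                     tail_chars: int = 2000) -> bool:
    """True if the normalised tail ends in one unit repeated min_repeats times."""
    tail = normalise(text)[-tail_chars:]
    for unit_len in range(min_unit, max_unit + 1):
        span = unit_len * min_repeats
        if span > len(tail):
            break  # not enough text for this (or any longer) unit
        if tail[-span:] == tail[-unit_len:] * min_repeats:
            return True
    return False
```

In a streaming loop, a check like this would presumably run on the accumulated
output every few tokens and break generation once it returns True.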

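The "graceful signature fallbacks" mentioned in fix 2 might look something like
the following. This is a sketch under assumptions: sampling_kwargs and NEUTRAL
are hypothetical names, the request is modelled as a plain dict, and the neutral
values (1.0 for repeat penalty, 0.0 for presence/frequency penalty) are assumed
to be the no-op defaults the commit message refers to.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed no-op values: 1.0 leaves a multiplicative repeat penalty disabled,
# 0.0 leaves additive presence/frequency penalties disabled.
NEUTRAL = {"repeat_penalty": 1.0, "presence_penalty": 0.0, "frequency_penalty": 0.0}


def sampling_kwargs(backend_fn: Callable[..., Any],
                    request: Dict[str, Any]) -> Dict[str, Any]:
    """Collect sampling options from a client request, keeping only the
    ones the backend callable is able to accept."""
    wanted = {k: request[k] for k in ("temperature", "top_p") if k in request}
    for name, neutral in NEUTRAL.items():
        value = request.get(name)
        wanted[name] = neutral if value is None else value

    params = inspect.signature(backend_fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return wanted  # backend takes **kwargs, so pass everything through
    # Older signature: drop whatever it does not declare.
    return {k: v for k, v in wanted.items() if k in params}
```

A caller would then invoke the backend as backend_fn(prompt,
**sampling_kwargs(backend_fn, request)), so a backend that has not yet grown the
penalty parameters keeps working while newer ones receive them.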

Name
cache
__init__.py
acceleration.py
capabilities.py
grammar.py
hf_loading.py
manager.py
parser.py
pipeline_cache.py
quant.py
ram_monitor.py
templates.py
thermal.py
tmp_janitor.py
tool_call_grammar.gbnf
turboquant.py
utils.py