codai · 913e283aeea99c2894448b6cebaf59e7373bd42d · nexlab / coderai

text: make auto-compaction actually fire — fix config lookup + max_tokens-aware layered trimming · 913e283a

Stefy Lanza (nextime / spora) authored Jun 20, 2026

Auto-compaction never triggered: multi_model_manager.config stores the
whitelisted build_runtime_kwargs() dict, which drops the per-model
auto_compact* keys (they survive only under _raw_cfg), so _resolve_compaction
always read the global default (False) and returned None. Read the keys via a
_raw_cfg fallback so per-model compaction config is honoured.
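A minimal sketch of the fallback lookup described above. It assumes the per-model config is a dict that carries the original settings under a "_raw_cfg" key; the helper name lookup_model_key and the sample dict are hypothetical and are not taken from the codebase.

```python
from typing import Any, Mapping

_MISSING = object()  # sentinel so an explicit False/None is not mistaken for "absent"


def lookup_model_key(model_cfg: Mapping[str, Any], key: str, default: Any) -> Any:
    """Return model_cfg[key], else model_cfg["_raw_cfg"][key], else default."""
    value = model_cfg.get(key, _MISSING)
    if value is _MISSING:
        # Assumption: the unfiltered per-model settings live under "_raw_cfg".
        raw = model_cfg.get("_raw_cfg") or {}
        value = raw.get(key, _MISSING)
    return default if value is _MISSING else value


# Hypothetical whitelisted entry: auto_compact was dropped from the top level
# and survives only under _raw_cfg.
model_cfg = {"n_ctx": 8192, "_raw_cfg": {"n_ctx": 8192, "auto_compact": True}}
enabled = lookup_model_key(model_cfg, "auto_compact", False)
```

With a top-level-only lookup, enabled would fall through to the global default; with the fallback, the per-model value is found.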

Also rework the over-context handling to count the reply reservation, since the
reply is generated into the same window (prompt + max_tokens <= n_ctx). Four
layers, cheapest first:
  1. fits as-is              -> nothing
  2. overflow within tol     -> trim max_tokens to fit (lossless)
  3. beyond tol & big prompt -> compact history (drop/summarize)
  4. single message too big  -> slice it (summarize its middle, keep head/tail)
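The four layers can be sketched as a single decision function. This is an illustration under stated assumptions, not the committed code: the names Action and plan are hypothetical, the tolerance is taken as a percentage of n_ctx, a trim is only accepted if at least min_output tokens remain for the reply, and a message counts as "too big" when it cannot fit alongside min_output even on its own.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    NONE = "none"
    TRIM_MAX_TOKENS = "trim_max_tokens"
    COMPACT_HISTORY = "compact_history"
    SLICE_MESSAGE = "slice_message"


def plan(prompt_tokens: int, largest_message_tokens: int, max_tokens: int,
         n_ctx: int, tolerance_pct: float = 20, min_output: int = 512
         ) -> Tuple[Action, int]:
    """Pick the cheapest layer; return (action, effective max_tokens)."""
    needed = prompt_tokens + max_tokens
    if needed <= n_ctx:
        return Action.NONE, max_tokens  # layer 1: fits as-is

    overflow = needed - n_ctx
    room = n_ctx - prompt_tokens  # what is left for the reply
    # Assumption: tolerance is measured relative to n_ctx.
    if overflow <= n_ctx * tolerance_pct / 100 and room >= min_output:
        return Action.TRIM_MAX_TOKENS, room  # layer 2: lossless trim

    # Assumption: dropping history cannot help if one message alone is too
    # large, so that case goes straight to slicing (layer 4).
    if largest_message_tokens + min_output > n_ctx:
        return Action.SLICE_MESSAGE, max_tokens

    return Action.COMPACT_HISTORY, max_tokens  # layer 3


trimmed = plan(prompt_tokens=3000, largest_message_tokens=500,
               max_tokens=1500, n_ctx=4096)
```

In the trimmed case the prompt is untouched and only the reply reservation shrinks, which is why that layer is tried before anything that alters the history.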

The chars/4 estimate undercounts token-dense code/JSON, so trimming to the exact
n_ctx edge could still overflow; inflate the estimate by a configurable
estimate_safety (default 1.15) for all physical-fit decisions.
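A small sketch of the inflated estimate, assuming the base heuristic is len(text) / 4 and that the safety factor is applied by simple multiplication. The function names estimate_tokens and fits are hypothetical.

```python
import math


def estimate_tokens(text: str, estimate_safety: float = 1.15) -> int:
    """chars/4 heuristic, inflated by a safety factor and rounded up."""
    return math.ceil(len(text) / 4 * estimate_safety)


def fits(prompt: str, max_tokens: int, n_ctx: int,
         estimate_safety: float = 1.15) -> bool:
    """Physical-fit check: estimated prompt plus reply reservation within n_ctx."""
    return estimate_tokens(prompt, estimate_safety) + max_tokens <= n_ctx


prompt = "a" * 4000  # raw estimate: 1000 tokens
fits_raw = fits(prompt, max_tokens=3000, n_ctx=4096, estimate_safety=1.0)
fits_safe = fits(prompt, max_tokens=3000, n_ctx=4096)
```

Here the raw estimate passes the check while the inflated one does not, which is the margin the safety factor is meant to provide for token-dense input.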

New CompactionConfig knobs (per-model overridable): tolerance_pct (20),
min_output (512), estimate_safety (1.15). Effective max_tokens is threaded back
to both the streaming and non-streaming generation paths.
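As an illustration of per-model overridable knobs, the sketch below uses a stand-in dataclass holding only the three new fields with the defaults listed above. CompactionKnobs and with_overrides are hypothetical names; the project's CompactionConfig presumably has other fields and its own override mechanism.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class CompactionKnobs:
    """Stand-in for the three new knobs only (not the project's class)."""
    tolerance_pct: float = 20
    min_output: int = 512
    estimate_safety: float = 1.15


def with_overrides(base: CompactionKnobs,
                   per_model: Mapping[str, Any]) -> CompactionKnobs:
    """Return a copy of base with any recognised per-model keys applied."""
    known = {f.name for f in fields(base)}
    return replace(base, **{k: v for k, v in per_model.items() if k in known})


# Unrelated keys such as n_ctx are ignored rather than rejected.
knobs = with_overrides(CompactionKnobs(), {"min_output": 256, "n_ctx": 8192})
```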

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py