feat: per-model auto-compact of the conversation context (off by default)
When enabled for a model, if the prompt would exceed auto_compact_pct% of the
model's context window, the conversation is shrunk to ~65% before generation
instead of erroring on overflow. Per-model config (auto_compact / auto_compact_pct
/ auto_compact_strategy) with three strategies:
- drop_oldest : keep system messages + the most recent turns that fit.
- keep_head_tail : also keep the first user turn as an anchor + a count note.
- summarize : replace the dropped middle with a best-effort LLM summary
(generated by the loaded model; falls back to a count note).
Token size is a cheap chars/4 estimate; membership uses object identity so
value-equal turns don't collide. Wired into the chat path (codai/api/text.py),
the model-configure whitelist, and the model config modal UI.
Co-Authored-By:
Claude Opus 4.8 <noreply@anthropic.com>
Showing
Please
register
or
sign in
to comment