-
Stefy Lanza (nextime / spora ) authored
When enabled for a model, if the prompt would exceed auto_compact_pct% of the model's context window, the conversation is shrunk to ~65% before generation instead of erroring on overflow. Per-model config (auto_compact / auto_compact_pct / auto_compact_strategy) with three strategies: - drop_oldest : keep system messages + the most recent turns that fit. - keep_head_tail : also keep the first user turn as an anchor + a count note. - summarize : replace the dropped middle with a best-effort LLM summary (generated by the loaded model; falls back to a count note). Token size is a cheap chars/4 estimate; membership uses object identity so value-equal turns don't collide. Wired into the chat path (codai/api/text.py), the model-configure whitelist, and the model config modal UI. Co-Authored-By:Claude Opus 4.8 <noreply@anthropic.com>
a019905f