codai/admin/templates/models.html · a019905fd3c05ab19857cd76520532e4fae6cdb5 · nexlab / coderai

feat: per-model auto-compact of the conversation context (off by default) · a019905f

Stefy Lanza (nextime / spora) authored Jun 19, 2026

When enabled for a model, if the prompt would exceed auto_compact_pct% of the
model's context window, the conversation is shrunk to ~65% before generation
instead of erroring on overflow. Per-model config (auto_compact / auto_compact_pct
/ auto_compact_strategy) with three strategies:
  - drop_oldest    : keep system messages + the most recent turns that fit.
  - keep_head_tail : also keep the first user turn as an anchor + a count note.
  - summarize      : replace the dropped middle with a best-effort LLM summary
                     (generated by the loaded model; falls back to a count note).
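As a rough illustration of the trigger check and the drop_oldest strategy described above, here is a minimal sketch. It is not the code in codai/api/text.py: the function names, the message shape (dicts with "role" and "content"), and the reading of "~65%" as a share of the context window are all assumptions made for the example.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def est_tokens(message: Message) -> int:
    # Cheap size estimate: roughly one token per four characters.
    return len(message.get("content", "")) // 4


def needs_compact(messages: List[Message], context_window: int,
                  auto_compact_pct: int) -> bool:
    # True when the estimated prompt size exceeds auto_compact_pct% of the window.
    total = sum(est_tokens(m) for m in messages)
    return total > context_window * auto_compact_pct / 100


def drop_oldest(messages: List[Message], budget_tokens: int) -> List[Message]:
    # Keep every system message, then add the most recent turns that still fit.
    # Assumption: system messages are emitted first, followed by the kept tail.
    system = [m for m in messages if m["role"] == "system"]
    used = sum(est_tokens(m) for m in system)
    recent: List[Message] = []
    for m in reversed(messages):
        if m["role"] == "system":
            continue
        cost = est_tokens(m)
        if used + cost > budget_tokens:
            break  # everything older than this turn is dropped
        used += cost
        recent.append(m)
    recent.reverse()  # restore chronological order
    return system + recent
```

A caller would first test needs_compact(...) and, only if it returns True, call drop_oldest with a budget such as context_window * 65 // 100.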

Token size is a cheap chars/4 estimate; membership uses object identity so
value-equal turns don't collide. Wired into the chat path (codai/api/text.py),
the model-configure whitelist, and the model config modal UI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


models.html 203 KB
