codai · 99f8ba859fd0aed703d506de9c9fe865cacb90a4 · nexlab / coderai

coderai: global host-RAM cap with leak watch + disk-offload eviction · 99f8ba85

Stefy Lanza (nextime / spora) authored Jun 15, 2026

Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
existing VRAM budgeting:

- hf_loading clamps the accelerate CPU-offload budget to the headroom under
  the cap, so overflow spills to the disk offload folder instead of growing RSS.
- manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
  _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
  evicted before a new load when RSS nears the cap.
- ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
  climbs while the scheduler is idle, and runs a mitigation ladder
  (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
- admin /status returns a ram block; Settings page exposes max RAM + evict/
  leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
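
The first two bullets (budget clamping and RAM-driven eviction) can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this commit: `clamp_cpu_budget`, `LoadedModel`, and `pick_evictions` are hypothetical names, RSS is passed in as a plain number rather than measured over the process tree, and a non-positive `max_ram_gb` is assumed to mean "no cap".

```python
from dataclasses import dataclass

GIB = 1024 ** 3


def clamp_cpu_budget(requested_bytes: int, rss_bytes: int, max_ram_gb: float) -> int:
    """Return a CPU-offload budget that still fits under the host-RAM cap.

    Assumption of this sketch: max_ram_gb <= 0 disables the cap.
    """
    if max_ram_gb <= 0:
        return requested_bytes
    headroom = int(max_ram_gb * GIB) - rss_bytes
    # Never hand out more than the headroom, and never a negative budget.
    return max(0, min(requested_bytes, headroom))


@dataclass
class LoadedModel:
    name: str
    ram_bytes: int      # estimated host RAM held by this model
    last_used: float    # timestamp stamped on each use
    busy: bool = False  # True while a request is running on it


def pick_evictions(models, rss_bytes, incoming_bytes, max_ram_gb):
    """Pick idle models to evict, least recently used first, until the
    projected RSS (current RSS plus the incoming load) fits under the cap."""
    cap = int(max_ram_gb * GIB)
    projected = rss_bytes + incoming_bytes
    victims = []
    idle = sorted((m for m in models if not m.busy), key=lambda m: m.last_used)
    for m in idle:
        if projected <= cap:
            break
        victims.append(m.name)
        projected -= m.ram_bytes
    return victims
```

With a 32 GB cap and 20 GiB already resident, a 30 GiB request is clamped to 12 GiB; whatever does not fit in that budget would then go to the disk offload folder, as the first bullet describes. Busy models are never chosen for eviction in this sketch, which matches the "idle models are evicted" wording but is otherwise a simplification.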

Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
count so an active upscale no longer reports '0 models loaded'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
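
The leak watch and mitigation ladder from the ram_monitor.py bullet can be illustrated with the sketch below. It is not the module's actual code: `suspect_leak`, `malloc_trim`, and `run_ladder` are hypothetical names, the leak test is a simple "net growth over an all-idle window" rule, and stopping the ladder as soon as RSS falls to the target is an assumption about how the steps are sequenced.

```python
import ctypes
import gc


def suspect_leak(samples, idle_flags, min_growth_bytes):
    """Flag a suspected leak when RSS grew by at least min_growth_bytes
    across a window in which the scheduler was idle at every sample."""
    if len(samples) < 2 or not all(idle_flags):
        return False
    return samples[-1] - samples[0] >= min_growth_bytes


def malloc_trim():
    """Ask glibc to return free heap pages to the OS; does nothing on
    platforms where libc.so.6 or malloc_trim is unavailable."""
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except (OSError, AttributeError):
        pass


def run_ladder(steps, read_rss, target_bytes):
    """Run (name, callable) steps in order, cheapest first, and stop once
    read_rss() is at or below target_bytes. Returns the names that ran."""
    ran = []
    for name, step in steps:
        if read_rss() <= target_bytes:
            break
        step()
        ran.append(name)
    return ran


# The first and third rungs need only the standard library; the remaining
# rungs (empty_cache, dropping the upscaler cache, evicting idle models)
# would be further callables supplied by the server.
BASIC_STEPS = [("gc", gc.collect), ("malloc_trim", malloc_trim)]
```

Ordering the steps from cheapest to most disruptive means a model is only evicted when garbage collection and allocator trimming have not brought RSS back down.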


Name
admin
api
backends
broker
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py