codai/models/manager.py · 99f8ba859fd0aed703d506de9c9fe865cacb90a4 · nexlab / coderai

coderai: global host-RAM cap with leak watch + disk-offload eviction · 99f8ba85

Stefy Lanza (nextime / spora) authored Jun 15, 2026

Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
existing VRAM budgeting:

- hf_loading clamps the accelerate CPU-offload budget to the headroom under
  the cap, so overflow spills to the disk offload folder instead of growing RSS.
- manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
  _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
  evicted before a new load when RSS nears the cap.
- ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
  climbs while the scheduler is idle, and runs a mitigation ladder
  (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
- admin /status returns a ram block; Settings page exposes max RAM + evict/
  leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
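
The budgeting rules above can be sketched as three small pure functions. This
is an illustrative sketch, not the code in hf_loading, manager.py, or
ram_monitor.py: the names clamp_cpu_budget_gb, LoadedModel, pick_evictions and
looks_like_leak are hypothetical, as are the 0.5 GB growth threshold and the
convention that a cap of 0 or less means "no cap". It also assumes that
evicting a model frees roughly its recorded RAM footprint, which real RSS may
not reflect exactly.

```python
from dataclasses import dataclass
from typing import List, Sequence


def clamp_cpu_budget_gb(requested_gb: float, rss_gb: float, max_ram_gb: float) -> float:
    """Clamp a CPU-offload budget to the headroom left under the RAM cap.

    Assumption: a cap of 0 or less disables the clamp.
    """
    if max_ram_gb <= 0:
        return requested_gb
    headroom_gb = max(0.0, max_ram_gb - rss_gb)
    # Whatever does not fit in the headroom is left to spill to disk offload.
    return min(requested_gb, headroom_gb)


@dataclass
class LoadedModel:
    """Hypothetical record of a loaded model (not the manager's real type)."""
    name: str
    ram_gb: float
    last_used: float  # timestamp stamped on every use, giving true LRU order
    busy: bool = False


def pick_evictions(models: Sequence[LoadedModel], rss_gb: float,
                   incoming_gb: float, max_ram_gb: float) -> List[str]:
    """Pick idle models to evict, least recently used first.

    Stops as soon as the projected RSS (current + incoming load) fits the cap.
    Busy models are never chosen.
    """
    projected = rss_gb + incoming_gb
    victims: List[str] = []
    idle = sorted((m for m in models if not m.busy), key=lambda m: m.last_used)
    for m in idle:
        if projected <= max_ram_gb:
            break
        victims.append(m.name)
        projected -= m.ram_gb
    return victims


def looks_like_leak(idle_samples_gb: Sequence[float], min_growth_gb: float = 0.5) -> bool:
    """Flag a suspected leak from RSS samples taken while the scheduler is idle.

    True when RSS never drops between samples and total growth reaches
    min_growth_gb (an assumed threshold).
    """
    if len(idle_samples_gb) < 2:
        return False
    pairs = zip(idle_samples_gb, idle_samples_gb[1:])
    non_decreasing = all(b >= a for a, b in pairs)
    growth = idle_samples_gb[-1] - idle_samples_gb[0]
    return non_decreasing and growth >= min_growth_gb
```

For example, with a 24 GB cap and 20 GB of RSS, a requested 32 GB CPU-offload
budget would be clamped to 4 GB, and loading an 8 GB model would first evict
the least recently used idle model.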

Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
count so an active upscale no longer reports '0 models loaded'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

manager.py 158 KB