coderai: global host-RAM cap with leak watch + disk-offload eviction

Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
existing VRAM budgeting:
- hf_loading clamps the accelerate CPU-offload budget to the headroom under
  the cap, so overflow spills to the disk offload folder instead of growing RSS.
- manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
  _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
  evicted before a new load when RSS nears the cap.
- ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
  climbs while the scheduler is idle, and runs a mitigation ladder
  (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
- admin /status returns a ram block; Settings page exposes max RAM + evict/
  leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
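
Illustration of the hf_loading clamp described in the first bullet. This is a
minimal sketch, not the code in this commit: clamp_cpu_budget, its parameters,
and the "None disables the cap" convention are hypothetical names and
assumptions chosen for the example. Only OffloadConfig.max_ram_gb comes from
the change itself.

```python
from typing import Optional

GIB = 1024 ** 3


def clamp_cpu_budget(
    requested_cpu_bytes: int,
    max_ram_gb: Optional[float],
    current_rss_bytes: int,
) -> int:
    """Limit a CPU-offload budget to the headroom left under the RAM cap.

    Assumption: max_ram_gb=None means "no cap configured".
    """
    if max_ram_gb is None:
        return requested_cpu_bytes
    cap_bytes = int(max_ram_gb * GIB)
    # Headroom never goes negative: if RSS is already over the cap, the CPU
    # budget is zero and everything that does not fit on the GPU goes to disk.
    headroom = max(0, cap_bytes - current_rss_bytes)
    return min(requested_cpu_bytes, headroom)


# 16 GiB cap, 12 GiB already resident: an 8 GiB request shrinks to 4 GiB.
budget = clamp_cpu_budget(8 * GIB, 16, 12 * GIB)
```

The clamped value would then presumably feed the "cpu" entry of the max_memory
mapping handed to accelerate, with the remainder going to the offload folder.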
Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
count so an active upscale no longer reports '0 models loaded'.
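
Illustration of the leak watch and mitigation ladder described in the
ram_monitor.py bullet. This is a rough sketch under stated assumptions, not the
module's actual code: suspect_leak, run_ladder, the growth threshold, and the
"stop once RSS is back under a target" rule are all hypothetical. The step
order (gc -> empty_cache -> malloc_trim -> ...) is the only part taken from the
commit.

```python
from typing import Callable, List, Sequence, Tuple


def suspect_leak(
    rss_samples: Sequence[int],
    idle_flags: Sequence[bool],
    min_growth_bytes: int,
) -> bool:
    """Flag a suspected leak when RSS climbed across a fully idle window."""
    if len(rss_samples) < 2 or len(rss_samples) != len(idle_flags):
        return False
    # Growth only counts if the scheduler was idle at every sample; growth
    # during active work is expected and is not treated as a leak.
    if not all(idle_flags):
        return False
    return rss_samples[-1] - rss_samples[0] >= min_growth_bytes


def run_ladder(
    steps: Sequence[Tuple[str, Callable[[], None]]],
    read_rss: Callable[[], int],
    target_bytes: int,
) -> List[str]:
    """Run mitigation steps in order; stop once RSS is at or below target.

    Assumption: cheaper steps come first and later steps are skipped when an
    earlier one already freed enough memory.
    """
    ran: List[str] = []
    for name, step in steps:
        if read_rss() <= target_bytes:
            break
        step()
        ran.append(name)
    return ran
```

In a real watcher the steps would be callables such as gc.collect and
torch.cuda.empty_cache, and read_rss would sample the process tree.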
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
codai/models/ram_monitor.py
0 → 100644