    coderai: global host-RAM cap with leak watch + disk-offload eviction · 99f8ba85
    Stefy Lanza (nextime / spora ) authored
    Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
    existing VRAM budgeting:
    
    - hf_loading clamps the accelerate CPU-offload budget to the headroom under
      the cap, so overflow spills to the disk offload folder instead of growing RSS.
    - manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
      _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
      evicted before a new load when RSS nears the cap.
    - ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
      climbs while the scheduler is idle, and runs a mitigation ladder
      (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
    - admin /status returns a ram block; Settings page exposes max RAM + evict/
      leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
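
The clamp, leak heuristic, and mitigation ladder described above could be sketched roughly as follows. This is a minimal illustration, not the code in hf_loading or ram_monitor.py: the function names, the 256 MiB growth threshold, the "monotonic rise" test, and the early stop once RSS falls under a target are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

GIB = 1024 ** 3


def clamp_cpu_offload_gb(requested_gb: float, max_ram_gb: float, rss_bytes: int) -> float:
    """Clamp a requested CPU-offload budget to the headroom left under the cap.

    Hypothetical helper: whatever does not fit in the returned budget is
    expected to spill to the disk offload folder instead of growing RSS.
    """
    headroom_gb = max(0.0, max_ram_gb - rss_bytes / GIB)
    return min(requested_gb, headroom_gb)


def suspect_leak(
    rss_samples: Sequence[int],
    scheduler_idle: bool,
    min_growth_bytes: int = 256 * 1024 ** 2,  # assumed threshold
) -> bool:
    """Flag a suspected leak when RSS keeps rising while the scheduler is idle."""
    if not scheduler_idle or len(rss_samples) < 2:
        return False
    # Assumed heuristic: every sample is at least as large as the previous one.
    rising = all(b >= a for a, b in zip(rss_samples, rss_samples[1:]))
    return rising and rss_samples[-1] - rss_samples[0] >= min_growth_bytes


def run_ladder(
    steps: Sequence[Tuple[str, Callable[[], None]]],
    read_rss: Callable[[], int],
    target_bytes: int,
) -> List[str]:
    """Run mitigation steps cheapest-first, stopping once RSS is at or below target."""
    applied: List[str] = []
    for name, step in steps:
        if read_rss() <= target_bytes:
            break
        step()
        applied.append(name)
    return applied
```

In this sketch the ladder would be passed its steps in the order listed above (gc, empty_cache, malloc_trim, drop upscaler cache, evict idle), so the cheapest actions are tried before any model is evicted.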
    
    Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
    count so an active upscale no longer reports '0 models loaded'.
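
As a rough illustration of the count fix, assuming both the model registry and _UPSCALER_CACHE behave like dicts keyed by name (the function and parameter names here are hypothetical, not taken from manager.py):

```python
from typing import Any, Mapping


def models_loaded_count(loaded_models: Mapping[str, Any],
                        upscaler_cache: Mapping[str, Any]) -> int:
    """Count regular models plus cached upscalers for the dashboard."""
    # Previously only the first term would have been reported, so a server
    # running nothing but an upscaler showed zero loaded models.
    return len(loaded_models) + len(upscaler_cache)
```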
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
manager.py 158 KB