    coderai: global host-RAM cap with leak watch + disk-offload eviction · 99f8ba85
    Stefy Lanza (nextime / spora ) authored
    Add a server-wide host-RAM ceiling (OffloadConfig.max_ram_gb) alongside the
    existing VRAM budgeting:
    
    - hf_loading clamps the accelerate CPU-offload budget to the headroom under
      the cap, so overflow spills to the disk offload folder instead of growing RSS.
    - manager: process-tree RSS accounting, true-LRU (active_in_vram property stamps
      _last_used), shared _evict_one, and _evict_models_for_ram; idle models are
      evicted before a new load when RSS nears the cap.
    - ram_monitor.py: background watcher samples RSS, flags a suspected leak when it
      climbs while the scheduler is idle, and runs a mitigation ladder
      (gc -> empty_cache -> malloc_trim -> drop upscaler cache -> evict idle).
    - admin /status returns a ram block; Settings page exposes max RAM + evict/
      leak-watch toggles (applied live); dashboard shows a RAM gauge + leak badge.
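The three decisions described in the list above (clamping the CPU-offload budget, choosing an idle LRU victim, and flagging a suspected leak) can be sketched in plain Python. This is a minimal illustration, not the code from this commit: every name below (`clamp_cpu_offload_budget`, `LoadedModel`, `pick_eviction_victim`, `suspected_leak`) and every threshold is a hypothetical stand-in, and RSS values are passed in as plain integers rather than sampled from the process tree.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

GIB = 1024 ** 3


def clamp_cpu_offload_budget(requested_bytes: int, rss_bytes: int,
                             max_ram_gb: Optional[float]) -> int:
    """Limit the CPU-offload budget to the headroom left under the RAM cap.

    Assumption: a cap of None or 0 means "no cap configured".
    """
    if not max_ram_gb:
        return requested_bytes
    headroom = int(max_ram_gb * GIB) - rss_bytes
    # Never return a negative budget; anything that does not fit is
    # expected to spill to the disk offload folder instead.
    return max(0, min(requested_bytes, headroom))


@dataclass
class LoadedModel:
    """Hypothetical record for a loaded model."""
    name: str
    last_used: float   # timestamp stamped on each use
    busy: bool = False  # True while a request is running on it


def pick_eviction_victim(models: Sequence[LoadedModel]) -> Optional[LoadedModel]:
    """Return the least recently used idle model, or None if all are busy."""
    idle = [m for m in models if not m.busy]
    return min(idle, key=lambda m: m.last_used, default=None)


def suspected_leak(idle_rss_samples: Sequence[int], min_growth_bytes: int,
                   min_samples: int = 3) -> bool:
    """Flag a leak when RSS rose on every one of the last few idle samples
    and the total growth over that window reaches the threshold."""
    if len(idle_rss_samples) < min_samples:
        return False
    window = idle_rss_samples[-min_samples:]
    rising = all(b > a for a, b in zip(window, window[1:]))
    return rising and (window[-1] - window[0]) >= min_growth_bytes
```

With a 16 GB cap and 12 GiB already resident, a requested 8 GiB budget would be clamped to 4 GiB under these assumptions; once RSS exceeds the cap the budget drops to zero.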
    
    Also fold loaded upscalers (_UPSCALER_CACHE) into the dashboard models-loaded
    count so an active upscale no longer reports '0 models loaded'.
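The dashboard count change can be pictured as summing two collections. The sketch below is an assumption about the shape of the fix, not the actual code: `models_loaded_count` and its parameters are hypothetical, and only the name `_UPSCALER_CACHE` comes from the commit message.

```python
from typing import Mapping


def models_loaded_count(loaded_models: Mapping[str, object],
                        upscaler_cache: Mapping[str, object]) -> int:
    """Count regular models plus cached upscalers (hypothetical helper).

    upscaler_cache stands in for the module-level _UPSCALER_CACHE.
    """
    return len(loaded_models) + len(upscaler_cache)
```

Under this sketch, a server with no regular models but one cached upscaler reports 1 rather than 0.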
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
..
admin
api
backends
broker
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py