    admin: actually free VRAM on unload + show whisper-server as loaded · 84def90a
    Stefy Lanza (nextime / spora ) authored
    Two issues when unloading/reporting models on a multi-engine node:
    
    - Unload didn't free VRAM for pooled models. api_model_unload only popped
      multi_model_manager.models and never touched model_pools, so a model
      served with max_instances>1 (which lives only in the pool) kept all its
      instances resident. Now it searches both dicts and calls unload_model(),
      which cleans up the whole pool + runs gc/empty_cache. Also handles
      whisper-server models (their own subprocess) by stopping the server.
    
    - whisper-server showed as "not loaded". It runs as a subprocess tracked
      in whisper_servers, not in .models. Fold each running server (id +
      `audio:` alias) into both the model-loaded-status list and the
      /admin/api/status loaded_keys, so the models page, dashboard count and
      per-engine box all reflect it (incl. on a secondary engine).
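
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the code in routes.py: the `Manager` and `FakeServer` classes, the `unload` and `loaded_keys` helpers, the assumption that `whisper_servers` is keyed by model id, and the `audio:<id>` alias form are all hypothetical stand-ins. Only the names `models`, `model_pools`, `whisper_servers` and `unload_model` come from the commit message.

```python
import gc
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical stand-in for a whisper-server subprocess handle."""
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class Manager:
    """Hypothetical stand-in for multi_model_manager, using plain dicts."""
    models: dict = field(default_factory=dict)           # single-instance models
    model_pools: dict = field(default_factory=dict)      # models with max_instances > 1
    whisper_servers: dict = field(default_factory=dict)  # subprocess-backed models

    def unload_model(self, model_id: str) -> None:
        # Drop every reference so all pooled instances can be collected.
        self.models.pop(model_id, None)
        self.model_pools.pop(model_id, None)
        gc.collect()
        # Assumption: a real GPU node would also release the allocator cache
        # here, for example with torch.cuda.empty_cache().


def unload(manager: Manager, model_id: str) -> bool:
    """Unload a model wherever it lives; return True if anything was unloaded."""
    server = manager.whisper_servers.pop(model_id, None)
    if server is not None:
        server.stop()  # whisper-server runs as its own subprocess
        return True
    # Search both dicts: a pooled model is absent from .models.
    if model_id in manager.models or model_id in manager.model_pools:
        manager.unload_model(model_id)
        return True
    return False


def loaded_keys(manager: Manager) -> set:
    """Collect loaded model keys, folding in running whisper servers."""
    keys = set(manager.models) | set(manager.model_pools)
    for server_id, server in manager.whisper_servers.items():
        if server.running:
            keys.add(server_id)
            keys.add(f"audio:{server_id}")  # assumed alias format
    return keys
```

With this shape, a status endpoint and a models page can both call the same `loaded_keys` helper, so a running whisper-server is counted consistently wherever loaded models are reported.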
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
routes.py 177 KB