    admin: actually free VRAM on unload + show whisper-server as loaded · 84def90a
    Stefy Lanza (nextime / spora ) authored
    Two issues when unloading/reporting models on a multi-engine node:
    
    - Unload didn't free VRAM for pooled models. api_model_unload only popped
      multi_model_manager.models and never touched model_pools, so a model
      served with max_instances>1 (which lives only in the pool) kept all its
      instances resident. Now it searches both dicts and calls unload_model(),
      which cleans up the whole pool + runs gc/empty_cache. Also handles
      whisper-server models (their own subprocess) by stopping the server.
    
    - whisper-server showed as "not loaded". It runs as a subprocess tracked
      in whisper_servers, not in .models. Fold each running server (id +
      `audio:` alias) into both the model-loaded-status list and the
      /admin/api/status loaded_keys, so the models page, dashboard count and
      per-engine box all reflect it (incl. on a secondary engine).
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
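The two fixes described in the commit message can be illustrated with a minimal sketch. This is not the project's code: the `Manager` and `FakeServer` classes are hypothetical stand-ins, the attribute names `models`, `model_pools` and `whisper_servers` are borrowed from the message, and the exact form of the `audio:` alias (prefix plus server id) is an assumption.

```python
import gc
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical stand-in for a whisper-server subprocess handle."""
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class Manager:
    """Hypothetical stand-in for multi_model_manager."""
    models: dict = field(default_factory=dict)           # single-instance models
    model_pools: dict = field(default_factory=dict)      # models with max_instances > 1
    whisper_servers: dict = field(default_factory=dict)  # subprocess-backed models

    def unload_model(self, model_id: str) -> bool:
        """Remove model_id from every registry; return False if it was nowhere."""
        found = False
        if model_id in self.models:
            del self.models[model_id]
            found = True
        if model_id in self.model_pools:
            # Dropping the pool entry releases all pooled instances at once.
            del self.model_pools[model_id]
            found = True
        server = self.whisper_servers.pop(model_id, None)
        if server is not None:
            server.stop()
            found = True
        if found:
            gc.collect()
            # A GPU-backed node would also release cached VRAM at this point,
            # for example with torch.cuda.empty_cache() when PyTorch is in use.
        return found

    def loaded_keys(self) -> list:
        """Report plain models, pooled models and running whisper servers."""
        keys = set(self.models) | set(self.model_pools)
        for server_id, server in self.whisper_servers.items():
            if server.running:
                keys.add(server_id)
                keys.add(f"audio:{server_id}")  # assumed alias format
        return sorted(keys)
```

In this sketch a pooled model is found even though it never appears in `models`, and a running whisper server is reported under both its id and its alias until it is stopped.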
Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py