admin: actually free VRAM on unload + show whisper-server as loaded

Two issues when unloading/reporting models on a multi-engine node:
- Unload didn't free VRAM for pooled models. api_model_unload only popped
multi_model_manager.models and never touched model_pools, so a model
served with max_instances>1 (which lives only in the pool) kept all its
instances resident. Now it searches both dicts and calls unload_model(),
which cleans up the whole pool and runs gc/empty_cache. It also handles
whisper-server models (each runs as its own subprocess) by stopping the
server.
- whisper-server showed as "not loaded". It runs as a subprocess tracked
in whisper_servers, not in .models. Fold each running server (id +
`audio:` alias) into both the model-loaded-status list and the
/admin/api/status loaded_keys, so the models page, dashboard count and
per-engine box all reflect it (incl. on a secondary engine).
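
A minimal Python sketch of both fixes follows. It assumes a manager that
keeps three dicts, `models`, `model_pools` and `whisper_servers`, and that
each whisper-server entry is a `subprocess.Popen`-like handle with
`poll()`, `terminate()` and `wait()`. The class name `ModelManager`, the
helpers `loaded_keys()` and `_empty_cuda_cache()`, and the exact shape of
`unload_model()` are hypothetical, so read this as an illustration rather
than the real implementation:

```python
import gc
from dataclasses import dataclass, field
from typing import Any


def _empty_cuda_cache() -> None:
    """Hypothetical helper: return cached CUDA blocks if torch is present."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


@dataclass
class ModelManager:
    # Hypothetical stand-in for multi_model_manager's state.
    models: dict[str, Any] = field(default_factory=dict)             # single instance
    model_pools: dict[str, list[Any]] = field(default_factory=dict)  # max_instances > 1
    whisper_servers: dict[str, Any] = field(default_factory=dict)    # id -> process handle

    def unload_model(self, key: str) -> bool:
        """Drop a model from every place it may live, then reclaim memory."""
        found = False
        if key in self.whisper_servers:
            # whisper-server runs out of process: stop it instead of freeing tensors.
            proc = self.whisper_servers.pop(key)
            proc.terminate()
            proc.wait(timeout=10)
            found = True
        if key in self.models:
            del self.models[key]
            found = True
        pool = self.model_pools.pop(key, None)
        if pool is not None:
            pool.clear()  # release every pooled instance, not just one
            found = True
        if found:
            gc.collect()          # free unreferenced tensors first...
            _empty_cuda_cache()   # ...then hand cached blocks back to the driver
        return found

    def loaded_keys(self) -> list[str]:
        """Keys to report as loaded, including running whisper servers."""
        keys = set(self.models) | set(self.model_pools)
        for server_id, proc in self.whisper_servers.items():
            if proc.poll() is None:  # None means the subprocess is still running
                keys.add(server_id)
                keys.add(f"audio:{server_id}")
        return sorted(keys)
```

Running `gc.collect()` before `empty_cache()` matters: the CUDA caching
allocator can only return blocks whose tensors have already been freed.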

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>