Stefy Lanza (nextime / spora) authored
Two issues when unloading/reporting models on a multi-engine node:

- Unload didn't free VRAM for pooled models. api_model_unload only popped multi_model_manager.models and never touched model_pools, so a model served with max_instances>1 (which lives only in the pool) kept all its instances resident. It now searches both dicts and calls unload_model(), which cleans up the whole pool and runs gc/empty_cache. It also handles whisper-server models, which run in their own subprocess, by stopping the server.
- whisper-server showed as "not loaded". It runs as a subprocess tracked in whisper_servers, not in .models. Each running server (its id plus its `audio:` alias) is now folded into both the model-loaded-status list and the /admin/api/status loaded_keys, so the models page, the dashboard count and the per-engine box all reflect it (including on a secondary engine).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
84def90a
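A minimal Python sketch of the two fixes described above. It is written under stated assumptions and is not the project's actual code.

The names api_model_unload, unload_model, models, model_pools, whisper_servers and loaded_keys come from the commit message. Their signatures and data shapes here are guesses. MultiModelManager, stop_whisper_server, _free_accelerator_memory and the `audio:<id>` alias format are hypothetical.

```python
import gc
import subprocess
from dataclasses import dataclass, field
from typing import Any, Dict, List

AUDIO_PREFIX = "audio:"  # assumption: the alias format is "audio:<server id>"


def _free_accelerator_memory() -> None:
    """Collect garbage and, if PyTorch with CUDA is present, release cached VRAM."""
    gc.collect()
    try:
        import torch  # optional dependency; the sketch runs without it
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


@dataclass
class MultiModelManager:
    """Hypothetical stand-in for the node's model registry."""
    models: Dict[str, Any] = field(default_factory=dict)             # single-instance models
    model_pools: Dict[str, List[Any]] = field(default_factory=dict)  # max_instances > 1
    whisper_servers: Dict[str, subprocess.Popen] = field(default_factory=dict)

    def unload_model(self, model_id: str) -> bool:
        """Drop a model from BOTH registries, then free memory once."""
        found = False
        if model_id in self.models:
            del self.models[model_id]
            found = True
        if model_id in self.model_pools:
            pool = self.model_pools.pop(model_id)
            pool.clear()  # drop references to every pooled instance, not just one
            found = True
        if found:
            _free_accelerator_memory()
        return found

    def stop_whisper_server(self, server_id: str, timeout: float = 10.0) -> bool:
        """Stop a whisper-server subprocess; it holds its own memory."""
        proc = self.whisper_servers.pop(server_id, None)
        if proc is None:
            return False
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if terminate() was ignored
            proc.wait()
        return True

    def loaded_keys(self) -> List[str]:
        """Keys reported as loaded, including running whisper servers."""
        keys = set(self.models) | set(self.model_pools)
        for server_id, proc in self.whisper_servers.items():
            if proc.poll() is None:  # None means the process is still running
                keys.add(server_id)
                keys.add(AUDIO_PREFIX + server_id)
        return sorted(keys)


def api_model_unload(manager: MultiModelManager, model_id: str) -> Dict[str, Any]:
    """Unload handler: whisper servers are stopped, other models are unloaded."""
    if model_id.startswith(AUDIO_PREFIX):
        server_id = model_id[len(AUDIO_PREFIX):]
    else:
        server_id = model_id
    if server_id in manager.whisper_servers:
        return {"model": model_id, "unloaded": manager.stop_whisper_server(server_id)}
    return {"model": model_id, "unloaded": manager.unload_model(model_id)}
```

In this sketch the unload path checks both registries before running cleanup once. A pooled model therefore releases every instance rather than only the single-instance entry.

A single loaded_keys() function feeds both status views, so the models page and /admin/api/status cannot disagree about whether a whisper server is running.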