codai/admin/routes.py · 84def90ac5fe6bbc03a20cf469b04d635e1373a5 · nexlab / coderai

admin: actually free VRAM on unload + show whisper-server as loaded · 84def90a

Stefy Lanza (nextime / spora) authored Jun 19, 2026

Two issues when unloading/reporting models on a multi-engine node:

- Unload didn't free VRAM for pooled models. api_model_unload only popped
  multi_model_manager.models and never touched model_pools, so a model
  served with max_instances>1 (which lives only in the pool) kept all of
  its instances resident. It now looks the model up in both dicts and
  calls unload_model(), which tears down the whole pool and runs
  gc/empty_cache. whisper-server models, which run in their own
  subprocess, are handled by stopping the server.

- whisper-server showed as "not loaded". It runs as a subprocess tracked
  in whisper_servers, not in .models. Fold each running server (its id
  plus its `audio:` alias) into both the model-loaded-status list and the
  /admin/api/status loaded_keys, so the models page, the dashboard count
  and the per-engine box all reflect it (including on a secondary engine).
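
A minimal sketch of the unload path described in the first item, assuming the manager keeps single-instance models in `models`, pooled instances in `model_pools`, and whisper-server handles in `whisper_servers`. The dataclass shapes, `FakeWhisperServer` and the boolean return value of `api_model_unload` are hypothetical stand-ins, not the actual coderai code, and the GPU cache-emptying step is only noted in a comment.

```python
from __future__ import annotations

import gc
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeWhisperServer:
    """Hypothetical stand-in for a whisper-server subprocess handle."""
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class MultiModelManager:
    # Single-instance models, keyed by model id.
    models: dict[str, Any] = field(default_factory=dict)
    # Pooled models (max_instances > 1): model id -> list of instances.
    model_pools: dict[str, list[Any]] = field(default_factory=dict)
    # whisper-server subprocesses, keyed by model id.
    whisper_servers: dict[str, FakeWhisperServer] = field(default_factory=dict)

    def unload_model(self, model_id: str) -> None:
        # Drop the single instance AND every pooled instance, then collect.
        self.models.pop(model_id, None)
        self.model_pools.pop(model_id, None)
        gc.collect()
        # A real implementation would also release cached GPU memory here
        # (the "empty_cache" step); omitted in this sketch.


def api_model_unload(manager: MultiModelManager, model_id: str) -> bool:
    """Unload model_id wherever it lives; return True if anything was freed."""
    # whisper-server models live in their own subprocess: stop the server.
    server = manager.whisper_servers.pop(model_id, None)
    if server is not None:
        server.stop()
        return True
    # Search both dicts, not just .models, so pool-only models are found.
    if model_id in manager.models or model_id in manager.model_pools:
        manager.unload_model(model_id)
        return True
    return False
```

Checking both dicts before delegating to a single cleanup method keeps the pool teardown and garbage collection in one place, which is the point of routing the endpoint through unload_model().
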

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
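
The second fix, folding running whisper-server processes into the loaded-model view, can be sketched as below. It assumes each entry in `whisper_servers` behaves like a `subprocess.Popen` handle (`poll()` returns `None` while the process runs), that in-process keys come from `models` and `model_pools`, and that the alias is `"audio:"` prefixed to the model id. `FakeProc` and `loaded_keys` are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeProc:
    """Hypothetical stand-in for a subprocess.Popen handle.

    Mirrors Popen.poll(): None while the process runs, else its exit code.
    """
    returncode: Optional[int] = None

    def poll(self) -> Optional[int]:
        return self.returncode


def loaded_keys(models: dict, model_pools: dict, whisper_servers: dict) -> list[str]:
    """Collect every key that should be reported as loaded."""
    # In-process models: single-instance entries plus pooled ones.
    keys = set(models) | set(model_pools)
    # whisper-server runs out of process, so check each subprocess directly.
    for model_id, proc in whisper_servers.items():
        if proc.poll() is None:  # still running
            keys.add(model_id)
            # Assumed alias format: "audio:" prefixed to the model id.
            keys.add(f"audio:{model_id}")
    return sorted(keys)
```

For example, `loaded_keys({"m1": object()}, {"m2": [1, 2]}, {"w1": FakeProc(), "w2": FakeProc(0)})` returns `['audio:w1', 'm1', 'm2', 'w1']`: the exited server `w2` is left out. Computing the keys in one helper lets the models page, the dashboard count and the per-engine box share the same answer.
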


