Commit 362b8452 authored by Your Name

Implement on-demand model swapping for multiple models

- Add a model_backend_types dict to track the backend for each model
- Update set_default_model to accept a backend_type parameter
- Modify get_model_for_request to swap models on demand when running in ondemand mode
- Unload the current model from VRAM and load the requested model when a request arrives for a different model
- Respect the --backend flag when loading models on demand
- Activate on-demand swapping only when neither --loadall nor --loadswap is specified
parent ebfa6892
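
The diff itself is not shown here, so the following is only a minimal sketch of the swapping behaviour the commit message describes, not the project's actual code. The names `model_backend_types`, `set_default_model`, `get_model_for_request`, and `backend_type` come from the message; the class `OnDemandModelManager`, the `load_fn`/`unload_fn` callables, and the `load_all`/`load_swap` parameters are hypothetical stand-ins for whatever the real loader and the `--loadall`/`--loadswap` flags do.

```python
from typing import Any, Callable, Dict, Optional


class OnDemandModelManager:
    """Hypothetical sketch: keep at most one model loaded and swap on request."""

    def __init__(
        self,
        load_fn: Callable[[str, str], Any],   # assumed: (model_name, backend) -> model
        unload_fn: Callable[[Any], None],     # assumed: frees the model's VRAM
        default_backend: str,                 # assumed: value of the --backend flag
        load_all: bool = False,               # assumed: --loadall was given
        load_swap: bool = False,              # assumed: --loadswap was given
    ) -> None:
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.default_backend = default_backend
        # On-demand mode is active only when neither flag is specified.
        self.ondemand = not (load_all or load_swap)
        # Tracks which backend each known model should be loaded with.
        self.model_backend_types: Dict[str, str] = {}
        self.default_model: Optional[str] = None
        self.current_name: Optional[str] = None
        self.current_model: Any = None

    def set_default_model(self, name: str, backend_type: Optional[str] = None) -> None:
        """Record the default model and the backend to use for it."""
        self.default_model = name
        self.model_backend_types[name] = backend_type or self.default_backend

    def get_model_for_request(self, requested: Optional[str] = None) -> Any:
        """Return the model for a request, swapping it in if necessary."""
        name = requested or self.default_model
        if name is None:
            raise ValueError("no model requested and no default model set")
        # Nothing to do if the model is already loaded. The other modes are
        # outside this sketch, so they simply return whatever is loaded.
        if name == self.current_name or not self.ondemand:
            return self.current_model
        # Unload the current model before loading the requested one.
        if self.current_model is not None:
            self.unload_fn(self.current_model)
            self.current_model = None
            self.current_name = None
        # Models without a recorded backend fall back to the default backend.
        backend = self.model_backend_types.get(name, self.default_backend)
        self.current_model = self.load_fn(name, backend)
        self.current_name = name
        return self.current_model
```

Unloading before loading keeps peak VRAM usage at roughly one model, at the cost of a reload delay whenever consecutive requests target different models.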