Implement on-demand model swapping for multiple models
- Add model_backend_types dict to track the backend for each model
- Update set_default_model to accept a backend_type parameter
- Modify get_model_for_request to swap models on demand when in ondemand mode
- Unload the current model from VRAM and load the new model when a request arrives for a different model
- Respect the --backend flag when loading models on demand
- Only activates when neither --loadall nor --loadswap is specified
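
Since the diff itself is not shown here, the following is only a rough sketch of how the swap logic described above could fit together. It is not the code from this commit. The names `model_backend_types`, `set_default_model`, `get_model_for_request`, and `backend_type` come from the commit message. Everything else is an assumption made for illustration: the `OnDemandModelManager` class, the `load_fn`/`unload_fn` callbacks, the `_swap_to` helper, the `"auto"` default, and the rule that a model with no recorded backend falls back to the `--backend` value.

```python
from typing import Any, Callable, Dict, Optional


class OnDemandModelManager:
    """Hypothetical manager that keeps at most one model loaded at a time."""

    def __init__(
        self,
        load_fn: Callable[[str, str], Any],   # assumed: (model_name, backend) -> model
        unload_fn: Callable[[Any], None],     # assumed: frees the model's VRAM
        backend_flag: str = "auto",           # stands in for the --backend value
        loadall: bool = False,                # stands in for --loadall
        loadswap: bool = False,               # stands in for --loadswap
    ) -> None:
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.backend_flag = backend_flag
        # On-demand swapping is active only when neither preload flag is set.
        self.ondemand = not (loadall or loadswap)
        # Tracks which backend each known model should be loaded with.
        self.model_backend_types: Dict[str, str] = {}
        self.current_name: Optional[str] = None
        self.current_model: Any = None

    def set_default_model(self, name: str, backend_type: Optional[str] = None) -> None:
        # Record the backend for this model, falling back to the --backend
        # value (an assumption of this sketch), then load it.
        self.model_backend_types[name] = backend_type or self.backend_flag
        self._swap_to(name)

    def get_model_for_request(self, name: str) -> Any:
        # In on-demand mode, a request for a different model triggers a swap.
        # Other modes are outside this sketch and return whatever is loaded.
        if self.ondemand and name != self.current_name:
            self._swap_to(name)
        return self.current_model

    def _swap_to(self, name: str) -> None:
        # Unload first so that two models never occupy VRAM at once.
        if self.current_model is not None:
            self.unload_fn(self.current_model)
            self.current_model = None
            self.current_name = None
        backend = self.model_backend_types.setdefault(name, self.backend_flag)
        self.current_model = self.load_fn(name, backend)
        self.current_name = name


if __name__ == "__main__":
    # Stand-in loader and unloader that only print what they would do.
    def fake_load(name: str, backend: str) -> str:
        print(f"load {name} on {backend}")
        return f"{name}@{backend}"

    def fake_unload(model: str) -> None:
        print(f"unload {model}")

    manager = OnDemandModelManager(fake_load, fake_unload, backend_flag="cuda")
    manager.set_default_model("model-a", backend_type="cpu")
    manager.get_model_for_request("model-a")  # already loaded, no swap
    manager.get_model_for_request("model-b")  # unloads model-a, loads model-b
```

Unloading before loading trades request latency for a lower peak VRAM footprint, which is presumably why this mode is kept separate from --loadall and --loadswap.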