feat: Add multi-model support for audio transcription and image generation

- Add --audio-model and --image-model CLI arguments
- Add --loadall, --audio-ctx, --audio-offload, --vision-ctx, --vision-offload args
- Implement MultiModelManager class for dynamic model switching
- Add POST /v1/audio/transcriptions endpoint (OpenAI-compatible)
- Add POST /v1/images/generations endpoint (OpenAI-compatible)
- Update endpoints to use multi_model_manager for model selection
- Audio uses faster-whisper for local transcription
- Images use Stable Diffusion via diffusers
parent eb6b8d85
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment