Use llama.cpp's create_chat_completion for proper chat template handling
- Add generate_chat() and generate_chat_stream() methods to VulkanBackend - These use create_chat_completion() which properly applies model's chat template - Fallback to manual formatting if create_chat_completion fails - Update API endpoints to pass messages dict directly instead of formatted prompt - Fixes garbled output with Qwen3 and other models that use custom chat templates
Showing
This diff is collapsed.
Please
register
or
sign in
to comment