Use llama.cpp's create_chat_completion for proper chat template handling

- Add generate_chat() and generate_chat_stream() methods to VulkanBackend
- These use create_chat_completion() which properly applies model's chat template
- Fallback to manual formatting if create_chat_completion fails
- Update API endpoints to pass messages dict directly instead of formatted prompt
- Fixes garbled output with Qwen3 and other models that use custom chat templates
parent eea67af6
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment