Use llama.cpp's create_chat_completion for proper chat template handling (7947fb75) · Commits · nexlab / coderai

Commit 7947fb75 authored Feb 28, 2026 by Stefy Lanza (nextime / spora)

Use llama.cpp's create_chat_completion for proper chat template handling

- Add generate_chat() and generate_chat_stream() methods to VulkanBackend
- Both call create_chat_completion(), which applies the model's own chat template
- Fall back to manual prompt formatting if create_chat_completion() fails
- Update API endpoints to pass the messages dict directly instead of a pre-formatted prompt
- Fix garbled output with Qwen3 and other models that use custom chat templates
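
The diff itself is not shown here, so the following is only a minimal sketch of the pattern the bullets describe, not the committed code. It assumes `llm` behaves like a llama-cpp-python `Llama` object (it exposes `create_chat_completion()` and is callable for raw completion). The standalone functions and the `format_prompt_manually` helper are hypothetical stand-ins for the VulkanBackend methods and its manual formatter.

```python
from typing import Any, Dict, Iterator, List

Message = Dict[str, str]


def format_prompt_manually(messages: List[Message]) -> str:
    """Hypothetical fallback: generic role-prefixed prompt, not a real model template."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")
    return "\n".join(lines)


def generate_chat(llm: Any, messages: List[Message], max_tokens: int = 256) -> str:
    """Prefer the model's chat template; fall back to manual formatting on failure."""
    try:
        result = llm.create_chat_completion(messages=messages, max_tokens=max_tokens)
        return result["choices"][0]["message"]["content"]
    except Exception:
        # Assumed fallback path: raw completion on a manually built prompt.
        result = llm(format_prompt_manually(messages), max_tokens=max_tokens)
        return result["choices"][0]["text"]


def generate_chat_stream(
    llm: Any, messages: List[Message], max_tokens: int = 256
) -> Iterator[str]:
    """Streaming variant: yield text pieces as they arrive."""
    try:
        chunks = llm.create_chat_completion(
            messages=messages, max_tokens=max_tokens, stream=True
        )
    except Exception:
        # Only failures raised when the stream is created are caught here.
        prompt = format_prompt_manually(messages)
        for chunk in llm(prompt, max_tokens=max_tokens, stream=True):
            yield chunk["choices"][0]["text"]
        return
    for chunk in chunks:
        # Chat stream chunks carry a "delta"; the first one may hold only the role.
        text = chunk["choices"][0]["delta"].get("content")
        if text:
            yield text
```

An endpoint following this pattern would hand the request's messages straight to `generate_chat(llm, messages)` rather than building a prompt string first, which leaves template selection to llama.cpp.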

parent eea67af6
