vulkan: fold system role by template signal, not just architecture

Whether a model rejects the 'system' role is a property of the chat
template baked into the specific GGUF, not the architecture: the gemma-2
template and the official gemma template raise "System role not
supported", while 'heretic' gemma4 quant conversions ship a permissive
template that accepts system. Detect from the embedded
tokenizer.chat_template (raise_exception/"system role") and fold only
when it actually rejects system; fall back to architecture (Gemma) when
no template is readable. Avoids needlessly folding permissive Gemma
models while still covering gemma-2-9b and strict non-Gemma templates.
The runtime "System role not supported" retry remains as a safety net.
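
The decision described above can be sketched roughly as follows. This is
an illustrative sketch, not the code in this change: the function name,
its parameters, and the exact matching rule (requiring both a
raise_exception call and a "system role" message in the template) are
assumptions made for the example.

```python
import re

# Hypothetical pattern: matches the "System role not supported" style of
# message that strict templates pass to raise_exception.
_SYSTEM_ROLE_RE = re.compile(r"system role", re.IGNORECASE)


def should_fold_system(chat_template, architecture):
    """Return True when system messages should be folded.

    chat_template: the tokenizer.chat_template string from the GGUF,
                   or None/empty when no template is readable.
    architecture:  the model architecture name, used only as a fallback.
    """
    if chat_template:
        # Template signal (assumed rule): the template both calls
        # raise_exception and mentions "system role".
        rejects = (
            "raise_exception" in chat_template
            and _SYSTEM_ROLE_RE.search(chat_template) is not None
        )
        return rejects
    # No readable template: fall back to the architecture heuristic.
    return (architecture or "").lower().startswith("gemma")
```

With this rule a permissive Gemma template is left alone, a strict
template on any architecture is folded, and a Gemma model with no
readable template is still folded by the fallback.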

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>