• Stefy Lanza (nextime / spora )'s avatar
    vulkan: fold system role by template signal, not just architecture · 64eb74b7
    Stefy Lanza (nextime / spora ) authored
    Whether a model rejects the 'system' role is a property of the chat
    template baked into the specific GGUF, not the architecture: the gemma-2
    template and the official gemma template raise "System role not
    supported", while 'heretic' gemma4 quant conversions ship a permissive
    template that accepts system. Detect from the embedded
    tokenizer.chat_template (raise_exception/"system role") and fold only
    when it actually rejects system; fall back to architecture (Gemma) when
    no template is readable. Avoids needlessly folding permissive Gemma
    models while still covering gemma-2-9b and strict non-Gemma templates.
    The runtime "System role not supported" retry remains as a safety net.
    Co-Authored-By: 's avatarClaude Opus 4.8 <noreply@anthropic.com>
    64eb74b7
vulkan.py 75.5 KB