    feat: Pipeline fixes, regex optimization, GBNF grammar support, and prompt distillation · 5341ee6a
    Your Name authored
    - Fixed streaming mode pipeline issues:
      - Fixed n-gram counting to handle partial matches correctly
      - Added per-chunk filtering to prevent duplicate n-grams across chunks
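
A minimal sketch of one way to count n-grams over a stream so that n-grams straddling a chunk boundary are counted once and none are counted twice. The class name `StreamingNgramCounter` and its interface are hypothetical and are not taken from this commit; the actual pipeline may filter differently.

```python
from collections import Counter, deque


class StreamingNgramCounter:
    """Hypothetical helper: counts n-grams across chunks of a token stream."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.counts = Counter()
        # Keep only the last n-1 tokens seen so far. Because the tail is
        # shorter than n, every n-gram built below contains at least one
        # new token and therefore cannot have been counted before.
        self._tail = deque(maxlen=n - 1)

    def feed(self, tokens):
        """Count the n-grams completed by this chunk."""
        tokens = list(tokens)
        window = list(self._tail) + tokens
        for i in range(len(window) - self.n + 1):
            self.counts[tuple(window[i:i + self.n])] += 1
        self._tail.extend(tokens)


counter = StreamingNgramCounter(2)
counter.feed(["a", "b"])
counter.feed(["c", "d"])  # ("b", "c") spans the chunk boundary
```

Feeding the chunks separately gives the same counts as feeding the whole sequence at once, which is the property the fix above is aiming for.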
    
    - Optimized regex patterns:
      - Pre-compiled all ~35 regex patterns for better performance
      - Added false positive protection with length-based filtering
      - Optimized tool call parsing in parser.py
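
A minimal sketch of the pattern described above: compile the regex once at module level and drop matches that are too short to be real tool calls. The `<tool_call>` tag format, the name `extract_tool_calls`, and the threshold value are assumptions for illustration and are not taken from parser.py.

```python
import re

# Compiled once at import time rather than on every call.
# The tag format is an assumption, not the one used in parser.py.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

# Hypothetical threshold: payloads shorter than this are treated as
# false positives (for example, an empty pair of tags).
MIN_PAYLOAD_CHARS = 2


def extract_tool_calls(text):
    """Return the payload of each tool call that passes the length filter."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        payload = match.group(1).strip()
        if len(payload) < MIN_PAYLOAD_CHARS:
            continue  # too short to be a real call
        calls.append(payload)
    return calls
```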
    
    - Added grammar-guided generation (--ggg / --grammar-guided-gen):
      - New GBNF grammar file (tool_call_grammar.gbnf) for tool call parsing
      - Grammar loading utilities in models/grammar.py
      - Vulkan backend: Added GBNF grammar support via llama_generate_grammar
      - CUDA backend: Added outlines support for structured output
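
A minimal sketch of what a grammar loading utility could look like, assuming it only reads the file and checks for the `root` rule that GBNF uses as its entry point. The function names `has_root_rule` and `load_grammar` are hypothetical and do not reflect the contents of models/grammar.py.

```python
import re
from pathlib import Path

# GBNF grammars start parsing from a rule named "root".
_ROOT_RULE_RE = re.compile(r"^\s*root\s*::=", re.MULTILINE)


def has_root_rule(grammar_text):
    """Return True if the grammar text defines a `root` rule."""
    return _ROOT_RULE_RE.search(grammar_text) is not None


def load_grammar(path):
    """Read a .gbnf file and fail early if it has no `root` rule."""
    text = Path(path).read_text(encoding="utf-8")
    if not has_root_rule(text):
        raise ValueError(f"{path}: grammar does not define a 'root' rule")
    return text
```

Checking at load time surfaces a malformed grammar file before it is handed to a backend.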
    
    - Added prompt distillation (--tools-closer-prompt):
      - New CLI option --tools-closer-prompt enables prompt distillation
      - Generates distilled tool descriptions for better accuracy
grammar.py 1.65 KB