Your Name authored
- Fixed streaming mode pipeline issues:
  - Fixed n-gram counting to handle partial matches correctly
  - Added per-chunk filtering to prevent duplicate n-grams across chunks
- Optimized regex patterns (~35 patterns pre-compiled):
  - Pre-compiled all regex patterns for better performance
  - Added false positive protection with length-based filtering
  - Optimized tool call parsing in parser.py
- Added grammar-guided generation (--ggg / --grammar-guided-gen):
  - New GBNF grammar file (tool_call_grammar.gbnf) for tool call parsing
  - Grammar loading utilities in models/grammar.py
  - Vulkan backend: Added GBNF grammar support via llama_generate_grammar
  - CUDA backend: Added outlines support for structured output
- Added prompt distillation (--tools-closer-prompt):
  - New CLI option --tools-closer-prompt for prompt distillation
  - Enables generating distilled tool descriptions for better accuracy
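
For reviewers, the regex change can be pictured roughly as follows. This is a minimal sketch, not the code in parser.py: the two patterns, the `MIN_MATCH_LEN` threshold, and the `find_tool_calls` helper are all hypothetical stand-ins, since the commit message does not list the actual ~35 patterns or the length cutoff it uses.

```python
import re

# Hypothetical patterns, compiled once at import time instead of on every call.
# The real pattern set lives in parser.py and is not reproduced here.
TOOL_CALL_PATTERNS = [
    re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL),
    re.compile(r"\[TOOL_CALL\](.*?)\[/TOOL_CALL\]", re.DOTALL),
]

# Assumed threshold: matches shorter than this are treated as false positives.
MIN_MATCH_LEN = 8


def find_tool_calls(text):
    """Return candidate tool-call bodies, skipping matches that are too short."""
    results = []
    for pattern in TOOL_CALL_PATTERNS:
        for match in pattern.finditer(text):
            body = match.group(1).strip()
            # Length-based filter: very short bodies are unlikely to be real calls.
            if len(body) >= MIN_MATCH_LEN:
                results.append(body)
    return results
```

Compiling at module level means the per-call cost is only the matching itself, and the length check is a cheap guard against stray delimiters that wrap an empty or trivial body.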
5341ee6a