1. 16 Mar, 2026 35 commits
  2. 15 Mar, 2026 5 commits
    • Your Name's avatar
      Make --hf-chat-template repeatable per model · 31b6480e
      Your Name authored
      - Changed --hf-chat-template from boolean to action=append
      - Added check_hf_chat_template() function for model-specific checking
      - Updated _finalize_chat_template_detection to use new function
      - Updated README with new syntax
      31b6480e
    • Your Name's avatar
      Add --hf-chat-template option for HuggingFace apply_chat_template · 3121fb85
      Your Name authored
      - Added --hf-chat-template CLI flag to use transformers apply_chat_template
      - Added _load_huggingface_tokenizer() to load HF tokenizer for GGUF models
      - Added _format_messages_hf() method for HF chat template formatting
      - Updated generate_chat and generate_chat_stream to use HF tokenizer when available
      - Updated format_messages to check for HF tokenizer first
      - Added documentation in README.md
      3121fb85
    • Your Name's avatar
      Add --reply-filters option for optional content filtering · 533f8fd5
      Your Name authored
      - Added --reply-filters CLI flag to make content filtering optional
      - Supports comma-separated values: --reply-filters malformed,tool_calls
      - Supports model-specific filters: --reply-filters text:malformed --reply-filters image:tool_calls
      - Supports specific model names: --reply-filters text:llama-3.1:malformed
      - Added check_reply_filter() and check_single_filter() helper functions
      - Updated stream_chat_response and generate_chat_response to use new filtering
      - Updated ToolCallParser._filter_malformed_content for conditional filtering
      - Added documentation in README.md
      533f8fd5
    • Your Name's avatar
      Increase VRAM cleanup delay to 3 seconds · f8d2481e
      Your Name authored
      - Give more time for Vulkan memory to be freed after unloading image models
      f8d2481e
    • Your Name's avatar
      Fix: Force VRAM cleanup when switching from image to text model · a4a8c340
      Your Name authored
      - Add garbage collection and torch.cuda.empty_cache() after unloading image models
      - Add a small delay to allow VRAM to be freed before loading new model
      - This should help prevent OOM errors when switching between image and text models
      a4a8c340