1. 16 Mar, 2026 32 commits
  2. 15 Mar, 2026 8 commits
    • Make --hf-chat-template repeatable per model · 31b6480e
      Your Name authored
      - Changed --hf-chat-template from boolean to action=append
      - Added check_hf_chat_template() function for model-specific checking
      - Updated _finalize_chat_template_detection to use new function
      - Updated README with new syntax
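The repeatable flag in 31b6480e can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes each occurrence of the flag takes a model name, and the signature and exact-match rule of `check_hf_chat_template()` shown here are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into a list;
    # the attribute stays None when the flag is never given.
    parser.add_argument(
        "--hf-chat-template",
        action="append",
        default=None,
        metavar="MODEL",  # assumption: the value is a model name
    )
    return parser


def check_hf_chat_template(enabled: Optional[List[str]], model_name: str) -> bool:
    # Hypothetical rule: use the HF template only for models named explicitly.
    return model_name in (enabled or [])
```

With `--hf-chat-template llama-3.1 --hf-chat-template qwen` on the command line, `args.hf_chat_template` is `["llama-3.1", "qwen"]`.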
    • Add --hf-chat-template option for HuggingFace apply_chat_template · 3121fb85
      Your Name authored
      - Added --hf-chat-template CLI flag to use transformers apply_chat_template
      - Added _load_huggingface_tokenizer() to load HF tokenizer for GGUF models
      - Added _format_messages_hf() method for HF chat template formatting
      - Updated generate_chat and generate_chat_stream to use HF tokenizer when available
      - Updated format_messages to check for HF tokenizer first
      - Added documentation in README.md
    • Add --reply-filters option for optional content filtering · 533f8fd5
      Your Name authored
      - Added --reply-filters CLI flag to make content filtering optional
      - Supports comma-separated values: --reply-filters malformed,tool_calls
      - Supports model-specific filters: --reply-filters text:malformed --reply-filters image:tool_calls
      - Supports specific model names: --reply-filters text:llama-3.1:malformed
      - Added check_reply_filter() and check_single_filter() helper functions
      - Updated stream_chat_response and generate_chat_response to use new filtering
      - Updated ToolCallParser._filter_malformed_content for conditional filtering
      - Added documentation in README.md
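The three `--reply-filters` forms listed in 533f8fd5 (global, per model type, per model name) could be parsed along these lines. This is a sketch under stated assumptions: `FilterRule`, `parse_reply_filter()` and `reply_filter_enabled()` are hypothetical stand-ins for the commit's `check_reply_filter()` and `check_single_filter()` helpers, whose real signatures are not shown in the log.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class FilterRule:
    filters: FrozenSet[str]
    model_type: Optional[str] = None  # e.g. "text" or "image"; None = any type
    model_name: Optional[str] = None  # e.g. "llama-3.1"; None = any model


def parse_reply_filter(spec: str) -> FilterRule:
    # Accepted shapes: "a,b", "type:a,b", "type:name:a,b".
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separators in {spec!r}")
    *scope, names = parts
    filters = frozenset(n.strip() for n in names.split(",") if n.strip())
    model_type = scope[0] if len(scope) >= 1 else None
    model_name = scope[1] if len(scope) == 2 else None
    return FilterRule(filters, model_type, model_name)


def reply_filter_enabled(
    rules: Iterable[FilterRule], filter_name: str, model_type: str, model_name: str
) -> bool:
    # A filter applies when some rule lists it and that rule's scope matches.
    for rule in rules:
        if filter_name not in rule.filters:
            continue
        if rule.model_type not in (None, model_type):
            continue
        if rule.model_name not in (None, model_name):
            continue
        return True
    return False
```

In this sketch an unscoped rule applies to every model, and a filter that no rule mentions stays disabled, which matches the commit's stated goal of making filtering optional.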
    • Increase VRAM cleanup delay to 3 seconds · f8d2481e
      Your Name authored
      - Give more time for Vulkan memory to be freed after unloading image models
    • Fix: Force VRAM cleanup when switching from image to text model · a4a8c340
      Your Name authored
      - Add garbage collection and torch.cuda.empty_cache() after unloading image models
      - Add a small delay to allow VRAM to be freed before loading new model
      - This should help prevent OOM errors when switching between image and text models
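The cleanup sequence described in a4a8c340 (garbage collection, `torch.cuda.empty_cache()`, then a pause) might look like the sketch below. The function name `release_vram` is hypothetical, and the optional import is an assumption made so the snippet also runs where PyTorch is not installed.

```python
import gc
import time

try:
    import torch
except ImportError:  # assumption: keep the sketch usable without PyTorch
    torch = None


def release_vram(delay_seconds: float = 3.0) -> None:
    # Drop unreachable Python objects first so their tensors can be released.
    gc.collect()
    # Return cached blocks held by PyTorch's CUDA allocator to the driver.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    # Pause before the next model load so the driver can finish freeing memory.
    time.sleep(delay_seconds)
```

Note that `torch.cuda.empty_cache()` only releases memory cached by PyTorch's own allocator; references to the unloaded model must already be dropped for `gc.collect()` to have any effect.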
    • Fix: Don't strip whitespace from model output · 13b56ea0
      Your Name authored
      - Remove stripping in strip_tool_calls_from_content function
      - Whitespace (spaces, newlines) is valid content and should be preserved
    • Fix: Use cached file path when reloading URL-based models · 7e7930f5
      Your Name authored
      - When reloading a default model that was loaded from a URL,
        check for cached file path and use it instead of the URL
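The cache lookup in 7e7930f5 can be illustrated with a small helper. Everything here is an assumption for illustration: the name `resolve_model_source`, the URL-to-path dictionary, and the fallback to the original URL when the cached file is missing are not taken from the project.

```python
import os
from typing import Dict


def resolve_model_source(source: str, cached_paths: Dict[str, str]) -> str:
    # Only URL sources can have a previously downloaded copy.
    if source.startswith(("http://", "https://")):
        cached = cached_paths.get(source)
        # Prefer the local copy, but only if the file still exists on disk.
        if cached and os.path.isfile(cached):
            return cached
    # Local paths, and URLs without a usable cached copy, pass through unchanged.
    return source
```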
    • Fix: Handle NaN values in diffusers image output · dd924c1a
      Your Name authored
      - Replace NaN and Inf values with valid values before saving
      - Clip image values to valid range [0, 1] to prevent black images
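The two steps in dd924c1a (replace non-finite values, then clip to [0, 1]) can be sketched with NumPy. The function name `sanitize_image` and the chosen replacement values (0 for NaN and negative infinity, 1 for positive infinity) are assumptions; the commit does not say which values it substitutes.

```python
import numpy as np


def sanitize_image(image: np.ndarray) -> np.ndarray:
    # Replace NaN and +/-Inf with finite values (assumed: 0.0, 1.0 and 0.0).
    cleaned = np.nan_to_num(image, nan=0.0, posinf=1.0, neginf=0.0)
    # Clamp everything else into the valid float image range [0, 1].
    return np.clip(cleaned, 0.0, 1.0)
```

Applying this before the array is converted to 8-bit pixels ensures every value fed to that conversion is finite and in range.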