1. 16 Mar, 2026 20 commits
  2. 15 Mar, 2026 20 commits
    • Make --hf-chat-template repeatable per model · 31b6480e
      Your Name authored
      - Changed --hf-chat-template from boolean to action=append
      - Added check_hf_chat_template() function for model-specific checking
      - Updated _finalize_chat_template_detection to use new function
      - Updated README with new syntax
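A minimal sketch of what switching a flag from a boolean to `action=append` looks like in `argparse`. The `metavar`, the helper name `template_enabled_for`, and its exact-match rule are assumptions for illustration; the commit does not show how `check_hf_chat_template()` actually matches models.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into a list.
    # The attribute stays None when the flag is never passed.
    parser.add_argument("--hf-chat-template", action="append", metavar="MODEL")
    return parser


def template_enabled_for(values: Optional[List[str]], model_name: str) -> bool:
    """Hypothetical check: enabled only if the model was named explicitly."""
    return bool(values) and model_name in values


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--hf-chat-template", "model-a", "--hf-chat-template", "model-b"]
    )
    print(args.hf_chat_template)
    print(template_enabled_for(args.hf_chat_template, "model-a"))
```

With a boolean flag the setting is global; collecting values into a list is what lets the check be made per model.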
    • Add --hf-chat-template option for HuggingFace apply_chat_template · 3121fb85
      Your Name authored
      - Added --hf-chat-template CLI flag to use transformers apply_chat_template
      - Added _load_huggingface_tokenizer() to load HF tokenizer for GGUF models
      - Added _format_messages_hf() method for HF chat template formatting
      - Updated generate_chat and generate_chat_stream to use HF tokenizer when available
      - Updated format_messages to check for HF tokenizer first
      - Added documentation in README.md
    • Add --reply-filters option for optional content filtering · 533f8fd5
      Your Name authored
      - Added --reply-filters CLI flag to make content filtering optional
      - Supports comma-separated values: --reply-filters malformed,tool_calls
      - Supports model-specific filters: --reply-filters text:malformed --reply-filters image:tool_calls
      - Supports specific model names: --reply-filters text:llama-3.1:malformed
      - Added check_reply_filter() and check_single_filter() helper functions
      - Updated stream_chat_response and generate_chat_response to use new filtering
      - Updated ToolCallParser._filter_malformed_content for conditional filtering
      - Added documentation in README.md
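The three `--reply-filters` forms listed above could be parsed roughly as follows. This is a sketch under stated assumptions: `FilterSpec` and `parse_reply_filter` are hypothetical names, and it assumes model names never contain a colon. The real `check_reply_filter()` and `check_single_filter()` helpers are not shown in the commit and may work differently.

```python
from typing import FrozenSet, NamedTuple, Optional


class FilterSpec(NamedTuple):
    scope: Optional[str]      # e.g. "text" or "image"; None applies to every model
    model: Optional[str]      # a specific model name; None applies to any model
    filters: FrozenSet[str]   # filter names such as "malformed" or "tool_calls"


def parse_reply_filter(spec: str) -> FilterSpec:
    """Parse 'f1,f2', 'scope:f1,f2' or 'scope:model:f1,f2' (assumed grammar)."""
    parts = spec.split(":")
    if len(parts) == 1:
        scope, model, names = None, None, parts[0]
    elif len(parts) == 2:
        scope, model, names = parts[0], None, parts[1]
    elif len(parts) == 3:
        scope, model, names = parts
    else:
        raise ValueError(f"unrecognized reply filter spec: {spec!r}")
    # Split the comma-separated filter names and drop empty entries.
    filters = frozenset(n.strip() for n in names.split(",") if n.strip())
    return FilterSpec(scope, model, filters)
```

For example, `parse_reply_filter("text:llama-3.1:malformed")` yields scope `text`, model `llama-3.1`, and the single filter `malformed`.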
    • Increase VRAM cleanup delay to 3 seconds · f8d2481e
      Your Name authored
      - Give more time for Vulkan memory to be freed after unloading image models
    • Fix: Force VRAM cleanup when switching from image to text model · a4a8c340
      Your Name authored
      - Add garbage collection and torch.cuda.empty_cache() after unloading image models
      - Add a small delay to allow VRAM to be freed before loading new model
      - This should help prevent OOM errors when switching between image and text models
    • Fix: Don't strip whitespace from model output · 13b56ea0
      Your Name authored
      - Remove stripping in strip_tool_calls_from_content function
      - Whitespace (spaces, newlines) is valid content and should be preserved
    • Fix: Use cached file path when reloading URL-based models · 7e7930f5
      Your Name authored
      - When reloading a default model that was loaded from a URL,
        check for cached file path and use it instead of the URL
    • Fix: Handle NaN values in diffusers image output · dd924c1a
      Your Name authored
      - Replace NaN and Inf values with valid values before saving
      - Clip image values to valid range [0, 1] to prevent black images
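The NaN/Inf replacement and clipping described above can be sketched with NumPy. The function name `sanitize_image` and the chosen replacement values (NaN to 0, +Inf to 1, -Inf to 0) are assumptions; the commit only says they are replaced with "valid values".

```python
import numpy as np


def sanitize_image(pixels: np.ndarray) -> np.ndarray:
    """Return a copy with non-finite values replaced and range limited to [0, 1]."""
    # Assumed mapping: NaN -> 0.0, +Inf -> 1.0, -Inf -> 0.0.
    cleaned = np.nan_to_num(pixels, nan=0.0, posinf=1.0, neginf=0.0)
    # Clip so a later conversion to 8-bit cannot wrap around or overflow.
    return np.clip(cleaned, 0.0, 1.0)


def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Convert a float image in [0, 1] to 8-bit values after sanitizing it."""
    return (sanitize_image(pixels) * 255.0).round().astype(np.uint8)
```

Sanitizing before the 8-bit conversion matters because casting NaN to an integer type is undefined in NumPy and is a plausible source of the black images the commit mentions.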
    • Fix: Reload default text model when switching from image to text · c517c947
      Your Name authored
      - When 'default' model is requested but not loaded (was unloaded for image model),
        the code now tries to reload the default model
      - Cleanup image models first to free VRAM, then reload the text model
    • Add --image-cpu-offload option and fix sequential offload logic · c8f7c8d9
      Your Name authored
      - Add --image-cpu-offload CLI flag for explicit sequential CPU offload
      - Enable sequential CPU offload only on 3rd OOM retry or when --image-cpu-offload is set
    • Fix: Filter URLs from default model listing · ac005426
      Your Name authored
      - Skip URLs when listing the default model in list_models()
      - This prevents download URLs from appearing in available models list
    • Add OOM handling and sequential offload for diffusers · 096b75d2
      Your Name authored
      - Enable sequential CPU offload if --offload-strategy or --offload-dir is specified
      - Add retry logic: on OOM, retry with attention_slicing, then with sequential_offload
      - Clear CUDA cache between retry attempts
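The retry ladder described above (plain attempt, then attention slicing, then sequential offload, with a cache clear between attempts) can be sketched as a small generic helper. `run_with_fallbacks` is a hypothetical name and this is not the project's code. With diffusers and PyTorch one would presumably pass `pipe.enable_attention_slicing` and `pipe.enable_sequential_cpu_offload` as fallbacks, `torch.cuda.empty_cache` as the cleanup, and `torch.cuda.OutOfMemoryError` as the error type.

```python
from typing import Callable, Optional, Sequence, Type, TypeVar

T = TypeVar("T")


def run_with_fallbacks(
    generate: Callable[[], T],
    fallbacks: Sequence[Callable[[], None]],
    cleanup: Callable[[], None],
    oom_error: Type[BaseException] = MemoryError,
) -> T:
    """Call generate(); on each OOM, clean up, enable the next fallback, retry."""
    last_error: Optional[BaseException] = None
    # The leading None stands for the first attempt with no fallback enabled.
    for enable in [None, *fallbacks]:
        if enable is not None:
            cleanup()   # e.g. free cached GPU memory before retrying
            enable()    # e.g. turn on a memory-saving mode
        try:
            return generate()
        except oom_error as exc:
            last_error = exc
    # Every attempt ran out of memory; re-raise the last failure.
    assert last_error is not None
    raise last_error
```

Ordering the fallbacks from cheapest to most expensive keeps the slow sequential offload path as a last resort.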
    • Add --image-precision option and VAE tiling support for diffusers · 782612ea
      Your Name authored
      - Add --image-precision with choices: bf16, f32, f16, f8
      - bf16 recommended for modern GPUs (RTX 30/40 series) to avoid NaN issues
      - Enable VAE tiling for diffusers when --vae-tiling is specified
    • Fix diffusers NaN warning by using FP32 instead of FP16 · df8b4875
      Your Name authored
      - Changed torch_dtype from float16 (when CUDA available) to float32
      - This prevents NaN/Infinity values in image output that cause black/corrupted images
      - FP16 can cause numerical overflow on some models like SDXL
    • Add steps and guidance_scale to image generation request · 55a39eeb
      Your Name authored
      - Add 'steps' parameter to ImageGenerationRequest (overrides quality-based default)
      - Add 'guidance_scale' parameter to ImageGenerationRequest (overrides CLI --image-cfg-scale)
      - Use request values in diffusers pipeline call
    • Fix diffusers time variable scoping issue · 9a749ea4
      Your Name authored
      - Import time module inside try block with alias to avoid UnboundLocalError
      - This prevents Python's exception handling from affecting variable scope
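The commit does not show the failing code, so the following is only one common way this kind of `UnboundLocalError` arises, not necessarily what happened here. A local `import time` anywhere in a function makes `time` a local name for the whole function, so an earlier use fails even though the module is imported globally. Importing under an alias leaves the global name untouched. Both function names are hypothetical.

```python
import time


def elapsed_broken() -> float:
    # Fails: the import further down makes `time` local to this function,
    # so it is not yet bound when this line runs.
    start = time.time()
    import time  # noqa: F811 (deliberate shadowing for the demonstration)
    return time.time() - start


def elapsed_fixed() -> float:
    try:
        # The alias binds a different local name, so the module-level
        # `time` stays usable everywhere else in the function.
        import time as _time
        start = _time.time()
        return _time.time() - start
    except ImportError:
        return 0.0


if __name__ == "__main__":
    try:
        elapsed_broken()
    except UnboundLocalError as exc:
        print("broken:", exc)
    print("fixed:", elapsed_fixed() >= 0.0)
```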
    • Fix model listing: remove duplicate 'image', remove vision: alias, filter URLs · 9f01de41
      Your Name authored
      - Remove duplicate 'image' entry in list_models()
      - Remove vision: alias (user doesn't want it)
      - Skip URLs in loaded models listing (they're download sources)
      - Add full traceback to diffusers error for debugging
    • Fix cache listing to include HuggingFace subdirectories · 57a7951b
      Your Name authored
      - Recursively scan huggingface cache directory (hub/, xet/, etc.)
      - Also fix remove-model to search recursively in huggingface cache
    • Add multi-cache support for cached model commands · f606b9a7
      Your Name authored
      - Add get_all_cache_dirs() to find GGUF, HuggingFace, and Diffusers caches
      - Update --list-cached-models to show all cache locations
      - Update --remove-all-models to clean all cache directories
      - Update --remove-model to search across all caches
      - Add better error handling for diffusers image extraction
    • Fix auto URL to use server host from request headers · ab253a98
      Your Name authored
      - When --file-path is set and --url is 'auto', use the Host header
        from the request (what the client used to connect) instead of
        the client's IP address
      - This ensures the returned URL points to the correct server
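A sketch of building the returned URL from the request's Host header rather than from the client's address. The function name `build_file_url`, the URL layout, the default scheme, and the fallback host are all assumptions for illustration; the commit does not show how the server assembles the URL.

```python
from typing import Mapping
from urllib.parse import quote


def build_file_url(
    headers: Mapping[str, str],
    filename: str,
    scheme: str = "http",
    fallback_host: str = "localhost",
) -> str:
    """Build a URL for a served file using the host the client connected to."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    host = lowered.get("host") or fallback_host
    # Percent-encode the file name so spaces and similar characters stay valid.
    return f"{scheme}://{host}/{quote(filename)}"
```

The Host header carries the name and port the client actually used to reach the server, which is why it yields a URL the client can follow, while the client's own IP address does not.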