1. 10 Mar, 2026 10 commits
    • Fix: Remove redundant import os statements causing UnboundLocalError · 0609c4cf
      Your Name authored
      - Removed redundant 'import os' statements inside functions (lines 4522, 4926, 5005)
      - Added back missing 'from llama_cpp import Llama' that was accidentally deleted
      - Global 'import os' at line 12 is now the only one
      
      This fixes the UnboundLocalError when running --list-cached-models or other CLI options.
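
The failure mode named in this commit can be reproduced in isolation. The following is a minimal sketch, not code from the repository: the function names are hypothetical, and it only assumes that a function used `os` before a later function-local `import os`.

```python
import os


def broken_home_dir():
    # Hypothetical example. Because 'import os' appears later in this
    # function, Python treats 'os' as a local name for the whole body,
    # so this first use fails with UnboundLocalError.
    path = os.path.expanduser("~")
    import os  # redundant local import that shadows the module-level one
    return path


def fixed_home_dir():
    # Only the module-level 'import os' is used.
    return os.path.expanduser("~")


def raises_unbound_local(func):
    """Return True if calling func() raises UnboundLocalError."""
    try:
        func()
    except UnboundLocalError:
        return True
    return False
```

Keeping a single module-level import avoids the shadowing entirely, which matches the fix described in the commit message.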
    • Add cache management CLI options · 496c4e53
      Your Name authored
      - --list-cached-models: List all cached models with sizes
      - --remove-all-models: Remove all cached models
      - --remove-model <modelid>: Remove specific model by name/hash (partial match)
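
The three options above could be wired up roughly as follows. This is a sketch under stated assumptions: the flag names come from the commit message, but the helper names, the flat one-file-per-model cache layout, and the substring matching rule are illustrative guesses, not the project's actual implementation.

```python
import argparse
from pathlib import Path


def build_parser():
    # Flag names are taken from the commit message.
    parser = argparse.ArgumentParser(description="Cache management sketch")
    parser.add_argument("--list-cached-models", action="store_true")
    parser.add_argument("--remove-all-models", action="store_true")
    parser.add_argument("--remove-model", metavar="MODELID")
    return parser


def list_cached_models(cache_dir):
    """Return sorted (name, size_in_bytes) pairs for files in cache_dir."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(
        (p.name, p.stat().st_size) for p in cache_dir.iterdir() if p.is_file()
    )


def remove_models(cache_dir, pattern=None):
    """Delete cached files; with a pattern, only names containing it."""
    removed = []
    for name, _size in list_cached_models(cache_dir):
        if pattern is None or pattern in name:
            (Path(cache_dir) / name).unlink()
            removed.append(name)
    return removed
```

Passing `pattern=None` corresponds to `--remove-all-models`, while a non-empty pattern corresponds to the partial match described for `--remove-model`.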
    • Add GGUF magic bytes validation · 015c6908
      Your Name authored
      - Check that the downloaded file is a valid GGUF file (magic bytes = 'GGUF')
      - If it is not, show a clear error that the URL is wrong (for example, it returns an HTML page instead of the model)
      - Explain that the URL must be a direct download link ending in .gguf
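
A GGUF file begins with the four ASCII bytes `GGUF`, so the check described above can be done by reading the first four bytes. The sketch below is illustrative: the function names and the wording of the error messages are assumptions, not the project's code.

```python
GGUF_MAGIC = b"GGUF"


def is_valid_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def describe_download(path, url):
    """Return 'ok' or a human-readable reason the download is unusable."""
    if is_valid_gguf(path):
        return "ok"
    # Peek at the start of the file to see whether it looks like HTML.
    with open(path, "rb") as f:
        head = f.read(256).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return (
            f"{url} returned an HTML page, not a GGUF file; "
            "use a direct download link ending in .gguf"
        )
    return f"{url} did not return a valid GGUF file"
```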
    • 611bfd8f
    • Add verbose error handling for GGUF image model loading · 141329bc
      Your Name authored
      - Enable verbose=True in llama.cpp to see actual error
      - Print GGUF model file size for debugging
      - Add try/except with traceback to see detailed errors
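
The three debugging aids listed above might be combined as in the sketch below. The loader is injected as a parameter so the example runs without llama-cpp-python installed; in real use it could be `llama_cpp.Llama`, which accepts `model_path` and `verbose` keyword arguments. The function name `load_with_diagnostics` is hypothetical.

```python
import os
import traceback


def load_with_diagnostics(model_path, loader):
    """Call loader(model_path=..., verbose=True) with extra diagnostics.

    Prints the model file size first, and prints the full traceback
    if loading fails. Returns the loaded model, or None on failure.
    """
    size_mb = os.path.getsize(model_path) / (1024 * 1024)
    print(f"GGUF model file size: {size_mb:.2f} MB")
    try:
        return loader(model_path=model_path, verbose=True)
    except Exception:
        # Show the detailed error instead of a bare failure message.
        traceback.print_exc()
        return None
```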
    • Improve GGUF image model loading - better URL handling · 9af89755
      Your Name authored
      - Check if model is URL before any processing
      - Use original model name with query params for URL download
      - Strip query params only for HuggingFace repo ID parsing
      - Added more debug output to trace issues
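
The URL handling described above, keeping the full URL for downloading while stripping query parameters only for name parsing, could look like the following sketch. It uses the standard library's `urllib.parse`; the function names and the returned tuple shape are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def is_url(model):
    """Return True if the model string is an http(s) URL."""
    return urlsplit(model).scheme in ("http", "https")


def strip_query(model):
    """Drop the ?query and #fragment parts, e.g. '?download=true'."""
    parts = urlsplit(model)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def resolve_model(model):
    """Return (download_target, name_for_parsing).

    For a URL, the download target keeps its query parameters.
    For anything else there is no direct download target, and the
    stripped name is what would be parsed as a repository ID.
    """
    if is_url(model):
        return model, strip_query(model)
    return None, strip_query(model)
```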
    • Fix GGUF image model loading - strip query parameters · 6bc9af36
      Your Name authored
      - Strip query parameters from model name before processing
      - Handle URLs with ?download=true or other query params
    • Add GGUF image model support in --loadall mode · e848dd47
      Your Name authored
      - Detect if image model is GGUF (ends with .gguf or contains 'gguf')
      - If GGUF, load using llama.cpp (same as text Vulkan models)
      - If diffusers model, load using Stable Diffusion pipeline
      - Fixed both locations where image model preloading happens
      - Now supports both GGUF and diffusers image generation models
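
The detection rule stated above can be sketched as a small dispatch function. The function name and the returned labels are hypothetical; the rule itself (name ends with `.gguf` or contains `gguf`) is taken from the commit message, with case-insensitive matching added as an assumption.

```python
def image_backend(model):
    """Pick a loader for an image model based on its name.

    Returns 'llama.cpp' for GGUF models and 'diffusers' otherwise.
    """
    name = model.lower()
    # The suffix test is implied by the substring test; both are kept
    # to mirror the rule as written in the commit message.
    if name.endswith(".gguf") or "gguf" in name:
        return "llama.cpp"
    return "diffusers"
```

Note that a substring match is deliberately loose, so any model name that merely contains "gguf" is routed to llama.cpp.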
    • Fix image model preloading with --loadall flag · 2308d5b0
      Your Name authored
      - Fixed bug where image model wasn't actually being loaded when --loadall was specified
      - The code only printed messages but never loaded the diffusers pipeline
      - Now actually loads the Stable Diffusion pipeline using diffusers library
      - Tries StableDiffusionXLPipeline first, falls back to generic DiffusionPipeline
      - Moves to GPU if CUDA available, enables attention slicing for memory efficiency
      - Also fixes second location where image model is the only configured model
      
      - Debug command line output was already implemented
    • Fix --loadall model preloading and --debug command line output · 9193536a
      Your Name authored
      - Fixed undefined variable bug where model_name wasn't defined in scope
      - Fixed duplicate model loading when using --loadall/--loadswap with multiple models
      - First model is now only loaded once (skipped in loop if already loaded)
      - Loadall mode now properly preloads all models in VRAM respecting offload strategy
      - Loadswap mode properly loads additional models to RAM
      - Ondemand mode doesn't reload first model
      
      Feature 1: --debug now shows full command line as first output
      Feature 2: --loadall with multiple models now preloads all in VRAM
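
The duplicate-loading fix described above amounts to skipping any model that is already loaded when iterating over the configured list. A minimal sketch, assuming models are tracked in a dictionary keyed by name; `preload_models` and `load_fn` are hypothetical names, not identifiers from the repository:

```python
def preload_models(model_names, loaded, load_fn):
    """Load each configured model once, skipping any already loaded.

    'loaded' maps model name to the loaded object and is updated in
    place; 'load_fn' is whatever actually loads a model by name.
    """
    for name in model_names:
        if name in loaded:
            # For example, the first model was loaded before this loop.
            continue
        loaded[name] = load_fn(name)
    return loaded
```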
  2. 09 Mar, 2026 30 commits