    fix: GGUF vision/mmproj routing + VRAM estimate; Tasks page it/s + history · ade800f9
    Stefy Lanza (nextime / spora ) authored
    - api_model_load: load a GGUF/text model via llama.cpp even when it's also
      bucketed under image/vision (respect the entry's primary model_type), so a
      gemma+mmproj LLM never hits the diffusers from_pretrained() path.
    - model config save: a GGUF LLM with an mmproj auto-gets the image_to_text
      capability and is kept out of the diffusers vision_models/image_models buckets.
    - VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
      quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context aren't
      over-estimated into needless CPU offload.
    - Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make the
      delete idempotent (repo already gone = success).
    - Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
      tok/s. Throughput computed centrally in the task registry (live EMA + run
      average on finish). New "Recent tasks (last 10)" history section.
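
The two calculations described above (the KV-cache reserve scaling and the task throughput) could be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: `kv_reserve_gb`, `ThroughputTracker`, `format_rate`, the smoothing factor `alpha`, and the 1 it/s display threshold are all hypothetical. Only the 0.27 factor for q4_0 is taken from the message above.

```python
# Hypothetical sketch; names and defaults are assumptions, not the project's API.

# KV-cache size relative to an f16 cache. The q4_0 figure comes from the
# commit message; unknown cache types fall back to 1.0 (the conservative case).
KV_SCALE = {"f16": 1.0, "q4_0": 0.27}


def kv_reserve_gb(f16_reserve_gb, cache_type):
    """Scale an f16-based KV-cache reserve by the cache quantization."""
    return f16_reserve_gb * KV_SCALE.get(cache_type, 1.0)


class ThroughputTracker:
    """Tracks a live EMA of iterations/second plus a whole-run average."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # assumed smoothing factor
        self.ema = None         # live smoothed rate, it/s
        self.steps = 0          # total iterations seen so far
        self.elapsed = 0.0      # total seconds seen so far

    def update(self, steps, seconds):
        """Record a progress sample and return the updated EMA."""
        if steps <= 0 or seconds <= 0:
            return self.ema
        sample = steps / seconds
        if self.ema is None:
            self.ema = sample
        else:
            self.ema = self.alpha * sample + (1 - self.alpha) * self.ema
        self.steps += steps
        self.elapsed += seconds
        return self.ema

    def run_average(self):
        """Average rate over the whole run, reported when the task finishes."""
        return self.steps / self.elapsed if self.elapsed > 0 else None


def format_rate(rate):
    """Show it/s for fast tasks and s/it once the rate drops below 1 it/s."""
    if rate is None or rate <= 0:
        return "-"
    if rate >= 1.0:
        return f"{rate:.2f} it/s"
    return f"{1.0 / rate:.2f} s/it"
```

Keeping the EMA and the run average in one tracker mirrors the idea of computing throughput in a single place, so every page that displays a task formats the same numbers.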
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tasks.html 20.4 KB