fix: GGUF vision/mmproj routing + VRAM estimate; Tasks page it/s + history
- api_model_load: load a GGUF/text model via llama.cpp even when it's also
bucketed under image/vision (respect the entry's primary model_type), so a
gemma+mmproj LLM never hits the diffusers from_pretrained() path.
- model config save: a GGUF LLM with an mmproj automatically gets the
image_to_text capability and is kept out of the diffusers
vision_models/image_models buckets.
- VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context aren't
over-estimated into needless CPU offload.
- Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make the
delete idempotent (repo already gone = success).
- Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
tok/s. Throughput computed centrally in the task registry (live EMA + run
average on finish). New "Recent tasks (last 10)" history section.
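
For reference, a minimal sketch of the throughput logic described in the Tasks
page item above: a live EMA updated per progress report, a run average on
finish, and it/s vs s/it formatting. The names ThroughputTracker and
format_rate and the smoothing factor alpha=0.3 are illustrative assumptions,
not the identifiers or values used in the task registry.

```python
from typing import Optional


class ThroughputTracker:
    """Hypothetical helper: live EMA of the step rate plus a run average."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha              # assumed smoothing factor
        self.ema: Optional[float] = None
        self.total_steps = 0
        self.total_seconds = 0.0

    def update(self, steps: int, seconds: float) -> None:
        """Fold one progress report (steps done in `seconds`) into the EMA."""
        if steps <= 0 or seconds <= 0:
            return                      # ignore empty or invalid reports
        rate = steps / seconds
        if self.ema is None:
            self.ema = rate             # first sample seeds the EMA
        else:
            self.ema = self.alpha * rate + (1 - self.alpha) * self.ema
        self.total_steps += steps
        self.total_seconds += seconds

    def run_average(self) -> Optional[float]:
        """Average rate over the whole run, reported when the task finishes."""
        if self.total_seconds <= 0:
            return None
        return self.total_steps / self.total_seconds


def format_rate(rate: Optional[float]) -> str:
    """Render it/s, switching to s/it when slower than one iteration/second."""
    if rate is None or rate <= 0:
        return "n/a"
    if rate >= 1.0:
        return f"{rate:.2f} it/s"
    return f"{1.0 / rate:.2f} s/it"
```

Text generation would keep a plain tok/s display and skip the inversion.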
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>