Stefy Lanza (nextime / spora) authored
- api_model_load: load a GGUF/text model via llama.cpp even when it's also
  bucketed under image/vision (respect the entry's primary model_type), so a
  gemma+mmproj LLM never hits the diffusers from_pretrained() path.
- model config save: a GGUF LLM with an mmproj automatically gets the
  image_to_text capability and is kept out of the diffusers
  vision_models/image_models buckets.
- VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
  quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context
  aren't over-estimated into needless CPU offload.
- Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make
  the delete idempotent (repo already gone = success).
- Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
  tok/s. Throughput is computed centrally in the task registry (live EMA + run
  average on finish). New "Recent tasks (last 10)" history section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
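
The VRAM estimate item above can be illustrated with a minimal sketch. This is not the project's `_runtime_reserve_gb`, whose body is not shown here. The helper name `kv_cache_reserve_gb`, its parameters, and the `KV_SCALE` table are hypothetical. Only the q4_0 factor (0.27) comes from the commit message; the q8_0 factor is an assumption based on its 34-bytes-per-32-values block layout.

```python
# Hypothetical sketch: scale an f16 KV-cache size estimate by cache quantization.
# KV_SCALE is illustrative; only the q4_0 value is taken from the commit message.
KV_SCALE = {
    "f16": 1.0,
    "q8_0": 0.53,  # assumption: 34 bytes per 32 values vs 64 bytes at f16
    "q4_0": 0.27,  # value quoted in the commit message
}


def kv_cache_reserve_gb(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Estimate the KV-cache reserve in GiB for a given cache quantization."""
    # K and V each hold n_kv_heads * head_dim values per layer per token,
    # at 2 bytes per value for the f16 baseline.
    f16_bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * 2
    # Unknown cache types fall back to the conservative f16 size.
    scale = KV_SCALE.get(cache_type, 1.0)
    return f16_bytes * scale / 1024**3


if __name__ == "__main__":
    for cache_type in ("f16", "q8_0", "q4_0"):
        gb = kv_cache_reserve_gb(32, 8, 128, 32768, cache_type)
        print(f"{cache_type}: {gb:.2f} GiB")
```

With these example dimensions the f16 baseline is 4 GiB, so a q4_0 cache reserves roughly 1.08 GiB instead, which is the kind of difference that can decide whether a model is offloaded to CPU.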
ade800f9
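
The throughput reporting described in the Tasks page item could look roughly like the sketch below. The function names `ema_update` and `format_throughput`, the smoothing factor, and the 1.0 it/s switch-over threshold are all assumptions for illustration; the commit message only states that generation tasks show it/s (or s/it when slow), text keeps tok/s, and a live EMA is used.

```python
# Hypothetical sketch of EMA-smoothed throughput with it/s vs s/it display.


def ema_update(prev, sample, alpha=0.3):
    """Return the exponential moving average after one new rate sample."""
    # The first sample seeds the average; alpha is an assumed smoothing factor.
    if prev is None:
        return sample
    return alpha * sample + (1 - alpha) * prev


def format_throughput(rate, kind="generation"):
    """Format a rate in units per second for display."""
    if rate <= 0:
        return "-"  # nothing measured yet
    if kind == "text":
        return f"{rate:.1f} tok/s"
    if rate >= 1.0:
        return f"{rate:.2f} it/s"
    # Below one iteration per second, the inverse is easier to read.
    return f"{1.0 / rate:.2f} s/it"


if __name__ == "__main__":
    live = None
    for sample in (0.2, 0.25, 0.3):  # iterations per second
        live = ema_update(live, sample)
    print(format_throughput(live))
    print(format_throughput(30.0, kind="text"))
```

Keeping this logic in one place, as the commit describes for the task registry, means every task type reports its rate the same way.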