    fix: GGUF vision/mmproj routing + VRAM estimate; Tasks page it/s + history · ade800f9
    Stefy Lanza (nextime / spora ) authored
    - api_model_load: load a GGUF/text model via llama.cpp even when it's also
      bucketed under image/vision (respect the entry's primary model_type), so a
      gemma+mmproj LLM never hits the diffusers from_pretrained() path.
    - model config save: a GGUF LLM with an mmproj auto-gets the image_to_text
      capability and is kept out of the diffusers vision_models/image_models buckets.
    - VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
      quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context aren't
      over-estimated into needless CPU offload.
    - Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make the
      delete idempotent (repo already gone = success).
    - Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
      tok/s. Throughput computed centrally in the task registry (live EMA + run
      average on finish). New "Recent tasks (last 10)" history section.
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
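The KV-cache scaling described in the VRAM estimate bullet can be illustrated with a minimal sketch. It is not the project's `_runtime_reserve_gb`: the function name `kv_cache_gb`, its parameters, and the table `KV_QUANT_FACTOR` are hypothetical. Only the q4_0 factor of 0.27 comes from the commit message; the q8_0 factor is an assumption based on llama.cpp's block layout (34 bytes per 32 values).

```python
# Hypothetical sketch: scale an f16 KV-cache estimate by the cache quantization.
# Names and the q8_0 factor are assumptions; 0.27 for q4_0 is the figure
# quoted in the commit message.
KV_QUANT_FACTOR = {
    "f16": 1.0,
    "q8_0": 0.53,  # assumption: 34 bytes per 32-value block vs 64 bytes in f16
    "q4_0": 0.27,  # from the commit message
}


def kv_cache_gb(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Estimate the combined K and V cache size in GiB."""
    # Two tensors (K and V), 2 bytes per f16 value.
    f16_bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * 2
    # Unknown cache types fall back to the f16 size, which over-estimates
    # rather than under-estimates the reserve.
    factor = KV_QUANT_FACTOR.get(cache_type, 1.0)
    return f16_bytes * factor / 1024**3


if __name__ == "__main__":
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_gb(32, 8, 128, 32768, cache_type)
        print(f"{cache_type}: {size:.2f} GiB")
```

With these illustrative dimensions the f16 cache is 4 GiB at a 32768-token context, so ignoring the quantization would reserve several times more VRAM than a q4_0 cache needs, which is the over-estimate the commit describes.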
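The throughput reporting described in the Tasks page bullet (it/s, or s/it when slow, with a live EMA and a run average on finish) might look roughly like the sketch below. The class `ThroughputTracker`, the function `format_rate`, the smoothing factor of 0.3, and the 1 it/s threshold for "slow" are all assumptions made for illustration, not names or values taken from the task registry.

```python
# Hypothetical sketch of centralized throughput tracking for generation tasks.
# All names, the smoothing factor, and the "slow" threshold are assumptions.

def format_rate(iters_per_sec):
    """Render a rate as it/s, switching to s/it below one iteration per second."""
    if iters_per_sec <= 0:
        return "n/a"
    if iters_per_sec >= 1.0:
        return f"{iters_per_sec:.2f} it/s"
    return f"{1.0 / iters_per_sec:.2f} s/it"


class ThroughputTracker:
    def __init__(self, alpha=0.3):
        self.alpha = alpha        # weight given to the newest sample
        self.ema = None           # live exponential moving average, in it/s
        self.total_iters = 0
        self.total_seconds = 0.0

    def update(self, iters, seconds):
        """Record `iters` iterations that took `seconds` of wall time."""
        if seconds <= 0:
            return
        rate = iters / seconds
        if self.ema is None:
            self.ema = rate
        else:
            self.ema = self.alpha * rate + (1 - self.alpha) * self.ema
        self.total_iters += iters
        self.total_seconds += seconds

    def run_average(self):
        """Average rate over the whole run, for reporting on finish."""
        if self.total_seconds <= 0:
            return 0.0
        return self.total_iters / self.total_seconds


if __name__ == "__main__":
    tracker = ThroughputTracker()
    tracker.update(10, 5.0)
    tracker.update(10, 10.0)
    print("live:", format_rate(tracker.ema))
    print("final:", format_rate(tracker.run_average()))
```

The EMA reacts to recent steps while a task is running, whereas the run average gives a stable figure for a history entry once the task has finished.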
Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py