codai · ade800f95f5465ebb15914937bcb16b87f770632 · nexlab / coderai

fix: GGUF vision/mmproj routing + VRAM estimate; Tasks page it/s + history · ade800f9

Stefy Lanza (nextime / spora) authored Jun 19, 2026

- api_model_load: load a GGUF/text model via llama.cpp even when it's also
  bucketed under image/vision (respect the entry's primary model_type), so a
  gemma+mmproj LLM never hits the diffusers from_pretrained() path.
- model config save: a GGUF LLM with an mmproj automatically gets the
  image_to_text capability and is kept out of the diffusers
  vision_models/image_models buckets.
- VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
  quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context aren't
  over-estimated into needless CPU offload.
- Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make the
  delete idempotent (repo already gone = success).
- Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
  tok/s. Throughput computed centrally in the task registry (live EMA + run
  average on finish). New "Recent tasks (last 10)" history section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
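
The routing and config-save rules in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the functions `pick_loader` and `normalize_on_save`, the entry keys `path`, `buckets` and `capabilities`, and the `text_models` bucket name are all hypothetical. Only `model_type`, `mmproj`, `image_to_text`, `vision_models` and `image_models` come from the commit message.

```python
def pick_loader(entry: dict) -> str:
    """Choose a backend from the entry's primary type, ignoring extra buckets."""
    path = str(entry.get("path", "")).lower()  # hypothetical key
    # A GGUF file or a primary type of "text" always goes to llama.cpp,
    # even if the entry is also listed under image/vision buckets.
    if path.endswith(".gguf") or entry.get("model_type") == "text":
        return "llama_cpp"
    return "diffusers"


def normalize_on_save(entry: dict) -> dict:
    """Return a copy of the entry with capabilities and buckets adjusted."""
    out = dict(entry)
    caps = set(out.get("capabilities", []))   # hypothetical key
    buckets = set(out.get("buckets", []))     # hypothetical key
    is_gguf_llm = (
        str(out.get("path", "")).lower().endswith(".gguf")
        and out.get("model_type") == "text"
    )
    if is_gguf_llm and out.get("mmproj"):
        # The mmproj file makes the LLM vision-capable...
        caps.add("image_to_text")
        # ...but it must not be treated as a diffusers model.
        buckets -= {"vision_models", "image_models"}
    out["capabilities"] = sorted(caps)
    out["buckets"] = sorted(buckets)
    return out
```

Under these assumptions, a gemma GGUF entry with an mmproj that was saved into `vision_models` would come back with `image_to_text` added, the diffusers buckets removed, and `pick_loader` returning `"llama_cpp"`.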

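For the VRAM estimate bullet, the idea of scaling the KV-cache reserve by the cache quantization can be sketched as below. This is an assumption-laden illustration, not the body of `_runtime_reserve_gb`: the function name `kv_cache_reserve_gb`, its parameters, and the `q8_0` factor are hypothetical, and the f16 size uses the common keys-plus-values formula. Only the q4_0 factor of about 0.27 is taken from the commit message.

```python
# Size of the KV cache relative to f16. The q4_0 value is the one quoted in
# the commit message; the q8_0 value is an assumption for illustration.
KV_SCALE = {"f16": 1.0, "q8_0": 0.53, "q4_0": 0.27}


def kv_cache_reserve_gb(n_layers: int, n_ctx: int, n_kv_heads: int,
                        head_dim: int, cache_type: str = "f16") -> float:
    """Estimate the KV-cache reserve in GiB for a given cache quantization."""
    # Keys and values (factor 2), 2 bytes per f16 element.
    f16_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * 2
    # Unknown cache types fall back to the f16 size, which over-reserves
    # rather than under-reserves.
    scale = KV_SCALE.get(cache_type, 1.0)
    return f16_bytes * scale / 1024**3
```

With hypothetical model dimensions of 32 layers, 8 KV heads, head size 128 and a 32768-token context, the f16 reserve comes to 4 GiB while the q4_0 reserve is about 1.08 GiB, which is the kind of gap that could otherwise push a model into CPU offload.
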
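
The Tasks page bullet describes a live EMA plus a run average computed on finish, displayed as it/s or s/it. A minimal sketch of that bookkeeping follows. The class `ThroughputTracker`, the function `format_rate`, the smoothing factor, and the 1 it/s threshold for switching to s/it are assumptions made for illustration and are not taken from the task registry code.

```python
class ThroughputTracker:
    """Tracks a live exponential moving average and a whole-run average."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha          # assumed smoothing factor
        self.ema = None             # live rate, units per second
        self.total_units = 0.0
        self.total_seconds = 0.0

    def update(self, units: float, seconds: float) -> None:
        """Record `units` of work (iterations or tokens) done in `seconds`."""
        if seconds <= 0:
            return
        rate = units / seconds
        if self.ema is None:
            self.ema = rate
        else:
            self.ema = self.alpha * rate + (1 - self.alpha) * self.ema
        self.total_units += units
        self.total_seconds += seconds

    def run_average(self) -> float:
        """Average rate over the whole run, reported when the task finishes."""
        if self.total_seconds <= 0:
            return 0.0
        return self.total_units / self.total_seconds


def format_rate(rate: float, unit: str = "it") -> str:
    """Format a rate as tok/s for text, or it/s (s/it when slow) otherwise."""
    if unit == "tok":
        return f"{rate:.1f} tok/s"
    if rate <= 0:
        return "n/a"
    if rate < 1.0:  # assumed cut-off for "slow"
        return f"{1 / rate:.2f} s/it"
    return f"{rate:.2f} it/s"
```

Keeping both numbers in one place means every task type reports throughput the same way, and only the display unit differs between text and generation tasks.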

Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py