codai/admin/routes.py · ade800f95f5465ebb15914937bcb16b87f770632 · nexlab / coderai

fix: GGUF vision/mmproj routing + VRAM estimate; Tasks page it/s + history · ade800f9

Stefy Lanza (nextime / spora) authored Jun 19, 2026

- api_model_load: load a GGUF/text model via llama.cpp even when it's also
  bucketed under image/vision (respect the entry's primary model_type), so a
  gemma+mmproj LLM never hits the diffusers from_pretrained() path.
- model config save: a GGUF LLM with an mmproj auto-gets the image_to_text
  capability and is kept out of the diffusers vision_models/image_models buckets.
- VRAM estimate: _runtime_reserve_gb scales the KV-cache reserve by the cache
  quantization (q4_0 ≈ 0.27× f16) so quantized-KV models at large context aren't
  over-estimated into needless CPU offload.
- Free disk (HF): quiet huggingface_hub's noisy not-found traceback and make the
  delete idempotent (repo already gone = success).
- Tasks page: generation tasks now report it/s (or s/it when slow); text keeps
  tok/s. Throughput computed centrally in the task registry (live EMA + run
  average on finish). New "Recent tasks (last 10)" history section.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
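
The KV-cache scaling described in the VRAM estimate item can be sketched as follows. This is a minimal illustration, not the code in routes.py: `BYTES_PER_ELEM`, `kv_cache_scale` and `kv_cache_gb` are hypothetical names, and the byte counts assume the llama.cpp block layouts for q8_0 and q4_0 (32 values plus a 2-byte fp16 scale per block). Block sizes alone give about 0.28 for q4_0, close to the ≈ 0.27× quoted above.

```python
# Sketch only: these names are illustrative and do not come from routes.py.

# Bytes per cached element for a few cache types.
# q8_0 and q4_0 pack 32 values per block together with a 2-byte fp16 scale.
BYTES_PER_ELEM = {
    "f32": 4.0,
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + fp16 scale = 34 bytes per block
    "q4_0": 18 / 32,  # 32 packed 4-bit values + fp16 scale = 18 bytes per block
}


def kv_cache_scale(cache_type: str) -> float:
    """Return the cache size relative to f16; unknown types fall back to 1.0."""
    f16 = BYTES_PER_ELEM["f16"]
    return BYTES_PER_ELEM.get(cache_type, f16) / f16


def kv_cache_gb(n_layers: int, n_ctx: int, n_kv_heads: int, head_dim: int,
                cache_type: str = "f16") -> float:
    """Estimate the combined K and V cache size in GiB."""
    elems = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # 2 = keys + values
    f16_bytes = elems * BYTES_PER_ELEM["f16"]
    return f16_bytes * kv_cache_scale(cache_type) / 1024**3
```

For a hypothetical 32-layer model with 8 KV heads of dimension 128 at a 131072-token context, this gives 16 GiB at f16 and 4.5 GiB at q4_0, which is the kind of gap that decides whether layers get offloaded to CPU.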

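The throughput reporting described in the Tasks page item (live EMA, run average on finish, it/s or s/it) might look roughly like this. It is a sketch under assumptions: the `Throughput` class, `format_rate`, the smoothing factor, and the 1 it/s threshold for switching to s/it are all hypothetical and not taken from the task registry.

```python
# Sketch only: class, function, and parameter names are illustrative.
from typing import Optional


class Throughput:
    """Track a live exponential moving average and a whole-run average."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha                  # weight given to the newest sample
        self.ema: Optional[float] = None    # units/second, None before first sample
        self.units = 0.0                    # total units processed in this run
        self.seconds = 0.0                  # total elapsed time in this run

    def update(self, units: float, seconds: float) -> None:
        """Record `units` of work (steps or tokens) done in `seconds`."""
        if seconds <= 0:
            return  # ignore degenerate samples rather than divide by zero
        rate = units / seconds
        if self.ema is None:
            self.ema = rate
        else:
            self.ema = self.alpha * rate + (1 - self.alpha) * self.ema
        self.units += units
        self.seconds += seconds

    def run_average(self) -> Optional[float]:
        """Average rate over the whole run, reported when the task finishes."""
        return self.units / self.seconds if self.seconds > 0 else None


def format_rate(rate: Optional[float], kind: str = "generation") -> str:
    """Format a rate as tok/s for text, else it/s, or s/it below 1 it/s."""
    if rate is None or rate <= 0:
        return "-"
    if kind == "text":
        return f"{rate:.1f} tok/s"
    if rate >= 1.0:
        return f"{rate:.2f} it/s"
    return f"{1.0 / rate:.2f} s/it"
```

Keeping the calculation in one place means every task type feeds the same `update()` call and the page only has to choose a unit label.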

routes.py 173 KB
