codai/pydantic · 8ad15128880b86474caf590aeda3a5b9e75d00fa · nexlab / coderai

Add task management, quantization, and hardware telemetry · 8ad15128

Stefy Lanza (nextime / spora) authored Jun 11, 2026

Tasks / queue management:
- Central in-memory task registry with cooperative cancel, pause/resume,
  and step progress across image/video/audio/text generation + LoRA training
- Tasks admin page (live 2s poll): cancel, interrupt, pause/resume, restart,
  remove; done jobs auto-drop from the list; bounded persisted job history
- Disable interrupted-training recovery via --no-resume-jobs + settings toggle
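A minimal sketch of how a central in-memory registry with cooperative cancel and pause/resume could look. This is not the code from this commit: `Task`, `TaskRegistry`, `TaskCancelled`, and `checkpoint` are hypothetical names, and the real registry may track more state (restart, history, persistence).

```python
import threading
import uuid
from dataclasses import dataclass, field


class TaskCancelled(Exception):
    """Raised inside a worker once its task has been cancelled."""


@dataclass
class Task:
    kind: str                      # e.g. "image", "video", "lora-training"
    total_steps: int
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    step: int = 0
    cancelled: threading.Event = field(default_factory=threading.Event)
    running: threading.Event = field(default_factory=threading.Event)

    def __post_init__(self):
        self.running.set()         # tasks start un-paused

    def checkpoint(self, step):
        """Worker calls this between steps: record progress, honour pause/cancel."""
        self.step = step
        # While paused, wake up periodically so a cancel is still noticed.
        while not self.running.wait(timeout=0.1):
            if self.cancelled.is_set():
                raise TaskCancelled(self.id)
        if self.cancelled.is_set():
            raise TaskCancelled(self.id)


class TaskRegistry:
    """Hypothetical registry: a lock-protected dict of live tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}

    def register(self, kind, total_steps):
        task = Task(kind, total_steps)
        with self._lock:
            self._tasks[task.id] = task
        return task

    def _get(self, task_id):
        with self._lock:
            return self._tasks[task_id]

    def cancel(self, task_id):
        self._get(task_id).cancelled.set()

    def pause(self, task_id):
        self._get(task_id).running.clear()

    def resume(self, task_id):
        self._get(task_id).running.set()

    def remove(self, task_id):
        with self.__dict__["_lock"]:
            self._tasks.pop(task_id, None)
```

The cancel is cooperative in the sense that nothing is killed from outside: a generation or training loop only stops at the next `checkpoint()` call, so step granularity bounds how quickly a cancel or pause takes effect.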

Quantization / acceleration:
- TurboQuant embedding vector quantization (data-free, inner-product
  preserving): built-in NumPy backend + optional turboquant-py library,
  selectable per embedding model; /v1/embeddings `quantization` param
- llama.cpp KV-cache quantization (cache_type_k/v) for GGUF text models,
  configurable in the Models UI
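To illustrate what "data-free" vector quantization means in general, here is a simplified NumPy sketch: a fixed random rotation followed by uniform scalar quantization of each coordinate. This is explicitly not the TurboQuant algorithm and not the backend added in this commit; `make_rotation`, `quantize`, and `dequantize` are hypothetical helpers, and the 8-bit uniform grid is an assumption chosen for brevity.

```python
import numpy as np


def make_rotation(dim, seed=0):
    """Random orthogonal matrix; needs no training data, only a seed."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Fix column signs so the result does not depend on QR sign conventions.
    return q * np.sign(np.diag(r))


def quantize(vec, rotation, bits=8):
    """Return (codes, norm). Supports bits <= 8 because codes are uint8."""
    norm = float(np.linalg.norm(vec))
    if norm == 0.0:
        return np.zeros(vec.shape, dtype=np.uint8), 0.0
    rotated = rotation @ (vec / norm)        # unit vector, coords in [-1, 1]
    levels = 2 ** bits - 1
    scaled = np.clip((rotated + 1.0) / 2.0 * levels, 0, levels)
    return np.round(scaled).astype(np.uint8), norm


def dequantize(codes, norm, rotation, bits=8):
    """Approximate inverse of quantize()."""
    levels = 2 ** bits - 1
    rotated = codes.astype(np.float64) / levels * 2.0 - 1.0
    return norm * (rotation.T @ rotated)     # rotation is orthogonal


rotation = make_rotation(64, seed=1)
rng = np.random.default_rng(2)
a, b = rng.standard_normal(64), rng.standard_normal(64)
a_hat = dequantize(*quantize(a, rotation), rotation)
b_hat = dequantize(*quantize(b, rotation), rotation)
print("exact inner product:      ", float(a @ b))
print("after quantize/dequantize:", float(a_hat @ b_hat))
```

Because the rotation is orthogonal, it leaves inner products unchanged, so the only error comes from rounding each coordinate; the rotation's job is to spread a vector's energy evenly so a single fixed grid suits every coordinate without looking at the data.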

Hardware telemetry:
- Thermal cooldown state surfaced on the Tasks page (banner + per-task badge)
- Live CPU/GPU/RAM/VRAM usage + temperature panel via /admin/api/system-stats
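One common way to derive a cooldown state from temperature readings is a two-threshold (hysteresis) gate, sketched below. The commit message does not say how the cooldown state is computed, so `ThermalGate` and both thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ThermalGate:
    high_c: float = 85.0    # hypothetical: enter cooldown at or above this
    low_c: float = 70.0     # hypothetical: leave cooldown at or below this
    cooling: bool = False

    def update(self, temp_c):
        """Feed one temperature sample; return True while in cooldown."""
        if self.cooling:
            if temp_c <= self.low_c:
                self.cooling = False
        elif temp_c >= self.high_c:
            self.cooling = True
        return self.cooling


gate = ThermalGate()
states = [gate.update(t) for t in (60, 86, 80, 69, 75)]
print(states)
```

Using two thresholds instead of one keeps the state from flapping when the temperature hovers near the limit, which matters if a banner and per-task badges are driven from it.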

Docs: API documentation gaps/accuracy pass + Swagger overhaul; DISTRIBUTION.md
implementation spec. Plus I2V LoRA training channel-mismatch fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Name
audiogenrequest.py
embedrequest.py
imagerequest.py
textrequest.py
transcriptionrequest.py
videorequest.py