Add task management, quantization, and hardware telemetry

Tasks / queue management:
- Central in-memory task registry with cooperative cancel, pause/resume,
  and step progress across image/video/audio/text generation + LoRA training
- Tasks admin page (live 2s poll): cancel, interrupt, pause/resume, restart,
  remove; done jobs auto-drop from the list; bounded persisted job history
- Disable interrupted-training recovery via --no-resume-jobs + settings toggle
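
For reviewers, a minimal sketch of what a cooperative cancel/pause registry can look like. This is not the code in codai/tasks/registry.py (that diff is collapsed here); the names `Task`, `TaskRegistry`, `checkpoint`, and `snapshot` are hypothetical, and the sketch assumes workers call `checkpoint()` between steps.

```python
import threading
import uuid
from dataclasses import dataclass, field


class TaskCancelled(Exception):
    """Raised inside a worker once its task has been cancelled."""


@dataclass
class Task:
    kind: str                 # e.g. "image", "video", "lora-training"
    total_steps: int
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    step: int = 0
    state: str = "running"
    _cancel: threading.Event = field(default_factory=threading.Event, repr=False)
    _resume: threading.Event = field(default_factory=threading.Event, repr=False)

    def __post_init__(self):
        self._resume.set()    # a new task is not paused

    def checkpoint(self, step):
        """Called by the worker between steps: records progress, blocks
        while paused, and raises if a cancel was requested."""
        self.step = step
        self._resume.wait()
        if self._cancel.is_set():
            self.state = "cancelled"
            raise TaskCancelled(self.id)


class TaskRegistry:
    """Hypothetical in-memory registry keyed by task id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}

    def create(self, kind, total_steps):
        task = Task(kind, total_steps)
        with self._lock:
            self._tasks[task.id] = task
        return task

    def pause(self, task_id):
        task = self._tasks[task_id]
        task._resume.clear()
        task.state = "paused"

    def resume(self, task_id):
        task = self._tasks[task_id]
        task.state = "running"
        task._resume.set()

    def cancel(self, task_id):
        task = self._tasks[task_id]
        task._cancel.set()
        task._resume.set()    # wake a paused worker so it can see the cancel

    def snapshot(self):
        """Plain-dict view that a polling admin page could serialize."""
        with self._lock:
            return [
                {"id": t.id, "kind": t.kind, "state": t.state,
                 "step": t.step, "total_steps": t.total_steps}
                for t in self._tasks.values()
            ]
```

Cancellation here is cooperative: `cancel()` only sets a flag, and the task stops at its next `checkpoint()` call rather than being killed mid-step.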

Quantization / acceleration:
- TurboQuant embedding vector quantization (data-free, inner-product
  preserving): built-in NumPy backend + optional turboquant-py library,
  selectable per embedding model; /v1/embeddings `quantization` param
- llama.cpp KV-cache quantization (cache_type_k/v) for GGUF text models,
  configurable in the Models UI
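
To illustrate what "data-free, inner-product preserving" means in practice, here is a simplified NumPy sketch: a fixed random rotation followed by uniform scalar quantization. It is a stand-in for illustration only, not the TurboQuant algorithm, not the contents of codai/models/turboquant.py, and not the turboquant-py API; the function names and the `bits` parameter are assumptions.

```python
import numpy as np


def random_rotation(dim, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian
    matrix. It depends only on the seed, never on the data."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal


def quantize(x, rotation, bits=8):
    """Rotate x, then round each coordinate to a signed `bits`-bit code.
    Returns the int8 codes and the step size needed to decode them."""
    y = rotation @ x
    scale = float(np.max(np.abs(y))) or 1.0   # avoid dividing by zero
    levels = 2 ** (bits - 1) - 1              # 127 for 8 bits
    codes = np.round(y / scale * levels).astype(np.int8)
    return codes, scale / levels


def dequantize(codes, step, rotation):
    """Undo the scaling, then rotate back with the transpose (the inverse
    of an orthogonal matrix)."""
    return rotation.T @ (codes.astype(np.float64) * step)
```

Because the rotation is orthogonal, inner products are unchanged by it, so the only error in an inner product between two decoded vectors comes from the rounding step.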

Hardware telemetry:
- Thermal cooldown state surfaced on the Tasks page (banner + per-task badge)
- Live CPU/GPU/RAM/VRAM usage + temperature panel via /admin/api/system-stats
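
One way a cooldown state could be derived from temperature readings is a two-threshold gate with hysteresis, sketched below. The commit message does not say how cooldown is actually decided, so the thresholds, the `ThermalGate` class, and the `task_badge` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThermalGate:
    """Hypothetical cooldown gate; threshold values are illustrative only."""
    enter_c: float = 85.0   # enter cooldown at or above this temperature
    exit_c: float = 75.0    # leave cooldown at or below this temperature
    cooling: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed one temperature reading; return True while in cooldown."""
        if not self.cooling and temp_c >= self.enter_c:
            self.cooling = True
        elif self.cooling and temp_c <= self.exit_c:
            self.cooling = False
        return self.cooling


def task_badge(task_state: str, gate: ThermalGate) -> str:
    """Show running tasks as 'cooling' while the gate is active."""
    if gate.cooling and task_state == "running":
        return "cooling"
    return task_state
```

The gap between the two thresholds keeps the banner from flickering when the temperature hovers near a single cutoff.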

Docs: API documentation gap and accuracy pass, Swagger overhaul, and a
DISTRIBUTION.md implementation spec.

Also fixes an I2V LoRA training channel mismatch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
DISTRIBUTION.md
0 → 100644
codai/models/turboquant.py
0 → 100644
codai/tasks/__init__.py
0 → 100644
codai/tasks/registry.py
0 → 100644