    Add task management, quantization, and hardware telemetry · 8ad15128
    Stefy Lanza (nextime / spora ) authored
    Tasks / queue management:
    - Central in-memory task registry with cooperative cancel, pause/resume,
      and step progress across image/video/audio/text generation + LoRA training
    - Tasks admin page (live 2s poll): cancel, interrupt, pause/resume, restart,
      remove; done jobs auto-drop from the list; bounded persisted job history
    - Disable interrupted-training recovery via --no-resume-jobs + settings toggle
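A minimal sketch of how cooperative cancel and pause/resume with step progress might look for an in-memory registry. The names `Task`, `TaskRegistry`, `TaskCancelled`, and `checkpoint` are hypothetical and are not taken from this commit; the assumption is that each generation or training loop calls `checkpoint()` between steps.

```python
import threading
import uuid


class TaskCancelled(Exception):
    """Raised inside a worker once its task has been cancelled."""


class Task:
    def __init__(self, kind, total_steps):
        self.id = uuid.uuid4().hex
        self.kind = kind                  # e.g. "image", "video", "lora-train"
        self.total_steps = total_steps
        self.step = 0
        self.status = "running"
        self._cancel = threading.Event()
        self._resume = threading.Event()
        self._resume.set()                # set means "not paused"

    def checkpoint(self, step):
        """Called by the worker between steps: record progress, honour pause/cancel."""
        self.step = step
        # Block while paused, waking periodically so a cancel is still noticed.
        while not self._resume.wait(timeout=0.1):
            if self._cancel.is_set():
                break
        if self._cancel.is_set():
            self.status = "cancelled"
            raise TaskCancelled(self.id)


class TaskRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}

    def create(self, kind, total_steps):
        task = Task(kind, total_steps)
        with self._lock:
            self._tasks[task.id] = task
        return task

    def _get(self, task_id):
        with self._lock:
            return self._tasks[task_id]

    def cancel(self, task_id):
        # Only sets a flag; the worker stops at its next checkpoint.
        self._get(task_id)._cancel.set()

    def pause(self, task_id):
        task = self._get(task_id)
        task._resume.clear()
        task.status = "paused"

    def resume(self, task_id):
        task = self._get(task_id)
        task._resume.set()
        task.status = "running"
```

The cancel is cooperative in the sense that nothing is killed from outside: the worker only stops when it next reaches `checkpoint()`, so a long single step delays the cancel by that step's duration.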
    
    Quantization / acceleration:
    - TurboQuant embedding vector quantization (data-free, inner-product
      preserving): built-in NumPy backend + optional turboquant-py library,
      selectable per embedding model; /v1/embeddings `quantization` param
    - llama.cpp KV-cache quantization (cache_type_k/v) for GGUF text models,
      configurable in the Models UI
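To illustrate what "data-free, inner-product preserving" can mean for embedding quantization, here is a simplified rotate-then-scalar-quantize sketch in NumPy. It is not the TurboQuant algorithm and not this commit's backend; the function names and the int8 per-vector scaling are assumptions chosen for illustration. It is data-free because the rotation is drawn from a seed rather than fitted to the embeddings.

```python
import numpy as np


def random_orthogonal(dim, seed=0):
    """Draw a random orthogonal matrix from a seed (no training data needed)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs using diag(r) so the matrix is Haar-distributed.
    return q * np.sign(np.diag(r))


def quantize(vectors, rotation):
    """Rotate, then quantize each vector to int8 with its own scale."""
    rotated = vectors @ rotation          # orthogonal map keeps inner products
    scale = np.abs(rotated).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # avoid dividing by zero
    codes = np.round(rotated / scale).astype(np.int8)
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction in the rotated space."""
    return codes.astype(np.float32) * scale
```

Because every vector is rotated by the same orthogonal matrix, inner products between dequantized vectors approximate the original ones without rotating back; only the rounding step introduces error.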
    
    Hardware telemetry:
    - Thermal cooldown state surfaced on the Tasks page (banner + per-task badge)
    - Live CPU/GPU/RAM/VRAM usage + temperature panel via /admin/api/system-stats
    
    Docs: API documentation gaps/accuracy pass + Swagger overhaul; DISTRIBUTION.md
    implementation spec. Plus I2V LoRA training channel-mismatch fix.
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CODERAI_API_DOCUMENTATION.md 63.6 KB