Stefy Lanza (nextime / spora) authored
- LoRA trainer: cache the SD/SDXL base on CPU between jobs so back-to-back
  trainings against the same base skip the disk reload, and the base holds
  no VRAM between jobs (moved to GPU only while training). Fixes the
  post-training eviction failure that forced the next image request into
  CPU/disk offload.
- Model manager: add register_external_vram_releaser() + last-resort
  eviction pass so a generation can reclaim the trainer's cached base when
  needed (skips while a job runs).
- Thermal: average 3 CPU samples spread across a 3s budget for the
  resume/cooldown decision (CPU sensors swing +/-10C); pause stays
  single-read to react fast. Bounded so it never blocks past 3s of the
  poll interval.
- Debug flags: --debug-web (uvicorn access lines), --debug-thermal
  ([thermal] [debug] checks), --debug-lora (per-step training loss to
  terminal); all off by default and independent of --debug.
- Admin: lora_train_base_model field on the Models page; saves apply live
  to the running server (build_runtime_kwargs/apply_model_entry_live) with
  no restart.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
8d1136c4
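
The thermal change in this commit (averaged read for resume/cooldown, single read for pause) could be sketched roughly as follows. This is an illustration only, not the code from the commit: the function names `averaged_cpu_temp`, `should_resume` and `should_pause`, the `read_temp` callback, the threshold parameters, and the even spacing of `budget_s / samples` between reads are all assumptions.

```python
import time
from statistics import fmean
from typing import Callable


def averaged_cpu_temp(
    read_temp: Callable[[], float],
    samples: int = 3,
    budget_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Average up to `samples` readings without sleeping past `budget_s`.

    Hypothetical helper: spacing the reads budget_s / samples apart is an
    assumption, not necessarily what the commit does.
    """
    if samples < 1:
        raise ValueError("samples must be >= 1")
    deadline = clock() + budget_s
    gap = budget_s / samples
    readings = []
    for i in range(samples):
        readings.append(read_temp())
        if i == samples - 1:
            break  # no need to wait after the last reading
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget used up: average what we have
        sleep(min(gap, remaining))
    return fmean(readings)


def should_resume(read_temp: Callable[[], float], resume_below_c: float, **kw) -> bool:
    # Resume/cooldown decision: use the averaged value to smooth sensor swings.
    return averaged_cpu_temp(read_temp, **kw) <= resume_below_c


def should_pause(read_temp: Callable[[], float], pause_above_c: float) -> bool:
    # Pause decision: a single read, so overheating is reacted to immediately.
    return read_temp() >= pause_above_c
```

The `sleep` and `clock` parameters are injectable so the timing bound can be checked without real waiting; in a poll loop the defaults would be used.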