    Wan LoRA: cache transformer across jobs + smoother progress · d1fc17e0
    Stefy Lanza (nextime / spora ) authored
    Cache the Wan transformer expert(s) between consecutive trainings against the
    same base (keyed by base_path+quantize) so a back-to-back job skips the very slow
    reload (tens of minutes for A14B). Only this job's adapter + gradient-checkpoint
    hooks are removed at teardown; the base transformer(s) stay resident. Since 4-bit
    weights can't move to CPU, they hold GPU VRAM between jobs — so the external VRAM
    releaser now drops the Wan cache too when a generation needs the GPU, and the
    error path clears both caches.
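
A minimal sketch of the caching idea described above, assuming a single-slot cache keyed by `(base_path, quantize)`. The names `TransformerCache`, `fake_loader`, and the path `/models/wan-a14b` are hypothetical and not taken from the repository; the real loader, adapter teardown, and VRAM releaser are not shown.

```python
from typing import Any, Callable, Optional, Tuple

CacheKey = Tuple[str, Optional[str]]


class TransformerCache:
    """Keeps at most one loaded base model resident, keyed by (base_path, quantize)."""

    def __init__(self) -> None:
        self._key: Optional[CacheKey] = None
        self._model: Any = None
        self.loads = 0  # number of slow loads performed, for illustration only

    def get(
        self,
        base_path: str,
        quantize: Optional[str],
        loader: Callable[[str, Optional[str]], Any],
    ) -> Any:
        key = (base_path, quantize)
        if self._key != key:
            # Different base or quantization: drop the old model, then load.
            self.clear()
            self._model = loader(base_path, quantize)
            self._key = key
            self.loads += 1
        return self._model

    def clear(self) -> None:
        """Drop the cached reference so its memory can be reclaimed.

        A real implementation would likely also run gc.collect() and
        torch.cuda.empty_cache() here (assumption; omitted to stay dependency-free).
        """
        self._key = None
        self._model = None


def fake_loader(base_path: str, quantize: Optional[str]) -> dict:
    # Stand-in for the slow transformer load (hypothetical).
    return {"base": base_path, "quantize": quantize}


cache = TransformerCache()
first = cache.get("/models/wan-a14b", "4bit", fake_loader)
second = cache.get("/models/wan-a14b", "4bit", fake_loader)  # cache hit, no reload
```

In this sketch an external VRAM releaser or an error handler would simply call `cache.clear()`; the next job against the same base then pays the full load cost again.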
    
    Also report training progress every step (a cheap dict update) instead of every
    10 steps, so the web UI progress bar advances smoothly once steps begin.
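
The effect of the reporting interval can be illustrated with a small sketch. The function `train`, the `report_every` parameter, and the shape of the `progress` dict are assumptions made for illustration, as is the choice to always report on the final step; the actual training loop is not shown in this commit message.

```python
from typing import Dict, List


def train(total_steps: int, progress: Dict[str, int], report_every: int = 1) -> List[int]:
    """Run a dummy loop and return the steps at which progress was published."""
    reported: List[int] = []
    for step in range(1, total_steps + 1):
        # ... one optimizer step would run here ...
        if step % report_every == 0 or step == total_steps:
            # Cheap in-memory update that a web UI could poll.
            progress.update(step=step, total=total_steps)
            reported.append(step)
    return reported


progress: Dict[str, int] = {}
coarse = train(25, progress, report_every=10)  # updates only at a few steps
smooth = train(5, progress)                    # updates at every step
```

With `report_every=10` the bar would jump in large increments, while the default of 1 publishes after each step at the cost of one dict update.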
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
..
admin
api
backends
broker
models
openai
pydantic
queue
__init__.py
cli.py
config.py
main.py
platform_paths.py