quant: reject checkpoints whose weights weren't actually quantized

GPTQModel silently leaves layers it can't map (e.g. gemma-4's fused batched MoE
experts) in bf16, producing a near-full-size "checkpoint" that the loader would
redirect to and then offload. The worker now scans the saved safetensors and, if
<50% of large weight bytes are int-packed, deletes the output and marks the job
failed (so it falls back to bitsandbytes) instead of reporting "done".
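
A minimal sketch of the kind of scan described above, not the worker's actual code. It reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of dtype, shape and data_offsets per tensor), so no tensor data is loaded. The names packed_fraction and looks_quantized, the 1 MiB cutoff for a "large" weight, and the set of dtypes counted as int-packed are all assumptions made for illustration; only the 50% threshold comes from this change.

```python
import json
import struct
from pathlib import Path

# Assumption: any integer dtype counts as "int-packed" (GPTQ qweight is int32).
INT_DTYPES = {"I8", "U8", "I16", "U16", "I32", "U32", "I64", "U64"}
# Assumption: hypothetical cutoff for what counts as a "large" weight.
MIN_BYTES = 1 << 20


def read_header(path):
    """Return the tensor table from a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def packed_fraction(checkpoint_dir, min_bytes=MIN_BYTES):
    """Fraction of large-tensor bytes stored in an integer dtype."""
    packed = total = 0
    for shard in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        for info in read_header(shard).values():
            start, end = info["data_offsets"]
            size = end - start
            if size < min_bytes:
                continue  # skip small tensors such as norms and biases
            total += size
            if info["dtype"] in INT_DTYPES:
                packed += size
    return packed / total if total else 0.0


def looks_quantized(checkpoint_dir, threshold=0.5, min_bytes=MIN_BYTES):
    """True when at least `threshold` of large-weight bytes are int-packed."""
    return packed_fraction(checkpoint_dir, min_bytes) >= threshold
```

A caller would delete the output directory and mark the job failed when looks_quantized(...) returns False. An empty directory yields a fraction of 0.0 and is therefore rejected as well.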

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>