Stefy Lanza (nextime / spora ) authored
- ds4: resolve a bare/aliased model id (e.g. "Foo-ds4-Q2_K", no path/extension) to its configured .gguf via a config/cache-aware resolver. This fixes the 503 ("no local deepseek4 GGUF resolved") on chat requests (only "Load now" with a full path worked before). Ds4Backend reuses the same resolver.
- ds4: report a modest VRAM footprint for ds4 models (measured or ~12GB) instead of the 100GB+ GGUF size. ds4-server streams experts from SSD and manages its own memory, so the old estimate forced needless ~128GB eviction churn every request.
- ds4: route on-disk KV checkpoints into coderai's offload directory by default (--kv-disk-dir <offload>/ds4-kv) unless overridden in extra_args.
- config: tolerant load (_dc drops unknown keys) so a stale/newer config.json never crashes the whole load and silently resets ALL settings to defaults (the "had to reconfigure everything" bug). save_config + GET/POST settings carry the new ds4 fields (model_path, auto_download, ssd_streaming).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
8c85e16a