codai/config.py · ef106ba16d01884ae65e503144acd77f97b0f720 · nexlab / coderai

fix(ds4+config): resolve bare model ids, don't over-estimate VRAM, robust config · 8c85e16a

Stefy Lanza (nextime / spora) authored Jun 19, 2026

- ds4: resolve a bare/aliased model id (e.g. "Foo-ds4-Q2_K", no path/extension) to
  its configured .gguf via a config/cache-aware resolver — fixes the 503 ("no local
  deepseek4 GGUF resolved") on chat requests (only "Load now" with a full path
  worked before). Ds4Backend reuses the same resolver.
- ds4: report a modest VRAM footprint for ds4 models (measured, or ~12GB as a
  fallback) instead of the 100GB+ GGUF file size. ds4-server streams experts from
  SSD and manages its own memory, so the old estimate forced needless ~128GB
  eviction churn on every request.
- ds4: route on-disk KV checkpoints into coderai's offload directory by default
  (--kv-disk-dir <offload>/ds4-kv) unless overridden in extra_args.
- config: tolerant load (_dc drops unknown keys) so a stale/newer config.json no
  longer crashes the whole load and silently resets ALL settings to defaults (the
  "had to reconfigure everything" bug). save_config and the GET/POST settings
  endpoints now carry the new ds4 fields (model_path, auto_download, ssd_streaming).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

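The default for `--kv-disk-dir` described in the commit message can be sketched as follows. The flag name and the `ds4-kv` subdirectory come from the message. The function name `with_kv_disk_dir`, its parameters, and the handling of a `--kv-disk-dir=PATH` form are assumptions for illustration only.

```python
import os


def with_kv_disk_dir(extra_args, offload_dir):
    """Return a copy of extra_args with a default --kv-disk-dir appended.

    Hypothetical helper: the default is added only when the caller has not
    already supplied the flag in extra_args.
    """
    args = list(extra_args)
    overridden = any(
        a == "--kv-disk-dir" or a.startswith("--kv-disk-dir=") for a in args
    )
    if not overridden:
        args += ["--kv-disk-dir", os.path.join(offload_dir, "ds4-kv")]
    return args


default_args = with_kv_disk_dir([], "/offload")
custom_args = with_kv_disk_dir(["--kv-disk-dir", "/fast/kv"], "/offload")
```

Returning a new list rather than mutating `extra_args` keeps the user's stored configuration unchanged between launches.
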
config.py 30.7 KB