    fix(ds4+config): resolve bare model ids, don't over-estimate VRAM, robust config · 8c85e16a
    Stefy Lanza (nextime / spora ) authored
    - ds4: resolve a bare/aliased model id (e.g. "Foo-ds4-Q2_K", no path/extension) to
      its configured .gguf via a config/cache-aware resolver — fixes the 503 ("no local
      deepseek4 GGUF resolved") on chat requests (only "Load now" with a full path
      worked before). Ds4Backend reuses the same resolver.
    - ds4: report a modest VRAM footprint for ds4 models (measured or ~12GB) instead of
      the 100GB+ GGUF size — ds4-server streams experts from SSD and manages its own
      memory, so the old estimate forced needless ~128GB eviction churn every request.
    - ds4: route on-disk KV checkpoints into coderai's offload directory by default
      (--kv-disk-dir <offload>/ds4-kv) unless overridden in extra_args.
    - config: tolerant load (_dc drops unknown keys) so a stale/newer config.json can no
      longer crash the whole load and silently reset ALL settings to defaults (the "had to
      reconfigure everything" bug). save_config + GET/POST settings carry the new ds4
      fields (model_path, auto_download, ssd_streaming).
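    For illustration, the tolerant load described in the config bullet could look roughly
    like the sketch below. Only the name _dc and the three ds4 field names come from this
    commit message; the signature of _dc, the Ds4Settings class, and the field types and
    defaults are assumptions, not the actual code in config.py.

```python
from dataclasses import dataclass, fields
from typing import Any, Type, TypeVar

T = TypeVar("T")


def _dc(cls: Type[T], raw: dict[str, Any]) -> T:
    """Build dataclass `cls` from `raw`, dropping keys the dataclass does not declare.

    Assumed signature: the real helper may differ.
    """
    known = {f.name for f in fields(cls) if f.init}
    return cls(**{k: v for k, v in raw.items() if k in known})


@dataclass
class Ds4Settings:
    # Hypothetical container; field names are from the commit message,
    # types and defaults are guesses for the sake of the example.
    model_path: str = ""
    auto_download: bool = False
    ssd_streaming: bool = True


# A config.json written by a different version may carry keys this version
# does not know about. They are ignored instead of raising TypeError.
raw_from_disk = {"model_path": "/models/foo.gguf", "removed_option": 1}
cfg = _dc(Ds4Settings, raw_from_disk)
```

    With a plain Ds4Settings(**raw_from_disk) the unknown key would raise TypeError; if
    the caller treats any load failure as "use defaults", that one stale key is enough to
    discard every saved setting.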
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
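    The bare-id resolution from the first ds4 bullet can be sketched as follows. This is
    a minimal illustration under stated assumptions: resolve_gguf, its parameters, and the
    lookup order (explicit path, then configured mapping, then cache directories) are
    hypothetical and are not taken from the actual resolver.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Mapping, Optional


def resolve_gguf(
    model_id: str,
    configured: Mapping[str, str],
    cache_dirs: Iterable[Path],
) -> Optional[Path]:
    """Map a model id to a local .gguf file, or return None if nothing matches.

    Hypothetical helper; the lookup order below is an assumption.
    """
    # 1. The id is already a full path to an existing .gguf file.
    direct = Path(model_id)
    if direct.suffix == ".gguf" and direct.is_file():
        return direct

    # 2. The id (or alias) is mapped to a path in the configuration.
    mapped = configured.get(model_id)
    if mapped and Path(mapped).is_file():
        return Path(mapped)

    # 3. Fall back to "<cache_dir>/<model_id>.gguf" in each cache directory.
    for directory in cache_dirs:
        candidate = directory / f"{model_id}.gguf"
        if candidate.is_file():
            return candidate

    return None


# Usage: a bare id with no path or extension resolves through the cache lookup.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "Foo-ds4-Q2_K.gguf").touch()
    found = resolve_gguf("Foo-ds4-Q2_K", {}, [cache])
    found_name = found.name if found else None
    missing = resolve_gguf("Bar-ds4", {}, [cache])
```

    Returning None instead of raising lets the caller decide how to report the failure,
    for example as the 503 mentioned above.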
config.py 30.7 KB