codai · 00e21ea567775b70b5663910ee0a86a907293ec9 · nexlab / coderai

fix(ds4): give ds4 models exclusive VRAM (evict others) to stop expert-cache starvation · 00e21ea5

Stefy Lanza (nextime / spora ) authored Jun 19, 2026

ds4-server streams MoE experts and wants the whole GPU for its expert cache, but
coderai's modest VRAM estimate for a ds4 model let another model co-reside with it.
That starved the cache, so ds4's layer-0 FFN expert encode failed ("gpu layer 0 ffn
batch encode failed"). When loading a ds4 model on demand, unload all other models
first so ds4-server gets the full GPU.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
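
The eviction rule described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: `ModelManager`, `load_on_demand`, `unload`, and the `"ds4"` backend label are hypothetical names chosen for illustration, and the real coderai implementation may be structured differently. The sketch only shows the ordering the message describes: when the requested model is a ds4 model, every other loaded model is unloaded before the load proceeds.

```python
from dataclasses import dataclass, field


@dataclass
class ModelManager:
    """Hypothetical stand-in for a model registry (not coderai's real class)."""

    # Maps model name -> backend label for every currently loaded model.
    loaded: dict[str, str] = field(default_factory=dict)

    def unload(self, name: str) -> None:
        # Forget the model; a real manager would also stop its server
        # process and release its VRAM here.
        self.loaded.pop(name, None)

    def load_on_demand(self, name: str, backend: str) -> list[str]:
        """Load `name` and return the names evicted to make room for it."""
        evicted: list[str] = []
        if backend == "ds4":
            # Assumed rule from the commit message: a ds4 model gets the GPU
            # exclusively, so evict everything else before loading it.
            for other in list(self.loaded):
                if other != name:
                    self.unload(other)
                    evicted.append(other)
        self.loaded[name] = backend
        return evicted


if __name__ == "__main__":
    manager = ModelManager()
    manager.load_on_demand("small-model", "gguf")
    print(manager.load_on_demand("ds4-model", "ds4"))
    print(manager.loaded)
```

The sketch deliberately ignores the VRAM estimate for ds4 models, since the commit message suggests that estimate was what allowed co-residency in the first place. It also does not cover what should happen when a non-ds4 model is requested while a ds4 model is resident; the commit message does not say.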


Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py