Stefy Lanza (nextime / spora) authored
ds4-server streams MoE experts and wants the whole GPU for its expert cache. However, coderai's modest VRAM estimate for a ds4 model allowed another model to co-reside on the GPU. That starved the expert cache, so ds4's layer-0 FFN expert encode failed ("gpu layer 0 ffn batch encode failed").

When loading a ds4 model on demand, unload all other models first so that ds4-server gets the full GPU.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
00e21ea5
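The load policy described above can be sketched roughly as follows. This is a minimal illustration, assuming a simple registry that maps loaded model names to backend types. `ModelManager`, `load_on_demand`, `unload`, and the `"ds4"` backend tag are all hypothetical names chosen for this sketch and are not coderai's actual API. "Unloading" here only drops a dictionary entry, where a real implementation would stop the backend process and free its VRAM.

```python
from dataclasses import dataclass, field


@dataclass
class ModelManager:
    """Hypothetical stand-in for coderai's model registry (not its real API)."""

    loaded: dict[str, str] = field(default_factory=dict)  # model name -> backend tag
    unloaded_log: list[str] = field(default_factory=list)  # eviction order, for inspection

    def unload(self, name: str) -> None:
        # Simulated release of the model's GPU memory: just forget the entry.
        self.loaded.pop(name, None)
        self.unloaded_log.append(name)

    def load_on_demand(self, name: str, backend: str) -> None:
        if name in self.loaded:
            return  # already resident, nothing to do
        if backend == "ds4":
            # ds4-server streams MoE experts and wants the whole GPU for its
            # expert cache, so evict every other resident model first instead
            # of trusting a small VRAM estimate to decide whether it fits.
            for other in list(self.loaded):
                self.unload(other)
        self.loaded[name] = backend


mgr = ModelManager()
mgr.load_on_demand("coder-model", "llama")
mgr.load_on_demand("embedder", "llama")
mgr.load_on_demand("ds4-model", "ds4")
print(sorted(mgr.loaded))  # ['ds4-model']
```

The sketch evicts unconditionally rather than comparing estimated VRAM against free memory, because the commit's point is that the estimate for ds4 models was too small to be trusted.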