Stefy Lanza (nextime / spora ) authored
An engine mid-generation is GIL-blocked and fails the health poll, so it reads as unhealthy. pick_engine required e.healthy at every step, so a second request for a model pinned to that engine fell through to the least-loaded engine, which loaded a DUPLICATE copy (and ignored the model's configured n_ctx, e.g. 2048 vs 32000 → "exceeds context window").

Honour the pin (and the assigned owner) when the engine is alive but transiently busy: route there so the request queues on its gen-lock and the owner handles serialization/eviction. Only fall back to another engine when the owner's process is actually dead.

Adds Engine.is_alive() (process liveness) and registry.engine_owning() (health-agnostic owner lookup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
caa051b4
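
The routing rule described in the message can be sketched roughly as below. This is an illustration only, not the code from this commit: pick_engine, Engine.is_alive() and registry.engine_owning() are the names the message mentions, while the fields (healthy, alive, load, models, pins), the by_name helper and all signatures are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Engine:
    name: str
    healthy: bool = True   # result of the last health poll (False while GIL-blocked)
    alive: bool = True     # hypothetical stand-in for a real process-liveness check
    load: int = 0          # hypothetical load metric (queued or running requests)
    models: Set[str] = field(default_factory=set)

    def is_alive(self) -> bool:
        # Process liveness, independent of whether the health poll succeeded.
        return self.alive


@dataclass
class Registry:
    engines: List[Engine]
    pins: Dict[str, str] = field(default_factory=dict)  # model -> engine name

    def by_name(self, name: Optional[str]) -> Optional[Engine]:
        # Hypothetical helper: look an engine up by name.
        for e in self.engines:
            if e.name == name:
                return e
        return None

    def engine_owning(self, model: str) -> Optional[Engine]:
        # Health-agnostic owner lookup: ignores e.healthy on purpose.
        for e in self.engines:
            if model in e.models:
                return e
        return None


def pick_engine(registry: Registry, model: str) -> Optional[Engine]:
    # 1. Honour the pin, then the current owner, as long as the process is
    #    alive. A busy engine fails the health poll but can still queue work.
    pinned = registry.by_name(registry.pins.get(model))
    owner = registry.engine_owning(model)
    for candidate in (pinned, owner):
        if candidate is not None and candidate.is_alive():
            return candidate

    # 2. Otherwise fall back to the least-loaded healthy engine.
    usable = [e for e in registry.engines if e.healthy and e.is_alive()]
    if not usable:
        return None
    return min(usable, key=lambda e: e.load)
```

With this shape, a request for a model whose engine is busy (healthy=False, alive=True) is routed back to that engine instead of triggering a second load elsewhere; only a dead owner causes a fallback.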