Stefy Lanza (nextime / spora) authored
An engine mid-generation is GIL-blocked and fails the health poll, so it reads as unhealthy. pick_engine required e.healthy at every step, so a second request for a model pinned to that engine fell through to the least-loaded engine, which loaded a DUPLICATE copy (and ignored the model's configured n_ctx, e.g. 2048 vs 32000 → "exceeds context window").

Honour the pin (and the assigned owner) when the engine is alive but transiently busy: route there so the request queues on its gen-lock and the owner handles serialization/eviction. Only fall back to another engine when the owner's process is actually dead.

Adds Engine.is_alive() (process liveness) and registry.engine_owning() (health-agnostic owner lookup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
caa051b4
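
The routing rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in router.py or registry.py: the names `pick_engine`, `Engine.is_alive()`, `engine_owning()` and `healthy` come from the commit message, but their signatures, the `Registry` class, the `alive` and `load` fields, and the single `owners` mapping (which collapses "pin" and "assigned owner" into one lookup) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Engine:
    name: str
    healthy: bool   # last health-poll result; False while the engine is busy generating
    alive: bool     # hypothetical stand-in for a real process-liveness check
    load: int = 0   # hypothetical count of in-flight requests

    def is_alive(self) -> bool:
        # Process liveness only; says nothing about whether the health poll passed.
        return self.alive


class Registry:
    def __init__(self, engines: Iterable[Engine], owners: Dict[str, str]):
        self.engines = list(engines)
        self.owners = dict(owners)  # assumed shape: model name -> owning engine name

    def engine_owning(self, model: str) -> Optional[Engine]:
        # Health-agnostic lookup: return the owner even if it currently reads unhealthy.
        owner = self.owners.get(model)
        return next((e for e in self.engines if e.name == owner), None)


def pick_engine(registry: Registry, model: str) -> Optional[Engine]:
    owner = registry.engine_owning(model)
    # Alive but busy: route to the owner so the request queues behind the
    # in-flight generation instead of loading a duplicate copy elsewhere.
    if owner is not None and owner.is_alive():
        return owner
    # Owner missing or its process is dead: fall back to the least-loaded healthy engine.
    candidates = [e for e in registry.engines if e.healthy]
    return min(candidates, key=lambda e: e.load, default=None)
```

With this rule, an owner that fails the health poll only because it is mid-generation still receives the second request, while an owner whose process has exited is skipped in favour of the least-loaded healthy engine.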
| Name | Last commit | Last update |
|---|---|---|
| .. | | |
| __init__.py | | |
| app.py | | |
| assignment.py | | |
| engine_supervisor.py | | |
| gpu_detect.py | | |
| registry.py | | |
| router.py | | |