    front: route to a pinned/owning engine that's busy, not a duplicate elsewhere · caa051b4
    Stefy Lanza (nextime / spora ) authored
    An engine mid-generation is GIL-blocked and fails the health poll, so it
    reads as unhealthy. pick_engine required e.healthy at every step, so a
    second request for a model pinned to that engine fell through to the
    least-loaded engine — which loaded a DUPLICATE copy (and ignored the
    model's configured n_ctx, e.g. 2048 vs 32000 → "exceeds context window").
    
    Honour the pin (and the assigned owner) when the engine is alive but
    transiently busy: route there so the request queues on its gen-lock and
    the owner handles serialization/eviction. Only fall back to another engine
    when the owner's process is actually dead. Adds Engine.is_alive() (process
    liveness) and registry.engine_owning() (health-agnostic owner lookup).
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
router.py 6.77 KB
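
The routing rule described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the contents of router.py: only `pick_engine`, `e.healthy`, `Engine.is_alive()` and `registry.engine_owning()` are named in the message. The `Registry` class, the `pins` mapping, the `load` and `models` fields, and the boolean `alive` flag (standing in for a real process-liveness check) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Engine:
    name: str
    healthy: bool = True   # last health-poll result; False while GIL-blocked mid-generation
    alive: bool = True     # hypothetical stand-in for an OS-level process check
    load: int = 0          # hypothetical load metric for least-loaded fallback
    models: Set[str] = field(default_factory=set)  # models currently loaded here

    def is_alive(self) -> bool:
        # Process liveness, independent of whether the health poll succeeded.
        return self.alive


class Registry:
    """Hypothetical registry: a list of engines plus model -> engine-name pins."""

    def __init__(self, engines: Iterable[Engine], pins: Optional[Dict[str, str]] = None):
        self.engines = list(engines)
        self.pins = dict(pins or {})

    def engine_owning(self, model: str) -> Optional[Engine]:
        # Health-agnostic owner lookup: a busy owner is still the owner.
        for engine in self.engines:
            if model in engine.models:
                return engine
        return None


def pick_engine(registry: Registry, model: str) -> Optional[Engine]:
    by_name = {e.name: e for e in registry.engines}
    pinned = by_name.get(registry.pins.get(model, ""))

    # 1. Honour the pin, then the assigned owner, as long as the process is
    #    alive. A busy engine is chosen on purpose so the request queues there.
    for candidate in (pinned, registry.engine_owning(model)):
        if candidate is not None and candidate.is_alive():
            return candidate

    # 2. Only when the pinned/owning process is dead (or there is none):
    #    fall back to the least-loaded engine that is alive and healthy.
    usable = [e for e in registry.engines if e.is_alive() and e.healthy]
    return min(usable, key=lambda e: e.load) if usable else None
```

With a pinned engine that is alive but failing its health poll, this sketch still returns that engine. Marking it dead makes the same call fall back to the least-loaded healthy engine, which matches the behaviour the commit message describes.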