    front: route to a pinned/owning engine that's busy, not a duplicate elsewhere · caa051b4
    Stefy Lanza (nextime / spora ) authored
    An engine mid-generation is GIL-blocked and fails the health poll, so it
    reads as unhealthy. pick_engine required e.healthy at every step, so a
    second request for a model pinned to that engine fell through to the
    least-loaded engine — which loaded a DUPLICATE copy (and ignored the
    model's configured n_ctx, e.g. 2048 vs 32000 → "exceeds context window").
    
    Honour the pin (and the assigned owner) when the engine is alive but
    transiently busy: route there so the request queues on its gen-lock and
    the owner handles serialization/eviction. Only fall back to another engine
    when the owner's process is actually dead. Adds Engine.is_alive() (process
    liveness) and registry.engine_owning() (health-agnostic owner lookup).
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
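The routing rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: only the names pick_engine, Engine.is_alive(), registry.engine_owning() and e.healthy come from the commit message, while the signatures, the Registry class, and the process_alive, load, pins, owners and by_name names are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Engine:
    name: str
    healthy: bool = True        # result of the last health poll
    process_alive: bool = True  # hypothetical stand-in for a real process check
    load: int = 0               # hypothetical count of in-flight requests

    def is_alive(self) -> bool:
        # Process liveness only: a busy engine can be alive yet fail the poll.
        return self.process_alive


@dataclass
class Registry:
    engines: list[Engine]
    pins: dict[str, str] = field(default_factory=dict)    # model -> engine name
    owners: dict[str, str] = field(default_factory=dict)  # model -> engine name

    def by_name(self, name: Optional[str]) -> Optional[Engine]:
        return next((e for e in self.engines if e.name == name), None)

    def engine_owning(self, model: str) -> Optional[Engine]:
        # Health-agnostic lookup: returns the owner even if it fails health polls.
        return self.by_name(self.owners.get(model))


def pick_engine(registry: Registry, model: str) -> Optional[Engine]:
    # 1. Honour the pin, then the assigned owner, while the process is alive,
    #    even if the engine currently reads as unhealthy because it is busy.
    for candidate in (registry.by_name(registry.pins.get(model)),
                      registry.engine_owning(model)):
        if candidate is not None and candidate.is_alive():
            return candidate
    # 2. Only when pin and owner are missing or dead, fall back to the
    #    least-loaded healthy engine.
    healthy = [e for e in registry.engines if e.healthy]
    return min(healthy, key=lambda e: e.load) if healthy else None
```

In this sketch a request for a model pinned to a busy engine is returned that same engine, where it would wait on the engine's gen-lock, rather than being sent to an idle engine that would load a duplicate copy.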
Name
admin
api
backends
broker
frontproxy
models
openai
pydantic
queue
tasks
__init__.py
cli.py
config.py
main.py
platform_paths.py