    front: drain in-flight requests before bouncing an engine · 34d666d6
    Stefy Lanza (nextime / spora ) authored
    An engine restart (admin button / config change) previously SIGTERM'd the
    process immediately, severing any active SSE stream mid-response — the client
    saw httpcore.RemoteProtocolError "peer closed connection without sending
    complete message body".
    
    Now restart_engine marks the engine `draining` first: the router stops routing
    NEW requests to it (Engine.is_alive() reports false while draining, and the poll
    loop can't flip it back healthy), and the supervisor waits up to
    server.engine_restart_drain_grace seconds (default 30, 0 = immediate) for the
    in-flight count to reach zero before killing the process. Stragglers past the
    grace window are still bounced.
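
A minimal sketch of the drain-then-kill sequence described above, not the actual restart_engine code. The `Engine` dataclass, `drain_then_kill`, and the `kill`, `poll_s`, `clock` and `sleep` parameters are hypothetical names used for illustration; only `draining`, `is_alive()` and the grace setting's semantics (default 30, 0 = immediate) come from the commit message.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Engine:
    """Hypothetical stand-in for the supervisor's engine record."""
    alive: bool = True
    draining: bool = False
    in_flight: int = 0

    def is_alive(self) -> bool:
        # Reports False while draining, so the router skips this engine.
        return self.alive and not self.draining


def drain_then_kill(
    engine: Engine,
    kill: Callable[[], None],
    grace_s: float = 30.0,   # assumed to mirror server.engine_restart_drain_grace
    poll_s: float = 0.1,     # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Mark the engine draining, wait for in-flight requests, then kill it.

    Returns True if the in-flight count reached zero before the kill,
    False if requests were still active when the grace window expired.
    """
    engine.draining = True
    deadline = clock() + grace_s
    # With grace_s == 0 the loop body never runs: the kill is immediate.
    while engine.in_flight > 0 and clock() < deadline:
        sleep(poll_s)
    drained = engine.in_flight == 0
    kill()  # stragglers past the grace window are bounced anyway
    return drained
```

Setting `draining` before the wait is what keeps the count from growing: no new requests are routed to the engine, so the in-flight count can only fall during the grace window.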
    
    The in-flight count is tracked per engine in the front proxy: proxy()
    increments it on send and decrements it once the streamed response is fully
    drained (or the send failed).
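
The per-engine accounting could look roughly like the sketch below. It is synchronous for brevity, whereas the real proxy() is likely async. `InFlightCounter`, `proxy_stream` and the `send` callable are hypothetical names, not the project's API.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator


class InFlightCounter:
    """Thread-safe per-engine count of requests currently in flight."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = defaultdict(int)

    def incr(self, engine_id: str) -> None:
        with self._lock:
            self._counts[engine_id] += 1

    def decr(self, engine_id: str) -> None:
        with self._lock:
            self._counts[engine_id] -= 1

    def get(self, engine_id: str) -> int:
        with self._lock:
            return self._counts.get(engine_id, 0)


def proxy_stream(
    counter: InFlightCounter,
    engine_id: str,
    send: Callable[[], Iterable[bytes]],
) -> Iterator[bytes]:
    """Forward a streamed response while keeping the in-flight count accurate."""
    counter.incr(engine_id)          # increment on send
    try:
        chunks = send()
    except Exception:
        counter.decr(engine_id)      # the send failed: release immediately
        raise

    def body() -> Iterator[bytes]:
        try:
            yield from chunks
        finally:
            # Runs once the stream is fully drained, or when the consumer
            # closes the generator early after having started it.
            counter.decr(engine_id)

    return body()
```

One caveat of this shape: if the caller never starts iterating the returned body, the `finally` clause never runs and the count leaks, so a real implementation would need to cover that path as well.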
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name
..
__init__.py
app.py
assignment.py
engine_supervisor.py
gpu_detect.py
registry.py
router.py