Skip to content

fix(buffer): Reject envelope when buffer is down#5746

Open
jjbayer wants to merge 11 commits intomasterfrom
fix/queue-failed
Open

fix(buffer): Reject envelope when buffer is down#5746
jjbayer wants to merge 11 commits intomasterfrom
fix/queue-failed

Conversation

@jjbayer
Copy link
Member

@jjbayer jjbayer commented Mar 19, 2026

The new test in test_shutdown.py reproduces the following failure mode:

  1. Relay receives the SIGTERM signal. In-flight requests exist in the request handler.
  2. EnvelopeBufferService receives the shutdown signal.
  3. EnvelopeBufferService writes all pending envelopes to disk and breaks out of its service loop (see EnvelopeBufferService::handle_shutdown).
  4. The request handler calls ObservableEnvelopeBuffer::try_push with the in-flight envelopes.
  5. self.addr.send fails because the receiving end of the channel is gone.
  6. The envelope is dropped. We have seen stack traces for this.

There should only be one such in-flight request per connection, but we have quite a high number of connections open on each Relay pod.

With this PR, the server responds with an explicit 503 during shutdown, so the request can be retried on a different instance.

In a follow-up PR, I want to change the flushing behavior to optionally not save the internal state to disk, because it makes no sense for ephemeral disks.

Fixes: INGEST-510

@linear-code
Copy link

linear-code bot commented Mar 19, 2026

jjbayer and others added 5 commits March 20, 2026 07:50
Verifies that relay completes in-flight HTTP requests before shutting
down on SIGTERM, and rejects new connections once the listener closes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +146 to +149
relay_log::warn!(
error = &self as &dyn std::error::Error,
"not handling request: service unavailable"
);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downgraded from error to warning here because events are not necessarily lost, as long as the client retries 503s. We will still see the warning in Sentry.

@jjbayer jjbayer marked this pull request as ready for review March 20, 2026 08:25
@jjbayer jjbayer requested a review from a team as a code owner March 20, 2026 08:25
@jjbayer jjbayer enabled auto-merge March 23, 2026 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants