fix: Properly shutdown quickwit-serve when subcomponents panic or otherwise error. #6196
Open
philip-wernersbach wants to merge 1 commit into quickwit-oss:main from
Author: Added results from a test in prod.
Description
Before this change, the `if let Err` block only logs the error and otherwise swallows it. The code then continues on to the `shutdown_handle.await` call. When `tokio::try_join!` returns an error (for example, when any of the three components behind the three `JoinHandle` arguments panics), the `shutdown_handle` is not guaranteed to have completed, so the program sits there waiting for a SIGTERM even though some components are no longer running.
Context
We are seeing the following `chitchat` panic in prod, in our `metastore` pods. After the panic message is printed, the `ERROR quickwit_serve: server failed: Chitchat server panicked` message is printed, and the program waits for a SIGTERM. No further log messages are printed until the SIGTERM occurs. Meanwhile, our `quickwit-indexer` pods do not work, because `chitchat` with the `metastore` is broken.
How was this PR tested?
Built a custom Docker image with this fix, tested in prod:
`chitchat` panic causes pod to shut down:
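Illustrative sketch (not the PR's diff and not output from prod): the control flow described in the Description can be modeled with the standard library alone. This is a simplified analog, assuming threads in place of tokio tasks and an `mpsc` channel in place of the SIGTERM-driven `shutdown_handle`. Every name here (`serve`, `join_all`, `spawn_component`, `Outcome`, `exit_on_error`, the `component-*` labels) is hypothetical and does not come from quickwit-serve. Unlike `tokio::try_join!`, `join_all` waits for every handle instead of returning on the first error, and the 200 ms timeout only stands in for "waits indefinitely".

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// How a (hypothetical) serve loop ended.
#[derive(Debug, PartialEq)]
enum Outcome {
    CleanShutdown,
    ExitedWithError(String),
    StillWaitingForSignal,
}

/// Hypothetical stand-in for one subcomponent; it may panic.
fn spawn_component(name: &'static str, should_panic: bool) -> JoinHandle<()> {
    thread::spawn(move || {
        if should_panic {
            panic!("{} panicked", name);
        }
    })
}

/// Joins every component and reports the first panic as an error.
fn join_all(handles: Vec<(&'static str, JoinHandle<()>)>) -> Result<(), String> {
    let mut first_error = None;
    for (name, handle) in handles {
        if handle.join().is_err() && first_error.is_none() {
            first_error = Some(format!("{} panicked", name));
        }
    }
    match first_error {
        Some(error) => Err(error),
        None => Ok(()),
    }
}

/// `exit_on_error = false` models the behavior before the fix (log, then keep
/// waiting for a shutdown signal); `true` models returning the error instead.
fn serve(one_panics: bool, exit_on_error: bool, shutdown_rx: &Receiver<()>) -> Outcome {
    let handles = vec![
        ("component-a", spawn_component("component-a", false)),
        ("component-b", spawn_component("component-b", one_panics)),
        ("component-c", spawn_component("component-c", false)),
    ];
    if let Err(error) = join_all(handles) {
        eprintln!("server failed: {}", error);
        if exit_on_error {
            return Outcome::ExitedWithError(error);
        }
        // Old behavior: the error is only logged and control falls through.
    }
    // The timeout stands in for waiting indefinitely on a shutdown signal.
    match shutdown_rx.recv_timeout(Duration::from_millis(200)) {
        Ok(()) => Outcome::CleanShutdown,
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
            Outcome::StillWaitingForSignal
        }
    }
}

fn main() {
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();
    let before_fix = serve(true, false, &shutdown_rx);
    let after_fix = serve(true, true, &shutdown_rx);
    shutdown_tx.send(()).unwrap(); // simulate the shutdown signal arriving
    let healthy = serve(false, true, &shutdown_rx);
    println!("{:?} / {:?} / {:?}", before_fix, after_fix, healthy);
}
```

In this model, the pre-fix path is still waiting for a signal after a component has died, while the post-fix path returns the error so the process can exit and be restarted by its supervisor.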