fix: wait for hijacked connections to close during drain #16625
immanuwell wants to merge 2 commits into
Hi @immanuwell. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with … Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the … If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: immanuwell. Needs approval from an approver in each of these files: … Approvers can indicate their approval by writing …
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16625      +/-   ##
==========================================
- Coverage   80.36%   80.34%   -0.03%
==========================================
  Files         217      217
  Lines       13568    13592      +24
==========================================
+ Hits        10904    10920      +16
- Misses       2301     2307       +6
- Partials      363      365       +2
a95113d to 751dd09
Are any of our handlers in serving returning after hijack? I would be surprised if they were, which is why this change seems unnecessary to me.
kunalworldwide left a comment
The approach is solid: wrapping the ResponseWriter to intercept Hijack() and tracking the returned net.Conn's lifetime via trackedConn.Close() is the right pattern. Without this, the drainer's inflight count drops to zero as soon as the handler returns after hijacking, even though the websocket connection is still active.
A few observations:
- Double-close protection: closed.CompareAndSwap(false, true) correctly prevents double-decrementing the inflight counter if Close() is called multiple times. Good.
- Flush assertion: w.ResponseWriter.(http.Flusher).Flush() will panic if the wrapped writer doesn't implement Flusher. This is fine for the queue-proxy's use case (the underlying writer is always a *http.response, which implements Flusher), but a checked type assertion (the comma-ok form) would make the wrapper safer for reuse.
- Inflight double-counting: After Hijack(), the handler's defer s.inflight.Add(-1) fires AND the hijacked conn holds its own +1. So during the window between handler return and conn close, inflight is still >= 1 from the conn tracker, which is the desired behavior. The handler's original +1/-1 pair cancels out cleanly. Logic checks out.
- Test coverage: The test creates a real hijackable writer and verifies that the drain blocks until the conn is closed. Good regression test for the original bug.
LGTM.
Fixes #
Proposed Changes
- … net.Conn closes
- … Flush() and Unwrap() on the tracker wrapper
- … Hijack() path, not just a blocked handler

Repro:
- … TERM to the queue-proxy while that socket is still open.
- … Hijack(), so HijackedDrainer.Drain() may see zero inflight too early and exit. The websocket gets cut off early, which is kinda rough.

Related: follow-up to #16362.
Release Note