fix: set connection401Error before invoking resultCallback in onError#332
fix: set connection401Error before invoking resultCallback in onError#332kinyoklion wants to merge 6 commits intomainfrom
Conversation
…art test The startWithHttp401PreventsSubsequentStart test is flaky due to a race condition. In StreamingDataSource.onError(), the production code calls resultCallback.onError() (which unblocks the test's callback.awaitError()) before setting connection401Error and calling dataSourceUpdateSink.shutDown(). The test thread can resume and call the second sds.start() before connection401Error is set, causing the second start to proceed and produce a callback. This adds a short polling loop (up to 1 second, checking every 10ms) to wait for shutDownCalled to become true before testing the second start, matching the pattern used in the startWithHttp401ShutsDownSink fix. Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
| while (!dataSourceUpdateSink.shutDownCalled && System.currentTimeMillis() < deadline) { | ||
| Thread.sleep(10); | ||
| } | ||
| assertTrue(dataSourceUpdateSink.shutDownCalled); |
There was a problem hiding this comment.
Is dataSourceUpdateSink.shutDownCalled thread-safe?
There was a problem hiding this comment.
Yes — it's declared as volatile in MockComponents.java line 141:
public volatile boolean shutDownCalled = false;The volatile guarantees visibility across threads, so the polling loop in the test thread will see the write from the background thread without any additional synchronization. This is the same field and pattern used by the sibling test startWithHttp401ShutsDownSink (from #329).
There was a problem hiding this comment.
Is this a bug we should fix instead of a test change? Should connection401Error be set before the callback is invoked?
There was a problem hiding this comment.
Good call — I've updated the PR to fix this in the production code instead. Moved connection401Error = true and dataSourceUpdateSink.shutDown() before resultCallback.onError() in StreamingDataSource.onError(), so the 401 state is fully established before the callback unblocks any waiting threads.
This also let me revert the polling workarounds from both this test and the one from #329 (startWithHttp401ShutsDownSink), since they're no longer needed.
Move connection401Error and shutDown() before resultCallback.onError() so that the 401 state is fully established before the callback unblocks any waiting threads. This eliminates the race condition at the source rather than working around it in tests. Also reverts the polling workarounds from both startWithHttp401ShutsDownSink (#329) and startWithHttp401PreventsSubsequentStart since they are no longer needed. Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
…ming-test' into devin/1773960729-fix-flaky-streaming-test
|
Check for any other side effects from this change. For example any problems or changes in event ordering from calling shutdown on the event source before emitting the error. |
|
I traced through the production call chain to check for side effects. Here's the analysis: Original order (before this PR):
New order (this PR):
Differences:
Assessment: The behavioral change is the status listener notification ordering. If any consumer relies on seeing the error notification before the SHUTDOWN notification, this could be a breaking change. However, the new order arguably makes more sense — the system transitions to SHUTDOWN state first, then reports why. Let me know if you'd like me to preserve the original listener notification ordering, or if you see any other concerns I missed. |
Requirements
Related issues
Supersedes #329, which worked around the same race condition in the
startWithHttp401ShutsDownSinktest with a polling loop.Describe the solution you've provided
In
StreamingDataSource.onError(),resultCallback.onError()was called beforeconnection401Error = trueanddataSourceUpdateSink.shutDown(). This meant that any thread unblocked by the callback could observe stale state — e.g.start()would not seeconnection401Errorand would proceed with a second connection attempt.The fix reorders the 401 handling so that
connection401ErrorandshutDown()are set beforeresultCallback.onError()is invoked. This eliminates the race at the source and allows the test polling workarounds (from both #329 and this PR's earlier revision) to be removed.Describe alternatives you've considered
The earlier revision of this PR (and #329) added polling loops in the tests to wait for
shutDownCalledto become true before asserting. This worked but masked the underlying ordering issue in production code.Additional context
connection401Erroris not declaredvolatile(unlikerunning). The current fix is safe becauseonErrorruns on a single background thread andstart()is called from the test thread only afterawaitError()returns — which provides a happens-before relationship via theBlockingQueueinsideTrackingCallback. However, a reviewer may want to confirm whethervolatileshould be added for defensive safety.Human review checklist
resultCallback.onError()fired beforeconnection401Errorwas setconnection401Errorshould be declaredvolatilefor consistency withrunningLink to Devin session: https://app.devin.ai/sessions/e45834672c2345108fb56b5fd044a400
Requested by: @kinyoklion