[SPARK-56839][SS] Remove dev loop config for new state format of stream-stream join #55837
Open
HeartSaVioR wants to merge 2 commits into
…t version 4

### What changes were proposed in this pull request?

This PR removes the internal gating config `spark.sql.streaming.join.stateFormatV4.enabled` (`STREAMING_JOIN_STATE_FORMAT_V4_ENABLED`) and the corresponding `require(...)` guard in `SymmetricHashJoinStateManager` that prevented state format version 4 from being instantiated when the flag was off. The doc on `STREAMING_JOIN_STATE_FORMAT_VERSION` is also updated to remove the "under development and only available for testing" wording for V4, and to briefly describe what V4 does.

### Why are the changes needed?

The config was added so that V4 would not be exposed while it was partially implemented. V4 is now complete, so the gating flag is no longer needed and only adds friction.

### Does this PR introduce _any_ user-facing change?

No. The removed config was internal and (by default) only enabled in test runs. The default state format version remains 2, so existing queries are unaffected. Users who had explicitly set `spark.sql.streaming.join.stateFormatV4.enabled` will need to drop that setting; setting `spark.sql.streaming.join.stateFormatVersion=4` is now sufficient to opt in.

### How was this patch tested?

Ran the four V4 join test suites in `StreamingJoinV4Suite.scala` (`StreamingInnerJoinV4Suite`, `StreamingOuterJoinV4Suite`, `StreamingFullOuterJoinV4Suite`, `StreamingLeftSemiJoinV4Suite`). 131 tests, all passed.

NOTE: A JIRA ticket will be filed before opening the PR against apache/spark; this commit is for internal review on the personal fork.
eason-yuchen-liu
approved these changes
May 12, 2026
What changes were proposed in this pull request?
This PR removes the dev-loop config for the new state format of stream-stream join.
Why are the changes needed?
We completed the code for the new state format of stream-stream join in Apache Spark 4.2.0, but forgot to remove the config that gated the feature so that it would not be released half-finished. Since all code changes have landed in 4.2.0, the flag is unnecessary.
Users still need to opt in to the new state format version, so that setting effectively acts as the gate even after the gate config is removed. Keeping the gate config only adds verbosity.
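For illustration, the opt-in described above might look like the following minimal sketch. It assumes a Spark build that already ships state format version 4 (4.2.0 per this description) and that the version is set before the streaming query is first started. The object name, app name, and local master are hypothetical and only there to keep the snippet self-contained; the config key is the one named in this PR.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, used only to make the sketch self-contained.
object JoinStateFormatV4OptIn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stream-stream-join-v4-opt-in") // hypothetical app name
      .master("local[2]")                      // assumption: local run for illustration
      // Opt in to the new state format. With this PR, no separate
      // spark.sql.streaming.join.stateFormatV4.enabled flag is needed.
      .config("spark.sql.streaming.join.stateFormatVersion", "4")
      .getOrCreate()

    // Read the setting back to confirm what the session will use.
    println(spark.conf.get("spark.sql.streaming.join.stateFormatVersion"))

    spark.stop()
  }
}
```

Queries that do not set the version keep the default (2), so the opt-in stays explicit for each new query.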
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UTs. (Actually N/A, since the config has been `true` for UTs.)
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7