
[SPARK-56839][SS] Remove dev loop config for new state format of stream-stream join #55837

Open
HeartSaVioR wants to merge 2 commits into apache:master from HeartSaVioR:remove-dev-loop-config-for-stream-stream-join-new-state-format
@HeartSaVioR
Contributor

What changes were proposed in this pull request?

This PR removes the dev-loop config for the new state format of stream-stream join.

Why are the changes needed?

The new state format for stream-stream join reached code complete in Apache Spark 4.2.0, but we forgot to remove the config that gated the feature so it would not be released half-finished. Since all of the code changes have landed in 4.2.0, the flag is unnecessary.

Users still need to opt in through the state format version, so that setting effectively acts as the gate even after the gate config is removed. Keeping the gate config only adds verbosity.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs. (Effectively N/A, since the config was already set to true for UTs.)

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…t version 4

### What changes were proposed in this pull request?

This PR removes the internal gating config
`spark.sql.streaming.join.stateFormatV4.enabled`
(`STREAMING_JOIN_STATE_FORMAT_V4_ENABLED`) and the corresponding
`require(...)` guard in `SymmetricHashJoinStateManager` that prevented
state format version 4 from being instantiated when the flag was off.

The doc on `STREAMING_JOIN_STATE_FORMAT_VERSION` is also updated to
remove the "under development and only available for testing" wording
for V4, and to briefly describe what V4 does.
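
To illustrate the shape of the change, here is a minimal, self-contained sketch of a `require(...)`-style gate before and after removal. It is not the code from `SymmetricHashJoinStateManager`; the object name, method names, parameters, and error message are all hypothetical stand-ins chosen for illustration.

```scala
// Hypothetical stand-in for the gating logic; NOT Spark's actual code.
object GateSketch {
  // Before: version 4 was rejected unless an internal dev flag was on.
  def validateBefore(stateFormatVersion: Int, v4Enabled: Boolean): Unit = {
    require(
      stateFormatVersion != 4 || v4Enabled,
      "State format version 4 is not enabled (hypothetical message)")
  }

  // After: no extra flag is consulted; choosing version 4 is the only opt-in.
  def validateAfter(stateFormatVersion: Int): Unit = ()

  def main(args: Array[String]): Unit = {
    // require(...) throws IllegalArgumentException when the condition is false.
    val rejected = scala.util.Try(validateBefore(4, v4Enabled = false)).isFailure
    val accepted = scala.util.Try(validateAfter(4)).isSuccess
    println(s"before, flag off: rejected=$rejected; after: accepted=$accepted")
  }
}
```

In this sketch the "after" path has nothing left to check, which mirrors the point of the PR: the version setting alone decides whether V4 is used.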

### Why are the changes needed?

The config was added so that V4 would not be exposed while it was
partially implemented. V4 is now complete, so the gating flag is no
longer needed and only adds friction.

### Does this PR introduce _any_ user-facing change?

No. The removed config was internal and (by default) only enabled in
test runs. The default state format version remains 2, so existing
queries are unaffected. Users who had explicitly set
`spark.sql.streaming.join.stateFormatV4.enabled` will need to drop that
setting; setting `spark.sql.streaming.join.stateFormatVersion=4` is now
sufficient to opt in.
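
As a rough sketch of the opt-in described above, the following sets only the state format version on a session. It assumes a Spark build that includes this change and has Spark SQL on the classpath; the application name and the local master are placeholders, and the streaming query itself is omitted.

```scala
import org.apache.spark.sql.SparkSession

object OptInToJoinStateV4 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-state-v4-opt-in") // placeholder name
      .master("local[2]")              // placeholder master for a local run
      .getOrCreate()

    // With the gate config removed, this is the only setting needed to opt in.
    // The default remains 2, so queries that do not set it are unaffected.
    spark.conf.set("spark.sql.streaming.join.stateFormatVersion", "4")

    // Read the value back to confirm what the session will use.
    println(spark.conf.get("spark.sql.streaming.join.stateFormatVersion"))

    spark.stop()
  }
}
```

Any job that still sets `spark.sql.streaming.join.stateFormatV4.enabled` can drop that line, as noted above.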

### How was this patch tested?

Ran the four V4 join test suites in `StreamingJoinV4Suite.scala`
(`StreamingInnerJoinV4Suite`, `StreamingOuterJoinV4Suite`,
`StreamingFullOuterJoinV4Suite`, `StreamingLeftSemiJoinV4Suite`).
131 tests, all passed.

NOTE: A JIRA ticket will be filed before opening the PR against
apache/spark; this commit is for internal review on the personal fork.

@HeartSaVioR changed the title from "[SPARK-56839] Remove dev loop config for new state format of stream-stream join" to "[SPARK-56839][SS] Remove dev loop config for new state format of stream-stream join" on May 12, 2026