API, Spark: Add branch support to RemoveDanglingDeleteFiles#15957
Open
kiyeonjeon21 wants to merge 1 commit into apache:main
RemoveDanglingDeleteFiles always operated on the main branch and did not accept a branch parameter. This meant that when RewriteDataFilesSparkAction invoked it with the remove-dangling-deletes option, the branch context was lost. This change adds a toBranch(String) method to the RemoveDanglingDeleteFiles API and implements it in RemoveDanglingDeletesSparkAction. Metadata table reads are now scoped to the target branch's snapshot, and the resulting RewriteFiles commit is directed to that branch. RewriteDataFilesSparkAction now forwards its branch to the dangling delete removal step. Closes apache#15369
f05dc2c to fda4872
Summary
RemoveDanglingDeleteFiles always operated on the main branch. There was no way to target a specific branch, and RewriteDataFilesSparkAction did not forward its branch when invoking the action internally.

This PR:
- Adds a toBranch(String) default method to the RemoveDanglingDeleteFiles API
- Implements it in RemoveDanglingDeletesSparkAction
- Forwards the branch from RewriteDataFilesSparkAction to the dangling delete removal step

Closes #15369
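The reason for adding toBranch(String) as a default method can be shown in isolation. The sketch below is not Iceberg code: DanglingDeleteAction, LegacyAction, and BranchAwareAction are hypothetical stand-ins, and the exception message is an assumption. It only illustrates that an implementation written before the method existed keeps compiling and fails loudly if a caller asks for a branch, while a branch-aware implementation overrides the default.

```java
public class DefaultMethodSketch {

  // Hypothetical stand-in for an action interface; not the real Iceberg API.
  interface DanglingDeleteAction {
    // Returns the name of the branch the action would operate on.
    String execute();

    // Added later as a default method, so existing implementations
    // still compile without any change.
    default DanglingDeleteAction toBranch(String branch) {
      throw new UnsupportedOperationException(
          getClass().getName() + " does not support toBranch");
    }
  }

  // An implementation written before toBranch existed; it is left untouched.
  static class LegacyAction implements DanglingDeleteAction {
    @Override
    public String execute() {
      return "main";
    }
  }

  // An implementation that opts in by overriding the default.
  static class BranchAwareAction implements DanglingDeleteAction {
    private String branch = "main";

    @Override
    public DanglingDeleteAction toBranch(String branch) {
      this.branch = branch;
      return this;
    }

    @Override
    public String execute() {
      return branch;
    }
  }

  public static void main(String[] args) {
    // The branch-aware action targets the requested branch.
    System.out.println(new BranchAwareAction().toBranch("audit").execute());

    // The legacy action rejects the call instead of silently using main.
    try {
      new LegacyAction().toBranch("audit");
    } catch (UnsupportedOperationException e) {
      System.out.println("unsupported: " + e.getMessage());
    }
  }
}
```

Throwing from the default, rather than returning this, means a caller never gets a result computed against the wrong branch from an implementation that ignores the request.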
Changes
- Adds toBranch(String) with a default UnsupportedOperationException to avoid breaking changes (revapi passes)
- Scopes metadata table reads to the branch's snapshot via the snapshot-id option. Commits are directed to the branch via RewriteFiles.toBranch(branch)
- On Spark 4.1, uses SparkTable.create(metadataTable, TimeTravel) instead of the snapshot-id option, since time travel options were reworked in that version
- Forwards the branch field to RemoveDanglingDeletesSparkAction
- Adds testBranchSupport and testBranchWithDanglingDeletes for v3.5, v4.0, v4.1

Notes
The findDanglingDeletes SQL relies on data_file.partition, which does not exist for unpartitioned tables. Addressing unpartitioned tables would require a different query strategy and is better handled separately.

Test plan
- ./gradlew :iceberg-api:revapi passes
- TestRemoveDanglingDeleteAction passes (18 tests, 0 failures on Spark 4.1)
- ./gradlew spotlessCheck passes