
remove orphan files #3361

Open

rambleraptor wants to merge 1 commit into apache:main from rambleraptor:remove_orphan_snapshots


@rambleraptor (Contributor)

Rationale for this change

This adds support for the RemoveOrphanFiles metadata maintenance task. The goal is to match the behavior of the Java implementation.

I had to add a list method to FileIO in order to fully implement this. I can split that work into its own PR if that's more useful.
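For readers less familiar with the task, the core idea can be sketched in plain Python. This is a minimal, hypothetical sketch and not the code in this PR: `FileInfo` and `find_orphan_files` are illustrative names, `listed` stands in for whatever the new FileIO list method returns, and `referenced` is assumed to be the set of every path reachable from table metadata (metadata files, manifest lists, manifests, and data/delete files). Files newer than a cutoff are skipped so that files from in-flight writes are not treated as orphans; the Spark `remove_orphan_files` procedure defaults that cutoff to 3 days ago. Path comparison is simplified to exact string equality here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical record standing in for one entry from a FileIO-style list call.
    path: str
    last_modified: datetime


def find_orphan_files(
    listed: Iterable[FileInfo],
    referenced: Set[str],
    older_than: datetime,
) -> List[str]:
    """Return paths that exist in storage but are not referenced by table metadata.

    Files modified at or after `older_than` are skipped so that files written
    by commits still in progress are not mistaken for orphans.
    """
    return sorted(
        f.path
        for f in listed
        if f.path not in referenced and f.last_modified < older_than
    )


# Usage example with made-up paths.
now = datetime(2026, 5, 20, tzinfo=timezone.utc)
listed = [
    FileInfo("s3://bucket/tbl/data/a.parquet", now - timedelta(days=10)),  # referenced
    FileInfo("s3://bucket/tbl/data/b.parquet", now - timedelta(days=10)),  # orphan, old enough
    FileInfo("s3://bucket/tbl/data/c.parquet", now - timedelta(hours=1)),  # orphan, too recent
]
referenced = {"s3://bucket/tbl/data/a.parquet"}
print(find_orphan_files(listed, referenced, older_than=now - timedelta(days=3)))
# ['s3://bucket/tbl/data/b.parquet']
```

A real implementation also has to build `referenced` by walking every snapshot's manifests and normalize paths (for example, differing URI schemes) before comparing them.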

A good follow-up would be to wire this into the CLI. Being able to run these ad-hoc actions without writing a script or spinning up a Spark cluster would be a huge win!

Are these changes tested?

I did some local testing where I took a table with orphaned files and ran both the Java and PyIceberg implementations against it. The results were the same.

There are also plenty of tests.

Are there any user-facing changes?

  • Adds support for the RemoveOrphanFiles maintenance action

rambleraptor force-pushed the remove_orphan_snapshots branch from c166ff0 to a4a0f6c on May 15, 2026 19:08
@qzyu999 commented May 20, 2026

Hi @rambleraptor, thanks for the PR! Quick comment: it may make sense to note that this closes #1200. I also see a related PR, #1958, but it looks potentially abandoned (CC: @jayceslesar).

NOTE: I notice that you call it "remove orphan files", while the linked issue calls it "delete orphan files". The Java code calls it DeleteOrphanFiles, whereas the Spark and Trino APIs use remove_orphan_files. I did a bit of research to understand why. "Remove" seems to have been chosen to avoid ambiguity with row-level deletes, and it is more user-friendly because "remove" reads as a housekeeping operation. The Java choice of "delete", on the other hand, matches its CRUD-style interface, since the action really does perform a delete operation on files.
