Skip to content

fix(workflow-core): paginate S3 deleteDirectory deletions#5569

Open
kunwp1 wants to merge 3 commits into
apache:mainfrom
kunwp1:fix/s3-delete-directory-pagination
Open

fix(workflow-core): paginate S3 deleteDirectory deletions#5569
kunwp1 wants to merge 3 commits into
apache:mainfrom
kunwp1:fix/s3-delete-directory-pagination

Conversation

@kunwp1

@kunwp1 kunwp1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

S3StorageClient.deleteDirectory listed objects with a single listObjectsV2 call and issued one deleteObjects batch. Both S3 APIs cap at 1000 keys per call, so for any prefix holding more than 1000 objects only the first 1000 were deleted and the rest causes a storage leak. This affects dataset deletion (DatasetResource) and per-execution cleanup (LargeBinaryManager), either of which can exceed 1000 objects under one prefix.

This PR:

  • Lists via listObjectsV2Paginator, which follows the continuation token across all pages, and deletes in batches of at most 1000 keys. Keys are streamed so memory stays bounded to a single batch.
  • Inspects each DeleteObjects response and throws if any key failed.

Any related issues, documentation, discussions?

Closes #5281

How was this PR tested?

  1. Create more than 1000 files for i in {1..1100}; do printf 'x' > "file_$i.txt"; done
  2. Upload them in a dataset. (There is a frontend memory issue when you upload all 1100 files at the same time. Try to upload batch-by-batch)
  3. Delete the dataset.
  4. Check if all the files are removed in the minio console. (Before this fix, some files remain)

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

deleteDirectory issued a single listObjectsV2 call (capped at 1000 keys
per page) and one deleteObjects batch, so any objects beyond the first 1000
under a prefix were silently orphaned. AWS DeleteObjects also accepts at
most 1000 keys per request, and reports per-key failures in its response
rather than throwing.

List via listObjectsV2Paginator, which transparently follows the
continuation token across all pages, deleting keys in batches of at most
1000. Inspect each DeleteObjects response and raise if any key failed, so
partial deletions are no longer swallowed. Also remove the unused MD5
computation (digested over the request's toString and never used).

Closes apache#5281
@kunwp1 kunwp1 force-pushed the fix/s3-delete-directory-pagination branch from 1b4b0c3 to e3deaba Compare June 8, 2026 22:07
@codecov-commenter

codecov-commenter commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.18%. Comparing base (564ccdb) to head (5c644e1).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #5569   +/-   ##
=========================================
  Coverage     52.17%   52.18%           
+ Complexity     2482     2481    -1     
=========================================
  Files          1068     1068           
  Lines         41311    41314    +3     
  Branches       4439     4440    +1     
=========================================
+ Hits          21556    21560    +4     
  Misses        18490    18490           
+ Partials       1265     1264    -1     
Flag Coverage Δ *Carryforward flag
access-control-service 64.61% <ø> (ø)
agent-service 33.76% <ø> (ø) Carriedforward from 7044ff6
amber 53.30% <100.00%> (+0.01%) ⬆️
computing-unit-managing-service 1.65% <ø> (ø)
config-service 56.06% <ø> (ø)
file-service 38.32% <ø> (ø)
frontend 46.42% <ø> (ø) Carriedforward from 7044ff6
pyamber 90.72% <ø> (ø) Carriedforward from 7044ff6
python 90.75% <ø> (ø) Carriedforward from 7044ff6
workflow-compiling-service 58.69% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

kunwp1 added 2 commits June 8, 2026 15:22
…ry docs

- Cap throwOnDeleteErrors enumeration at MAX_LISTED_DELETE_ERRORS (10) and summarize the remainder, so a fully-failed batch can't produce an unbounded message.
- Update the LargeBinaryManager doc that still described the pre-pagination single-page (<=1000) behavior of deleteDirectory.
@kunwp1 kunwp1 requested a review from Xiao-zhen-Liu June 9, 2026 07:59
@kunwp1

kunwp1 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

/request-review @Xiao-zhen-Liu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

S3StorageClient.deleteDirectory only deletes the first ≤1000 objects per prefix

2 participants