fix: handle LakeFS pagination to return all results beyond default 100-item limit#4349
fix: handle LakeFS pagination to return all results beyond default 100-item limit#4349xuang7 wants to merge 4 commits intoapache:mainfrom
Conversation
| .asScala | ||
| .toList | ||
| fetchAllPages[Diff] { after => | ||
| val request = branchesApi.diffBranch(repoName, branchName).amount(1000) |
There was a problem hiding this comment.
Can we make 1000 configurable? Or why 1000? Thanks
There was a problem hiding this comment.
The 1000 here is the page size for pagination. The fetchallpage will continue requesting pages until all results are retrieved, so I chose a relatively large value to reduce the number of requests. Would you recommend making it configurable, or would using a named constant be sufficient?
There was a problem hiding this comment.
I have changed it to a named constant PageSize.
There was a problem hiding this comment.
Thanks, I still wonder why 1000?
There was a problem hiding this comment.
[Edited] 1000 is the maximum page size allowed by lakers (default is 100). I picked 1000 to reduce the number of requests. It can be adjusted if needed.
aicam
left a comment
There was a problem hiding this comment.
LGTM! left one minor comment?
...flow-core/src/main/scala/org/apache/texera/amber/core/storage/util/LakeFSStorageClient.scala
Outdated
Show resolved
Hide resolved
| .getResults | ||
| .asScala | ||
| .toList | ||
| fetchAllPages[Diff] { after => |
There was a problem hiding this comment.
fetchAllPages itself is supposed to return all pages, why do we still check after here?
There was a problem hiding this comment.
fetchAllPages returns all pages as the final result, and it relies on the provided fetch function to retrieve one page at a time. The after parameter is the cursor passed into each request. I will add a comment to make this clearer. Thanks for the suggestion.
|
LGTM |
What changes were proposed in this PR?
This PR fixes the issue where retrieveUncommittedObjects and retrieveObjectsOfVersion only returned the first page of results (default 100 items). It adds a
fetchAllPagesmethod to retrieve the full result set across all pages. Each request sets.amount(1000)to reduce the number of round trips when fetching paginated results.When uploading 141 files:
Any related issues, documentation, discussions?
Resolves #4343
How was this PR tested?
Added a test that uploads 110 files to a temporary repo and verifies that both retrieveUncommittedObjects (before commit) and retrieveObjectsOfVersion (after commit) return all 110 items.
Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.6