HDDS-13532. Refactor BucketEndpoint.get()#10045
HDDS-13532. Refactor BucketEndpoint.get()#10045rich7420 wants to merge 9 commits intoapache:masterfrom
Conversation
| // If you specify the encoding-type request parameter, should return encoded key name values | ||
| // in the following response elements: Delimiter, Prefix, Key, and StartAfter. | ||
| // For detail refer: | ||
| // https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#AmazonS3-ListObjectsV2-response-EncodingType |
There was a problem hiding this comment.
nit : this comment is repeated.
| // Actual bucket processing starts here | ||
| // Validate and prepare parameters | ||
| BucketListingContext listingContext = validateAndPrepareParameters( | ||
| bucketName, delimiter, encodingType, marker, maxKeys, prefix, | ||
| continueToken, startAfter); | ||
|
|
||
| // If continuation token and start after both are provided, then we | ||
| // ignore start After | ||
| String prevKey = continueToken != null ? decodedToken.getLastKey() | ||
| : startAfter; | ||
| // Initialize response object | ||
| ListObjectResponse response = initializeListObjectResponse( | ||
| bucketName, delimiter, encodingType, marker, maxKeys, prefix, | ||
| continueToken, startAfter); |
There was a problem hiding this comment.
Here maxKeys is initialized from the query parameters (e.g., a user requests 5000), validateAndPrepareParameters is called. Inside this method, maxKeys is capped via validateMaxKeys(maxKeys) (e.g., capped to 1000). This capped value is stored inside the BucketListingContext, the maxKeys variable back in the get() method is not updated. It remains 5000. initializeListObjectResponse is called using the original maxKeys (5000).The response object now explicitly states 5000.
I think if we pass listingContext.getMaxKeys() not the local maxKeys to initializeListObjectResponse this issue would not be seen.
Could you confirm whether that’s intentional or something we should align?
|
@sreejasahithi Thanks for catching these! |
What changes were proposed in this pull request?
Reopen: #9110
This pull request refactors the massively monolithic
BucketEndpoint.get()method (ListObjects API implementation) to vastly improve codebase maintainability, readability, and unit-testability.The original
get()method was nearly 200 lines long. It tightly coupled logic across multiple domains—from parameter string deserialization, context variable instantiation, iterative key-prefix filtering logic, to S3 metrics/audit log emissions. This structure made it inherently risky to extend and nearly impossible to write isolated unit tests for individual components of the logic.Key Refactoring Improvements in this PR:
BucketListingContext): Introduced a dedicated package-private class to securely encapsulate listing state (such asbucketName,prefix,encodingType,maxKeys,decodedToken,ozoneKeyIterator, etc.). This successfully prevented "parameter-creep" and minimized variable clutter across the execution flow.validateAndPrepareParameters(): Sanitizes configuration limits, handles token decodings, verifies encoding types, and enforces S3 Bucket owner conditions.initializeListObjectResponse(): Generates the bare-bones boilerplate response structures.processKeyListing(): Owns the complex computational burden (thewhile (ozoneKeyIterator.hasNext())mechanism), resolving token depth matches, delimited deep structures, andCommonPrefixesallocations safely.buildFinalResponse(): Isolates the responsibility of executingPerformanceStringBuilderincrements and consistentauditReadSuccesspublishing.master's recent additions (S3RequestContext,PerformanceStringBuilder, and theBucketOperationHandlerChain-of-Responsibility pattern). The refactor cleanly embraces these integrations while achieving optimal flow footprint.What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13532
How was this patch tested?
https://github.com/rich7420/ozone/actions/runs/23993539910