Skip to content

HDDS-13532. Refactor BucketEndpoint.get()#10045

Open
rich7420 wants to merge 9 commits intoapache:masterfrom
rich7420:HDDS-13532
Open

HDDS-13532. Refactor BucketEndpoint.get()#10045
rich7420 wants to merge 9 commits intoapache:masterfrom
rich7420:HDDS-13532

Conversation

@rich7420
Copy link
Copy Markdown
Contributor

@rich7420 rich7420 commented Apr 5, 2026

What changes were proposed in this pull request?

Reopen: #9110
This pull request refactors the massively monolithic BucketEndpoint.get() method (ListObjects API implementation) to vastly improve codebase maintainability, readability, and unit-testability.

The original get() method was nearly 200 lines long. It tightly coupled logic across multiple domains—from parameter string deserialization, context variable instantiation, iterative key-prefix filtering logic, to S3 metrics/audit log emissions. This structure made it inherently risky to extend and nearly impossible to write isolated unit tests for individual components of the logic.

Key Refactoring Improvements in this PR:

  • Context Encapsulation (BucketListingContext): Introduced a dedicated package-private class to securely encapsulate listing state (such as bucketName, prefix, encodingType, maxKeys, decodedToken, ozoneKeyIterator, etc.). This successfully prevented "parameter-creep" and minimized variable clutter across the execution flow.
  • Method Extraction & Separation of Concerns: We systematically decomposed the unmaintainable block into multiple single-responsibility helper methods:
    • validateAndPrepareParameters(): Sanitizes configuration limits, handles token decodings, verifies encoding types, and enforces S3 Bucket owner conditions.
    • initializeListObjectResponse(): Generates the bare-bones boilerplate response structures.
    • processKeyListing(): Owns the complex computational burden (the while (ozoneKeyIterator.hasNext()) mechanism), resolving token depth matches, delimited deep structures, and CommonPrefixes allocations safely.
    • buildFinalResponse(): Isolates the responsibility of executing PerformanceStringBuilder increments and consistent auditReadSuccess publishing.
  • Seamless Upstream Synchronization: During refactoring, this branch was comprehensively synced and merged with master's recent additions (S3RequestContext, PerformanceStringBuilder, and the BucketOperationHandler Chain-of-Responsibility pattern). The refactor cleanly embraces these integrations while achieving optimal flow footprint.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-13532

How was this patch tested?

https://github.com/rich7420/ozone/actions/runs/23993539910

Copy link
Copy Markdown
Contributor

@sreejasahithi sreejasahithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rich7420

Comment on lines +292 to +295
// If you specify the encoding-type request parameter, should return encoded key name values
// in the following response elements: Delimiter, Prefix, Key, and StartAfter.
// For detail refer:
// https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#AmazonS3-ListObjectsV2-response-EncodingType
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : this comment is repeated.

Comment on lines +133 to +142
// Actual bucket processing starts here
// Validate and prepare parameters
BucketListingContext listingContext = validateAndPrepareParameters(
bucketName, delimiter, encodingType, marker, maxKeys, prefix,
continueToken, startAfter);

// If continuation token and start after both are provided, then we
// ignore start After
String prevKey = continueToken != null ? decodedToken.getLastKey()
: startAfter;
// Initialize response object
ListObjectResponse response = initializeListObjectResponse(
bucketName, delimiter, encodingType, marker, maxKeys, prefix,
continueToken, startAfter);
Copy link
Copy Markdown
Contributor

@sreejasahithi sreejasahithi Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here maxKeys is initialized from the query parameters (e.g., a user requests 5000), validateAndPrepareParameters is called. Inside this method, maxKeys is capped via validateMaxKeys(maxKeys) (e.g., capped to 1000). This capped value is stored inside the BucketListingContext, the maxKeys variable back in the get() method is not updated. It remains 5000. initializeListObjectResponse is called using the original maxKeys (5000).The response object now explicitly states 5000.

I think if we pass listingContext.getMaxKeys() not the local maxKeys to initializeListObjectResponse this issue would not be seen.

Could you confirm whether that’s intentional or something we should align?

@rich7420
Copy link
Copy Markdown
Contributor Author

rich7420 commented Apr 7, 2026

@sreejasahithi Thanks for catching these!
I've removed the redundant comment and updated initializeListObjectResponse to properly use the capped listingContext.getMaxKeys(). The PR is updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants