fix: sigv4 auth to use base64-encoded content sha256 and custom canonical request#3120
fix: sigv4 auth to use base64-encoded content sha256 and custom canonical request#3120plusplusjiajia wants to merge 1 commit intoapache:mainfrom
Conversation
|
Hi @kevinjqliu Can you take a look? |
|
thanks for the PR! i understand this is to align with the java SigV4 implementation. Could you help me understand the specific scenario in which this is currently breaking? (i dont know much about sigv4) |
kevinjqliu
left a comment
There was a problem hiding this comment.
reading up on SigV4 java implementation
but some initial comments
| def canonical_request(self, request: Any) -> str: | ||
| cr = super().canonical_request(request) | ||
| # Replace the last line (body_checksum) with hex-encoded payload hash. | ||
| return cr.rsplit("\n", 1)[0] + "\n" + self.payload(request) |
There was a problem hiding this comment.
using rsplit feels fragile, can we reuse the same logic from canonical_request but just replace the last part with self.payload(request)
There was a problem hiding this comment.
im confused why the _IcebergSigV4Auth class is needed
There was a problem hiding this comment.
ideally we should test against an integration test before merging. unit tests can only do so much
|
Ok so from what ive gathered SigV4 is AWS's request signing mechanism. We're addressing the scenario where both SigV4 and OAuth are required for REST catalog authentication. Are you running your own IRC on AWS? i think glue IRC doesnt need the oauth part |
| if request.body: | ||
| body_bytes = request.body.encode("utf-8") if isinstance(request.body, str) else request.body | ||
| content_sha256_header = base64.b64encode(hashlib.sha256(body_bytes).digest()).decode() |
There was a problem hiding this comment.
i dont think we need to base64 encode this. the java implementation doesnt do this. neither does the python reference implementation
|
Adding a few references: |
Rationale for this change
This PR fixes the SigV4 request signing implementation in pyiceberg to align with the behavior of the [Iceberg Java SDK]
The SigV4 canonical request must use the hex-encoded SHA-256 of the payload for signature computation, while the x-amz-content-sha256 header uses base64 encoding. The default botocore.auth.SigV4Auth uses the x-amz-content-sha256 header value directly in the canonical request, which is incorrect when the header is base64-encoded.
Are these changes tested?
Yes
Are there any user-facing changes?
No