fix: add space in decimal type serialization for cross-engine compat#2538
fix: add space in decimal type serialization for cross-engine compat#2538SreeramGarlapati wants to merge 1 commit into
Conversation
…pliance The Iceberg spec, Java, and Python all serialize the decimal type as `decimal(P, S)` (with a space after the comma). iceberg-rust was writing `decimal(P,S)` which could cause metadata written by Rust to be rejected or misinterpreted by strict parsers in other implementations. The deserializer already handles both formats via trim(), so this change is backwards-compatible for reading. Closes apache#2534 Co-authored-by: rawataaryan9 <rawataaryan9@users.noreply.github.com>
945b288 to
3499e4b
Compare
mbutrovich
left a comment
There was a problem hiding this comment.
LGTM, thanks @SreeramGarlapati!
xanderbailey
left a comment
There was a problem hiding this comment.
Interestingly the spec makes it seem like this is legitimate both with and without the space? It would appear that if other consumers are failing here, they're not compliant?
Interested to know what other people in the community think the right approach here is? It's true that this fix will align with other clients but I also feel like perhaps we either need to clarify the spec or improve the other clients to be compliant?
mbutrovich
left a comment
There was a problem hiding this comment.
Changing my approval after the better investigation by @xanderbailey, thanks for double-checking!
i feel like thats a formatting issue on the spec and not intention to have both the form with and without the space. That said, the general principle with ser/de for iceberg project is Postel’s Law
This is a great place to clarify the spec 😄 thanks for catching it! |
Summary
iceberg-rust serializes the decimal type as
decimal(P,S)(no space), while the Iceberg spec, Java, and Python all usedecimal(P, S)(space after comma). This mismatch can cause issues when metadata written by iceberg-rust is read by strict parsers in other implementations.What changed
serialize_decimal:decimal({precision},{scale})→decimal({precision}, {scale})Display for PrimitiveType: same fix for the decimal armWhy this is safe
The deserializer already uses
.trim()on both precision and scale components, so it accepts bothdecimal(9,2)anddecimal(9, 2). Existing metadata files with the old format continue to be readable.Closes #2534
Test plan