Validate Iceberg sink key column types#35792
Merged
Conversation
… equality deletes The iceberg-rust EqualityDeleteWriterConfig rejects nested types (Map, List, Array/Struct) and floating-point types (Float32, Float64) as equality delete identifiers per the Iceberg spec. The parallel workload was randomly picking key columns from all columns, which could select these unsupported types and cause "Field not found" errors when creating the equality delete writer. Filter the eligible key columns to exclude Float, Double, TextTextMap, IntList, and IntArray before selecting keys for iceberg sinks. https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
…g time The Iceberg spec requires equality delete identifier fields to be primitive, non-floating-point types. Previously this was only caught at runtime by the iceberg-rust EqualityDeleteWriterConfig, causing a cryptic "Field not found" error. Now we reject these at CREATE SINK planning time with a clear error message. Rejected types: Float32, Float64, Array, List, Map, Record. Also adds this error to the parallel workload's expected errors list for CreateIcebergSinkAction, since the fallback path in key column selection can still produce this error. https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
Undo the client-side key column filtering in IcebergSink.__init__ since the planner now rejects unsupported types with a clear error that the parallel workload ignores via errors_to_ignore. Add an iceberg testdrive test that verifies CREATE SINK rejects float, double, map, and list key columns, and accepts a non-key float column. https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
Contributor
|
Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone. PR title guidelines
Pre-merge checklist
|
Switch from a deny-list to an allow-list so that new types are rejected by default rather than silently passing through to cause a runtime error. https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
def-
reviewed
Mar 31, 2026
Contributor
There was a problem hiding this comment.
Test changes lgtm, but should they also check if the mz_sink_statuses get any errors?
I triggered a new run with my parallel-workload change rebased on top of this, will check back for the results: https://buildkite.com/materialize/nightly/builds/15919
Edit: Still a bunch of failures:
- iceberg: Failed to convert RelationDesc to Iceberg-compatible Arrow schema: Relation contains unimplemented arrow types: ["Range { element_type: Int32 } unimplemented"]
- iceberg: Failed to convert RelationDesc to Iceberg-compatible Arrow schema: Relation contains unimplemented arrow types: ["Range { element_type: Date } unimplemented", "Range { element_type: Numeric { max_scale: None } } unimplemented"]
- iceberg: Failed to convert RelationDesc to Iceberg-compatible Arrow schema: Relation contains unimplemented arrow types: ["Interval unimplemented"]
ublubu
approved these changes
Mar 31, 2026
Contributor
ublubu
left a comment
There was a problem hiding this comment.
Allow-list logic for key columns looks right. Tests seem reasonable. 👍
| for &idx in &indices { | ||
| let (col_name, col_type) = cols[idx]; | ||
| let scalar = &col_type.scalar_type; | ||
| let is_valid = matches!( |
Contributor
There was a problem hiding this comment.
Someone really likes matches!?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Iceberg equality deletes require key columns to be primitive, non-floating-point types. The iceberg-rust library cannot use floating-point, array, list, map, or record types as identifier fields. This change adds validation to reject unsupported key column types at sink creation time rather than failing later during data writes.
Description
This PR adds validation to
CREATE SINK INTO ICEBERGstatements to ensure that key columns are compatible with Iceberg's equality delete requirements:Validation Logic (
src/sql/src/plan/statement/ddl.rs): Added a check in theplan_sinkfunction that validates key column types when creating an Iceberg sink. The validation rejects:Float32andFloat64(floating-point types)Array,List,Map, andRecord(complex/composite types)Error Handling (
src/sql/src/plan/error.rs): Added a newIcebergSinkUnsupportedKeyTypeerror variant that provides a clear error message indicating which column has an unsupported type and why.Test Coverage (
test/iceberg/key-validation.td): Added comprehensive tests covering:Test Infrastructure (
misc/python/materialize/parallel_workload/action.py,test/iceberg/mzcompose.py): Updated to include the new test file and ignore the expected error message in workload generation.Verification
The change is covered by the new integration test
test/iceberg/key-validation.tdwhich validates that:https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr