
Validate Iceberg sink key column types #35792

Merged
DAlperin merged 4 commits into main from claude/fix-iceberg-parallel-workload-Gsczq
Apr 1, 2026

Conversation

@DAlperin
Member

Motivation

Iceberg equality deletes require key columns to be primitive, non-floating-point types. The iceberg-rust library cannot use floating-point, array, list, map, or record types as identifier fields. This change adds validation to reject unsupported key column types at sink creation time rather than failing later during data writes.

Description

This PR adds validation to CREATE SINK INTO ICEBERG statements to ensure that key columns are compatible with Iceberg's equality delete requirements:

  1. Validation Logic (src/sql/src/plan/statement/ddl.rs): Added a check in the plan_sink function that validates key column types when creating an Iceberg sink. The validation rejects:

    • Float32 and Float64 (floating-point types)
    • Array, List, Map, and Record (complex/composite types)
  2. Error Handling (src/sql/src/plan/error.rs): Added a new IcebergSinkUnsupportedKeyType error variant that provides a clear error message indicating which column has an unsupported type and why.

  3. Test Coverage (test/iceberg/key-validation.td): Added comprehensive tests covering:

    • Float and double key columns (should fail)
    • Map, list key columns (should fail)
    • Float/double value columns with valid key columns (should succeed)
    • Cleanup of test tables
  4. Test Infrastructure (misc/python/materialize/parallel_workload/action.py, test/iceberg/mzcompose.py): Updated to include the new test file and ignore the expected error message in workload generation.
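The validation in steps 1–2 can be sketched roughly as follows. This is an illustrative stand-in, not the actual `plan_sink` code: the enum here is a simplified mock of Materialize's `ScalarType`, and the function names are hypothetical.

```rust
// Simplified stand-in for Materialize's ScalarType (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum ScalarType {
    Bool,
    Int32,
    Int64,
    String,
    Float32,
    Float64,
    List(Box<ScalarType>),
    Map(Box<ScalarType>),
}

/// Allow-list: only primitive, non-floating-point types may serve as
/// Iceberg equality-delete identifier fields.
fn is_valid_iceberg_key_type(ty: &ScalarType) -> bool {
    matches!(
        ty,
        ScalarType::Bool | ScalarType::Int32 | ScalarType::Int64 | ScalarType::String
    )
}

/// Reject the sink at planning time if any key column has an
/// unsupported type, naming the offending column in the error.
fn validate_key_columns(cols: &[(&str, ScalarType)]) -> Result<(), String> {
    for (name, ty) in cols {
        if !is_valid_iceberg_key_type(ty) {
            return Err(format!(
                "column {name:?} has type {ty:?}, which cannot be used as an \
                 Iceberg equality delete identifier"
            ));
        }
    }
    Ok(())
}
```

Checking at planning time turns the runtime "Field not found" failure into a clear error at CREATE SINK.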

Verification

The change is covered by the new integration test test/iceberg/key-validation.td which validates that:

  • Unsupported key column types are rejected with the appropriate error message
  • Non-key columns with unsupported types do not cause failures
  • The validation works for all unsupported type categories

https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr

claude added 3 commits March 30, 2026 18:57
… equality deletes

The iceberg-rust EqualityDeleteWriterConfig rejects nested types (Map,
List, Array/Struct) and floating-point types (Float32, Float64) as
equality delete identifiers per the Iceberg spec. The parallel workload
was randomly picking key columns from all columns, which could select
these unsupported types and cause "Field not found" errors when creating
the equality delete writer.

Filter the eligible key columns to exclude Float, Double, TextTextMap,
IntList, and IntArray before selecting keys for iceberg sinks.

https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
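The filtering this commit describes can be sketched as below. The real change was in Python (`parallel_workload/action.py`) and was later reverted in favor of planner-side validation; the type names here are the workload's own labels, but the function is a hypothetical illustration.

```rust
// Type labels the parallel workload cannot use as iceberg sink keys.
const INELIGIBLE_KEY_TYPES: &[&str] = &["float", "double", "text_text_map", "int_list", "int_array"];

/// Filter the candidate columns down to those eligible to be randomly
/// selected as key columns for an iceberg sink.
fn eligible_key_columns<'a>(cols: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    cols.iter()
        .filter(|(_, ty)| !INELIGIBLE_KEY_TYPES.contains(ty))
        .map(|(name, _)| *name)
        .collect()
}
```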
…g time

The Iceberg spec requires equality delete identifier fields to be
primitive, non-floating-point types. Previously this was only caught at
runtime by the iceberg-rust EqualityDeleteWriterConfig, causing a
cryptic "Field not found" error. Now we reject these at CREATE SINK
planning time with a clear error message.

Rejected types: Float32, Float64, Array, List, Map, Record.

Also adds this error to the parallel workload's expected errors list for
CreateIcebergSinkAction, since the fallback path in key column selection
can still produce this error.

https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
Undo the client-side key column filtering in IcebergSink.__init__ since
the planner now rejects unsupported types with a clear error that the
parallel workload ignores via errors_to_ignore.

Add an iceberg testdrive test that verifies CREATE SINK rejects float,
double, map, and list key columns, and accepts a non-key float column.

https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
@github-actions
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

Switch from a deny-list to an allow-list so that new types are
rejected by default rather than silently passing through to cause
a runtime error.

https://claude.ai/code/session_01X3zo5EQokSE1PfGb7vnLNr
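The difference between the two approaches can be illustrated as follows; `Ty` is a stand-in enum, not the real type, and `Range` plays the role of a type added after the check was written.

```rust
// Illustrative type enum; Range stands in for a later addition.
#[derive(Debug, PartialEq)]
enum Ty {
    Int32,
    String,
    Float64,
    Map,
    Range, // added after the check was written
}

// Deny-list: anything not explicitly rejected passes, so the new
// Range type silently slips through to fail at runtime.
fn key_ok_denylist(ty: &Ty) -> bool {
    !matches!(ty, Ty::Float64 | Ty::Map)
}

// Allow-list: anything not explicitly permitted is rejected, so the
// new Range type fails cleanly at planning time by default.
fn key_ok_allowlist(ty: &Ty) -> bool {
    matches!(ty, Ty::Int32 | Ty::String)
}
```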
@DAlperin DAlperin marked this pull request as ready for review March 31, 2026 03:02
@DAlperin DAlperin requested review from a team as code owners March 31, 2026 03:02
@DAlperin DAlperin requested review from a team, ggevay and ublubu March 31, 2026 03:02
Contributor

@def- def- left a comment


Test changes lgtm, but should they also check if the mz_sink_statuses get any errors?
I triggered a new run with my parallel-workload change rebased on top of this, will check back for the results: https://buildkite.com/materialize/nightly/builds/15919
Edit: Still a bunch of failures:

  • iceberg: Failed to convert RelationDesc to Iceberg-compatible Arrow schema: Relation contains unimplemented arrow types: ["Range { element_type: Int32 } unimplemented"]
  • iceberg: Failed to convert RelationDesc to Iceberg-compatible Arrow schema: Relation contains unimplemented arrow types: ["Range { element_type: Date } unimplemented", "Range { element_type: Numeric { max_scale: None } } unimplemented"]
  • iceberg: Failed to convert RelationDesc to Iceberg-compatible Arrow schema: Relation contains unimplemented arrow types: ["Interval unimplemented"]

Contributor

@ublubu ublubu left a comment


Allow-list logic for key columns looks right. Tests seem reasonable. 👍

```rust
for &idx in &indices {
    let (col_name, col_type) = cols[idx];
    let scalar = &col_type.scalar_type;
    let is_valid = matches!(
```
Contributor


Someone really likes matches!?

@DAlperin DAlperin merged commit a5806f6 into main Apr 1, 2026
122 checks passed
@DAlperin DAlperin deleted the claude/fix-iceberg-parallel-workload-Gsczq branch April 1, 2026 03:51
