feat(datafusion): support PARTITIONED BY for identity-partitioned external tables by huan233usc · Pull Request #2575 · apache/iceberg-rust

huan233usc · 2026-06-03T06:19:49Z

Which issue does this PR close?

Partially solves #2050: Support CREATE EXTERNAL TABLE PARTITIONED BY syntax with DataFusion.

What changes are included in this PR?

CREATE EXTERNAL TABLE ... STORED AS ICEBERG (via IcebergTableProviderFactory) previously rejected any PARTITIONED BY clause outright.

DataFusion's PARTITIONED BY grammar accepts only plain column names; unlike Spark's native DSv2 grammar, it cannot express Iceberg transforms such as bucket(16, id) or days(ts). Given that constraint, this PR:

- Stops rejecting `table_partition_cols` in `check_cmd`.
- Adds `validate_partition_columns`, which runs after the table is loaded:
  - If the table's default partition spec uses any non-identity transform, it returns a clear `FeatureUnsupported` error naming the offending field and transform.
  - Otherwise, it validates that the declared columns exactly match the identity partition columns, in order (consistent with `PartitionSpec::is_compatible_with` and Java's `PartitionSpec.compatibleWith`, where field order is significant).
- Omitting PARTITIONED BY keeps the previous behavior: any table, including one partitioned with non-identity transforms, can still be registered for read-only access.
- Leaves a TODO to support non-identity transforms once DataFusion's grammar can express them.
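The validation rules listed here can be sketched as follows. This is a minimal, self-contained sketch and not the PR's implementation: `Transform`, `PartitionField`, and `ValidationError` are hypothetical simplified stand-ins for the `iceberg` crate's spec types and error kinds, and the partition spec is passed in directly instead of being read from a loaded `Table`.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for an Iceberg partition transform.
#[derive(Debug, Clone, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
    Day,
}

impl fmt::Display for Transform {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Transform::Identity => write!(f, "identity"),
            Transform::Bucket(n) => write!(f, "bucket[{n}]"),
            Transform::Day => write!(f, "day"),
        }
    }
}

/// Hypothetical partition field: source column name plus its transform.
struct PartitionField {
    name: String,
    transform: Transform,
}

/// Hypothetical error type mirroring the two failure modes described in the PR.
#[derive(Debug, PartialEq)]
enum ValidationError {
    /// Plays the role of the FeatureUnsupported error for non-identity transforms.
    Unsupported(String),
    /// Declared columns differ from the identity partition columns (content or order).
    Mismatch { declared: Vec<String>, expected: Vec<String> },
}

fn validate_partition_columns(
    spec: &[PartitionField],
    declared: &[String],
) -> Result<(), ValidationError> {
    // No PARTITIONED BY clause: keep the old behavior (read-only registration).
    if declared.is_empty() {
        return Ok(());
    }
    // Any non-identity transform cannot be expressed by DataFusion's grammar.
    if let Some(field) = spec.iter().find(|f| f.transform != Transform::Identity) {
        return Err(ValidationError::Unsupported(format!(
            "partition field '{}' uses transform {}, which PARTITIONED BY cannot express",
            field.name, field.transform
        )));
    }
    // Field order is significant, so compare the columns as ordered lists.
    let expected: Vec<String> = spec.iter().map(|f| f.name.clone()).collect();
    if expected.as_slice() != declared {
        return Err(ValidationError::Mismatch { declared: declared.to_vec(), expected });
    }
    Ok(())
}
```

Comparing ordered lists is what makes `(region, event_date)` fail against a spec of `(event_date, region)`, and it also rejects a subset such as `(event_date)` because the lengths differ.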

Example

CREATE EXTERNAL TABLE my_iceberg_table
STORED AS ICEBERG LOCATION '/path/to/metadata.json'
PARTITIONED BY (event_date);

Are these changes tested?

Yes. Added unit tests in table_provider_factory.rs plus two metadata fixtures (bucket-partitioned and multi-identity-partitioned):

- Single identity column: match / mismatch
- Multiple identity columns: match / wrong order / subset (count mismatch)
- Non-identity (`bucket[4]`) transform rejected with a clear error
- Non-identity partitioned table still registers when PARTITIONED BY is omitted

`cargo test -p iceberg-datafusion` and `cargo clippy -p iceberg-datafusion --all-targets` pass.

…ernal tables `CREATE EXTERNAL TABLE ... STORED AS ICEBERG` previously rejected any `PARTITIONED BY` clause. Since DataFusion's grammar only accepts plain column names (it cannot express transforms such as `bucket[N]` or `day`), allow the clause for identity-partitioned tables and validate that the declared columns match the table's default partition spec, in order. Tables partitioned with non-identity transforms can still be registered by omitting the clause; specifying it returns a clear error pointing at the offending transform. Closes apache#2050

huan233usc · 2026-06-03T06:23:04Z

+/// non-identity transforms, can still be registered for read-only access without declaring
+/// its partitioning.
+fn validate_partition_columns(table: &Table, declared_partition_cols: &[String]) -> Result<()> {
+    if declared_partition_cols.is_empty() {


The behavior here is open for discussion.

We could instead skip validating the partition spec. The upside is that users could still create an external table over a partitioned table, including one partitioned with transforms DataFusion cannot express; the downside is that the SQL would not accurately describe the table's partitioning.
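To make the trade-off concrete, here is a minimal sketch of the two options. The `PartitionCheck` enum and `accept_declared` function are hypothetical and not part of the PR; each partition field is reduced to a `(source column, is_identity)` pair for brevity.

```rust
/// Hypothetical switch between the two behaviors discussed in this thread.
#[derive(Clone, Copy)]
enum PartitionCheck {
    /// Current PR behavior: PARTITIONED BY must match the identity spec exactly.
    Strict,
    /// Alternative: accept any PARTITIONED BY clause without checking the spec.
    Lenient,
}

/// Returns whether registration would succeed for the declared columns.
/// `spec` holds (source column, is_identity) pairs in partition-spec order.
fn accept_declared(mode: PartitionCheck, spec: &[(&str, bool)], declared: &[&str]) -> bool {
    match mode {
        // Lenient never blocks registration, even when the SQL misdescribes the layout.
        PartitionCheck::Lenient => true,
        // Strict: no clause is fine; otherwise all transforms must be identity
        // and the declared columns must match the spec in order.
        PartitionCheck::Strict => {
            declared.is_empty()
                || (spec.iter().all(|(_, is_identity)| *is_identity)
                    && spec.len() == declared.len()
                    && spec.iter().zip(declared).all(|((col, _), d)| col == d))
        }
    }
}
```

Under the lenient option, `PARTITIONED BY (id)` on a `bucket[4]`-partitioned table is accepted even though the table is not partitioned by raw `id` values, which is the inaccuracy mentioned above.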


huan233usc mentioned this pull request Jun 3, 2026 in: Support CREATE EXTERNAL TABLE PARTITIONED BY syntax with DataFusion #2050 (Open)

huan233usc added 3 commits June 2, 2026 23:32

- test(datafusion): dedupe metadata-location helpers and consolidate partition mismatch cases (5cc0751)
- chore(datafusion): drop self-referential issue link from TODO comment (729c86d)
- test(datafusion): add end-to-end SQL tests for PARTITIONED BY external tables (3a7fc98)

huan233usc wants to merge 4 commits into apache:main from huan233usc:feat/datafusion-external-table-partitioned-by
