
Conversation

@andygrove andygrove commented Jan 18, 2026

Summary

  • Enables support for complex types (arrays, maps, structs) in Comet's native Parquet writer
  • Removes the blocking check that previously prevented complex types
  • Adds comprehensive test coverage for complex types

Changes

  • Remove complex type blocking check in CometDataWritingCommand.scala
  • Add 12 new tests for complex types in CometParquetWriterSuite.scala:
    • Basic complex types (array, struct, map)
    • Nested complex types (array of structs, struct containing array, map with struct values, deeply nested)
    • Nullable complex types with nulls at various nesting levels
    • Complex types containing decimal and temporal types
    • Empty arrays and maps
    • Fuzz testing with randomly generated complex type schemas
  • Update documentation to reflect complex type support

Test plan

  • Tests verify round-trip compatibility (write with Comet, read with Spark/Comet)
  • Fuzz testing with randomly generated schemas

🤖 Generated with Claude Code


codecov-commenter commented Jan 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.05%. Comparing base (f09f8af) to head (369ddab).
⚠️ Report is 856 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3214      +/-   ##
============================================
+ Coverage     56.12%   60.05%   +3.92%     
- Complexity      976     1429     +453     
============================================
  Files           119      170      +51     
  Lines         11743    15758    +4015     
  Branches       2251     2602     +351     
============================================
+ Hits           6591     9463    +2872     
- Misses         4012     4976     +964     
- Partials       1140     1319     +179     


@andygrove andygrove marked this pull request as draft January 18, 2026 20:17
Enable spark.comet.scan.allowIncompatible in complex type tests so that
native_iceberg_compat scan is used (which supports complex types) instead
of falling back to native_comet (which doesn't support complex types).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove

andygrove commented Jan 18, 2026

With these changes, we can run the PySpark repartition benchmark fully native, and it shows an almost 2x speedup compared to Spark (and also ~2x compared to Comet when writes are disabled and Comet does the columnar-to-row transition).

[attached image: query plan]

andygrove and others added 3 commits January 18, 2026 14:18
The CI sets COMET_PARQUET_SCAN_IMPL=native_comet for some test profiles,
which overrides the default auto mode. Since native_comet doesn't support
complex types, the scan falls back to Spark's reader which produces
OnHeapColumnVector instead of CometVector, causing the native writer to fail.

This fix explicitly sets COMET_NATIVE_SCAN_IMPL to "auto" in the test
configuration, allowing native_iceberg_compat to be used for complex types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
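The fallback behavior described in this commit can be sketched as a toy decision function. All names and return values here are illustrative only (the real selection logic lives in Comet's Scala code, not in a function like this):

```rust
// Toy sketch of the scan-impl selection described in the commit message.
// Names and return values are hypothetical, not Comet's actual API.
fn select_scan_impl(env_override: Option<&str>, has_complex_types: bool) -> &'static str {
    match env_override {
        // CI forces native_comet, which cannot read complex types, so the
        // scan falls back to Spark's reader (producing OnHeapColumnVector
        // instead of CometVector, which the native writer cannot consume).
        Some("native_comet") if has_complex_types => "spark_fallback",
        Some("native_comet") => "native_comet",
        // "auto" (the default, restored by the fix) is free to pick
        // native_iceberg_compat, which does support complex types.
        _ => "native_iceberg_compat",
    }
}

fn main() {
    assert_eq!(select_scan_impl(Some("native_comet"), true), "spark_fallback");
    assert_eq!(select_scan_impl(Some("native_comet"), false), "native_comet");
    assert_eq!(select_scan_impl(None, true), "native_iceberg_compat");
    println!("ok");
}
```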
@andygrove andygrove marked this pull request as ready for review January 18, 2026 22:42
@andygrove andygrove requested a review from comphead January 18, 2026 22:47
@andygrove

With these changes, we can run the PySpark repartition benchmark fully native, and it shows an almost 2x speedup compared to Spark (and also ~2x compared to Comet when writes are disabled and Comet does the columnar-to-row transition).

@comphead ☝️

@andygrove andygrove requested a review from wForget January 19, 2026 01:55
@andygrove

Thanks for the thorough review @wForget. I refactored the test framework and added assertions that the plans are running as intended (Spark vs Comet).

.strip_prefix("file://")
.or_else(|| part_file.strip_prefix("file:"))
.unwrap_or(&part_file);
let file_size = std::fs::metadata(local_path)

A question unrelated to this PR: #2929 supports write to hdfs. So, is the logic here correct when local_path is an hdfs path?
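To illustrate the concern: the snippet only strips a local `file:`/`file://` scheme, so any other URI scheme passes through unchanged. A standalone sketch of that logic (function name hypothetical, extracted from the diff excerpt above):

```rust
// Hypothetical standalone version of the prefix-stripping logic above.
// Only a "file:" scheme is removed; any other scheme (e.g. hdfs://)
// is returned unchanged, which is the crux of the reviewer's question:
// std::fs::metadata would then be called on the raw HDFS URI and fail.
fn strip_file_scheme(part_file: &str) -> &str {
    part_file
        .strip_prefix("file://")
        .or_else(|| part_file.strip_prefix("file:"))
        .unwrap_or(part_file)
}

fn main() {
    assert_eq!(strip_file_scheme("file:///tmp/part-0.parquet"), "/tmp/part-0.parquet");
    assert_eq!(strip_file_scheme("file:/tmp/part-0.parquet"), "/tmp/part-0.parquet");
    // An HDFS URI is passed through untouched.
    assert_eq!(
        strip_file_scheme("hdfs://nn:8020/tmp/part-0.parquet"),
        "hdfs://nn:8020/tmp/part-0.parquet"
    );
    println!("ok");
}
```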


@wForget wForget left a comment


Thanks @andygrove, LGTM
