native_datafusion more permissive than Spark 3.x when reading Parquet TimestampNTZ columns #4219

@andygrove

Describe the bug

Spark 3.x does not allow reading Parquet timestamp columns as TimestampNTZ because the conversion is ambiguous; it throws an exception instead. Spark 4.0 relaxed this restriction.

See https://issues.apache.org/jira/browse/SPARK-47447 for details.

The native_datafusion scan behaves correctly when running with Spark 4.0+ but incorrectly when running with Spark 3.x: it returns a value rather than throwing an exception.

The value returned is correct per the Spark 4 definition, so this is not a data correctness issue per se, but Comet is more permissive than Spark 3.x.
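To make the expected behavior concrete, here is a minimal sketch of a version-gated check in plain Python. It is not Comet or Spark code: `check_ntz_read`, `ParquetTimestamp`, and `UnsupportedNtzRead` are hypothetical names. It also assumes that the ambiguous case is a Parquet timestamp stored with UTC-adjusted (instant) semantics being requested as TimestampNTZ; the issue itself does not spell this out.

```python
from dataclasses import dataclass


class UnsupportedNtzRead(Exception):
    """Hypothetical stand-in for the exception Spark 3.x raises."""


@dataclass(frozen=True)
class ParquetTimestamp:
    # Hypothetical model of a Parquet timestamp column's annotation.
    # Assumption: True means the values are instants normalized to UTC.
    is_adjusted_to_utc: bool


def check_ntz_read(spark_major: int, column: ParquetTimestamp) -> None:
    """Raise if reading `column` as TimestampNTZ should be rejected.

    Assumption: only UTC-adjusted columns are ambiguous when read as
    TimestampNTZ, and only Spark versions before 4.0 reject them.
    """
    if spark_major < 4 and column.is_adjusted_to_utc:
        raise UnsupportedNtzRead(
            "reading a UTC-adjusted Parquet timestamp as TimestampNTZ "
            f"is not allowed on Spark {spark_major}.x"
        )


# Spark 4.x: the read is allowed, so the scan may return a value.
check_ntz_read(4, ParquetTimestamp(is_adjusted_to_utc=True))

# Spark 3.x: the same read is expected to fail rather than return a value.
try:
    check_ntz_read(3, ParquetTimestamp(is_adjusted_to_utc=True))
except UnsupportedNtzRead as err:
    print(f"rejected: {err}")
```

A check of this shape would run before the scan produces any rows, so that Spark 3.x sessions see an error while Spark 4.0+ sessions keep the current behavior.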

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

area:scan (Parquet scan / data reading), bug (Something isn't working), native_datafusion (Specific to native_datafusion scan type), priority:medium (Functional bugs, performance regressions, broken features), spark 3.x
