chore: Mark expressions with known correctness issues as incompatible #3675
Merged
andygrove merged 8 commits into apache:main on Mar 12, 2026
Review all open correctness issues and mark affected expressions as Incompatible so they fall back to Spark by default. Update the compatibility guide with detailed documentation of each incompatibility and links to tracking issues.

Expressions marked Incompatible:

- ArrayContains (apache#3346), GetArrayItem (apache#3330, apache#3332), ArrayRemove (apache#3173)
- Hour, Minute, Second for TimestampNTZ inputs (apache#3180)
- TruncTimestamp for non-UTC time zones (apache#2649)
- Ceil, Floor for Decimal inputs (apache#1729)
- Tan (apache#1897), Corr (apache#2646), StructsToJson (apache#3016)
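The effect of marking an expression Incompatible can be sketched as a small decision function. This is a minimal model under stated assumptions: the `SupportLevel` types, the `runsNatively` helper, and the `spark.comet.expression.<Name>.allowIncompatible` key format are hypothetical and only loosely modelled on Comet's serde layer, not its actual API. An Incompatible expression runs natively only when the user opts in; otherwise the plan falls back to Spark.

```scala
// Hypothetical support levels, loosely modelled on (not copied from) Comet's serde layer.
sealed trait SupportLevel
case object Compatible extends SupportLevel
final case class Incompatible(note: String) extends SupportLevel
final case class Unsupported(note: String) extends SupportLevel

// Assumed per-expression config key format; check Comet's configuration docs for the real name.
def allowIncompatKey(exprName: String): String =
  s"spark.comet.expression.$exprName.allowIncompatible"

// Returns true if the expression should run natively, false if it should fall back to Spark.
def runsNatively(exprName: String, level: SupportLevel, conf: Map[String, String]): Boolean =
  level match {
    case Compatible      => true
    // Incompatible expressions fall back unless the user explicitly opts in.
    case Incompatible(_) => conf.get(allowIncompatKey(exprName)).exists(_.equalsIgnoreCase("true"))
    case Unsupported(_)  => false
  }

// Usage: Tan is marked Incompatible (#1897).
val tan: SupportLevel = Incompatible("see #1897")
println(runsNatively("Tan", tan, Map.empty))                              // false: falls back to Spark
println(runsNatively("Tan", tan, Map(allowIncompatKey("Tan") -> "true"))) // true: user opted in
```

Keeping the opt-in per expression means enabling one known-incompatible expression does not silently enable all of them.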
Author (Member): I am working on updating some tests to enable allowIncompatible for these expressions.
map_contains_key is internally rewritten by Spark to use ArrayContains, so it needs the allowIncompatible config to run natively.
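A toy model of why map_contains_key is affected. It assumes the rewrite takes the form `array_contains(map_keys(m), k)`; the expression classes here are our own minimal stand-ins, not Spark's Catalyst classes. Once the rewrite happens, the plan Comet sees contains an ArrayContains node, so that node's Incompatible status decides whether the query runs natively.

```scala
// Minimal, hypothetical expression tree (not Spark's Catalyst classes).
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class MapKeys(map: Expr) extends Expr
final case class ArrayContains(array: Expr, value: Expr) extends Expr
final case class MapContainsKey(map: Expr, key: Expr) extends Expr

// Models the rewrite applied before Comet sees the plan (assumed form):
// map_contains_key(m, k) => array_contains(map_keys(m), k)
def rewrite(e: Expr): Expr = e match {
  case MapContainsKey(m, k) => ArrayContains(MapKeys(rewrite(m)), rewrite(k))
  case ArrayContains(a, v)  => ArrayContains(rewrite(a), rewrite(v))
  case MapKeys(m)           => MapKeys(rewrite(m))
  case other                => other
}

// Collects nodes that are marked Incompatible in this toy model (only ArrayContains).
def incompatibleNodes(e: Expr): List[String] = e match {
  case ArrayContains(a, v)  => "ArrayContains" :: incompatibleNodes(a) ++ incompatibleNodes(v)
  case MapKeys(m)           => incompatibleNodes(m)
  case MapContainsKey(m, k) => incompatibleNodes(m) ++ incompatibleNodes(k)
  case _                    => Nil
}

val plan = rewrite(MapContainsKey(Col("m"), Lit(1)))
println(plan)                    // ArrayContains(MapKeys(Col(m)),Lit(1))
println(incompatibleNodes(plan)) // List(ArrayContains)
```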
Contributor: Thanks @andygrove, maybe we can add some …
Author (Member): Personally, I am very much in favor of adding skills to help with code reviews, documentation audits, the release process, etc. I am not sure how others feel about it, though.
Which issue does this PR close?
Related to: #3645, #3644, #3346, #3332, #3330, #3180, #3173, #3016, #2649, #2646, #1897, #1729, #1630
Rationale for this change
Several expressions are currently marked as Compatible (Spark-compatible) but have open correctness issues that can produce incorrect results. These expressions should fall back to Spark by default to prevent silent data corruption; users who understand the trade-offs can explicitly enable them via allowIncompatible=true.
What changes are included in this PR?
Expressions marked as Incompatible (12 expressions in the 9 entries below, across 6 serde files):
- ArrayContains
- GetArrayItem
- ArrayRemove
- Hour, Minute, Second
- TruncTimestamp
- Ceil, Floor
- Tan
- Corr
- StructsToJson

Where possible, the incompatibility is conditional on the specific input type or setting that triggers the bug (e.g., Hour/Minute/Second are only incompatible for TimestampNTZ inputs, Ceil/Floor only for Decimal inputs, and TruncTimestamp only for non-UTC time zones).
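Conditional incompatibility can be sketched as a support check that inspects the input before deciding. This is a hedged illustration: the `DataType` and `SupportLevel` types and the three helper functions are hypothetical simplifications, not Spark's type hierarchy or Comet's `getSupportLevel` signatures, and treating only the literal ID `"UTC"` as UTC is an assumption.

```scala
// Hypothetical, simplified data types (not Spark's DataType hierarchy).
sealed trait DataType
case object TimestampType extends DataType
case object TimestampNTZType extends DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType
case object DoubleType extends DataType

sealed trait SupportLevel
case object Compatible extends SupportLevel
final case class Incompatible(note: String) extends SupportLevel

// Hour/Minute/Second: only TimestampNTZ inputs are affected (#3180).
def hourMinuteSecondSupport(input: DataType): SupportLevel = input match {
  case TimestampNTZType => Incompatible("TimestampNTZ input, see #3180")
  case _                => Compatible
}

// Ceil/Floor: only Decimal inputs are affected (#1729).
def ceilFloorSupport(input: DataType): SupportLevel = input match {
  case _: DecimalType => Incompatible("Decimal input, see #1729")
  case _              => Compatible
}

// TruncTimestamp: only non-UTC time zones are affected (#2649).
// Assumption: only the exact ID "UTC" is treated as UTC here.
def truncTimestampSupport(timeZoneId: String): SupportLevel =
  if (timeZoneId == "UTC") Compatible
  else Incompatible(s"time zone $timeZoneId, see #2649")

println(hourMinuteSecondSupport(TimestampType)) // Compatible
println(ceilFloorSupport(DecimalType(10, 2)))   // Incompatible(Decimal input, see #1729)
```

Narrowing the check this way keeps unaffected inputs (e.g. Ceil on a Double) running natively by default.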
Documentation updates:
- expressions.md: Updated Spark-Compatible status from "Yes" to "No" for all affected expressions, with compatibility notes linking to tracking issues.
- compatibility.md: Added detailed "Incompatible Expressions" subsections organized by category (Array, Date/Time, Math, Aggregate, Struct) with descriptions and issue links.

How are these changes tested?
These changes only add getSupportLevel overrides (which cause expressions to fall back to Spark by default) and update documentation. The existing test suite covers the fallback mechanism. No new behavioral logic is introduced.