[SPARK-56840][SQL] Avoid unresolved NullIf type lookup #55838

Open
sunchao wants to merge 3 commits into apache:master from sunchao:dev/chao/codex/oss-nullif-unresolved

@sunchao sunchao commented May 12, 2026

Why are the changes needed?

NULLIF builds its replacement expression before analysis has resolved all child expressions.
For nested field references, the existing implementation can read the left operand's data type
too early while constructing the null branch, which can fail analysis even though the SQL shape
is valid.
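
The failure mode described above can be sketched with a toy expression model. This is not Spark's Catalyst code: `Expr`, `UnresolvedField`, `NullLiteral`, and `eagerNullBranch` are hypothetical stand-ins, and the sketch only assumes that asking an unresolved expression for its data type throws.

```scala
object EagerNullIfSketch {
  // Hypothetical miniature type system; Spark's real DataType is far richer.
  sealed trait DataType
  case object StringType extends DataType

  trait Expr {
    def dataType: DataType
  }

  // Stand-in for a nested field reference the analyzer has not resolved yet.
  final case class UnresolvedField(name: String) extends Expr {
    def dataType: DataType =
      throw new IllegalStateException(s"dataType requested for unresolved field $name")
  }

  final case class NullLiteral(dataType: DataType) extends Expr

  // Eager construction: the operand's type is read while the null branch is built.
  def eagerNullBranch(left: Expr): Expr = NullLiteral(left.dataType)

  // Building the null branch over an unresolved operand fails immediately,
  // even though the same expression would be valid after resolution.
  val attempt: scala.util.Try[Expr] =
    scala.util.Try(eagerNullBranch(UnresolvedField("c.provider")))
}
```

In this model `attempt` is a `Failure`, which mirrors how a valid SQL shape can be rejected only because the type was requested too early.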

SPARK-56840 tracks this analyzer failure.

What changes were proposed in this PR?

  • Build the NULLIF null branch with a lazy typed-null placeholder so construction does not eagerly
    read the unresolved left operand type, while NullIf.replacement.dataType remains valid once the
    operand type is available.
  • Make that placeholder RuntimeReplaceable, so ReplaceExpressions restores an ordinary typed
    Literal(null, ...) before later optimizer rules run and existing null-literal simplifications
    continue to apply.
  • Add focused regressions for:
    • nested struct-field nullif(c.provider, lower(...)) analysis in both
      ALWAYS_INLINE_COMMON_EXPR modes;
    • NullIf replacement type reporting before type coercion;
    • optimizer replacement back to a normal null literal;
    • explain output avoiding exposure of the internal helper name.
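
The first two bullets can be illustrated with a self-contained toy model. It is a sketch of the idea, not the patch itself: `TypedNullOf`, `resolve`, and `replaceExpressions` are hypothetical names, and the real change works with Catalyst's `RuntimeReplaceable` and `ReplaceExpressions` rather than these stand-ins.

```scala
object LazyNullIfSketch {
  // Hypothetical miniature type system and expression tree.
  sealed trait DataType
  case object StringType extends DataType

  trait Expr {
    def dataType: DataType
  }

  final case class UnresolvedField(name: String) extends Expr {
    def dataType: DataType =
      throw new IllegalStateException(s"dataType requested for unresolved field $name")
  }

  final case class ResolvedField(name: String, dataType: DataType) extends Expr

  final case class NullLiteral(dataType: DataType) extends Expr

  // Placeholder for the null branch: it keeps the operand and looks up the
  // type only when dataType is actually requested.
  final case class TypedNullOf(operand: Expr) extends Expr {
    def dataType: DataType = operand.dataType
  }

  // Stand-in for analysis: every unresolved field becomes a string field.
  def resolve(e: Expr): Expr = e match {
    case UnresolvedField(n) => ResolvedField(n, StringType)
    case TypedNullOf(op)    => TypedNullOf(resolve(op))
    case other              => other
  }

  // Stand-in for the replacement step: swap the placeholder for a plain
  // typed null literal so later rules see the ordinary form.
  def replaceExpressions(e: Expr): Expr = e match {
    case TypedNullOf(op) => NullLiteral(op.dataType)
    case other           => other
  }

  // Construction succeeds because no type is read yet.
  val placeholder: Expr = TypedNullOf(UnresolvedField("c.provider"))
  val analyzed: Expr = resolve(placeholder)
  val optimized: Expr = replaceExpressions(analyzed)
}
```

Here the placeholder can be built over an unresolved operand, reports the operand's type once `resolve` has run, and ends up as `NullLiteral(StringType)` after `replaceExpressions`, so rules written against plain null literals would still match in this model.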

Does this PR introduce any user-facing change?

Yes. Valid NULLIF expressions over nested field references, which could previously fail during
analysis because the references were still unresolved, now resolve and execute successfully.

How was this patch tested?

  • build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.NullExpressionsSuite -- -z "NullIf replacement preserves its data type before type coercion"'
  • build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.OptimizerSuite -- -z "NullIf typed null branch is replaced with a null literal"'
  • build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "nullif function"'
  • build/sbt 'sql/testOnly org.apache.spark.sql.ExplainSuite -- -z "explain for these functions; use range to avoid constant folding"'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5.5)
