[SPARK-56842][SQL] Short-circuit AQE when materialized stages are empty by sunchao · Pull Request #55839 · apache/spark

sunchao · 2026-05-13T00:15:55Z

What changes were proposed in this pull request?

This PR extends AQE's empty-relation propagation so it can keep reasoning about an empty materialized stage even when that stage is wrapped by common physical operators:

AQEShuffleReadExec
SortExec
ProjectExec
ColumnarToRowExec

It also keeps aggregate semantics correct while doing that:

grouped aggregates continue to reuse the child stage row-count estimate
global aggregates over an empty child are treated as producing one row, not zero rows

Finally, when AQE adopts a new physical plan and cancels exchange stages that are no longer referenced, cancellation failures from those obsolete stages are ignored instead of being treated as query failures.

Why are the changes needed?

AQE already knows how to short-circuit a plan when a materialized query stage proves that a branch is empty. The problem is that the row-count signal can get lost once the materialized stage is wrapped by operators that preserve emptiness.

For example, consider:

SELECT *
FROM (
  SELECT id AS k
  FROM range(1)
  WHERE id < 0
) l
JOIN (
  SELECT id AS k
  FROM range(200)
) r
ON l.k = r.k

The left side is empty. AQE can materialize that fact early, but the adaptive plan may see it through a shape like:

SortExec
  AQEShuffleReadExec
    ShuffleQueryStageExec(rowCount = 0)

Before this change, the empty-stage information was not propagated through those wrappers, so AQE could miss the opportunity to collapse the join even though the result is known to be empty.

After this change, AQE continues propagating the rowCount = 0 signal through the wrapper nodes and can replace the remaining join work with an empty result plan.

There is also an important aggregate corner case. A global aggregate over empty input still returns one row:

SELECT count(*)
FROM (
  SELECT *
  FROM range(1)
  WHERE id < 0
) t

That result is 0, represented by one output row. This PR preserves that behavior while still allowing grouped aggregates to propagate child row counts.

Finally, once AQE picks a new plan after discovering emptiness, previously launched exchange stages may become obsolete and get cancelled. Those cancellation callbacks should not fail the query, because the stages no longer belong to the chosen adaptive plan.

Does this PR introduce any user-facing change?

Yes.

Queries that become provably empty during adaptive execution can finish with less unnecessary downstream work, and queries should no longer fail because AQE intentionally cancelled exchange stages that became obsolete after re-planning.

How was this patch tested?

Added targeted AQE unit coverage for:

short-circuiting an empty materialized stage through sort wrappers
preserving correctness for an empty filtered global aggregate stage

Validated locally with:

build/sbt sql/compile
build/sbt 'sql/testOnly org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z "empty materialized stage short-circuits AQE through sort wrappers" -z "empty filtered global aggregate stage is not treated as non-empty"'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex

dongjoon-hyun

Could you fix a compilation error, @sunchao ?

[error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala:83:13: pattern type is incompatible with expected type;
  [error]  found   : org.apache.spark.sql.catalyst.plans.logical.EmptyRelation
  [error]  required: org.apache.spark.sql.execution.SparkPlan
  [error]     case _: EmptyRelation => Some(0)
  [error]             ^
  [error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala:415:9: not found: value removeStageFromCache
  [error]         removeStageFromCache(stage)
  [error]         ^
  [error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala:417:23: not enough arguments for method cancel: (reason: String): Unit.
  [error] Unspecified value parameter reason.
  [error]           stage.cancel()
  [error]                       ^

sunchao · 2026-05-13T01:01:14Z

Thanks @dongjoon-hyun . Updated.

[SPARK-56842][SQL] Short-circuit AQE when materialized stages are empty

9614eb4

dongjoon-hyun reviewed May 13, 2026

View reviewed changes

[SPARK-56842][SQL] Fix OSS carryover for AQE empty-stage handling

13dfe34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-56842][SQL] Short-circuit AQE when materialized stages are empty#55839

[SPARK-56842][SQL] Short-circuit AQE when materialized stages are empty#55839
sunchao wants to merge 2 commits into
apache:masterfrom
sunchao:dev/chao/codex/oss-aqe-empty-stage

sunchao commented May 13, 2026 •

edited

Loading

Uh oh!

dongjoon-hyun left a comment

Uh oh!

sunchao commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sunchao commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

sunchao commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sunchao commented May 13, 2026 •

edited

Loading