Skip to content

[SPARK-54961] [SQL] Introduce GroupingAnalyticsTransformer#53729

Closed
mihailoale-db wants to merge 1 commit intoapache:masterfrom
mihailoale-db:gatransformer
Closed

[SPARK-54961] [SQL] Introduce GroupingAnalyticsTransformer#53729
mihailoale-db wants to merge 1 commit intoapache:masterfrom
mihailoale-db:gatransformer

Conversation

@mihailoale-db
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

In this PR I propose to introduce GroupingAnalyticsTransformer so we could reuse grouping analytics related code in the single-pass implementation.
Besides the refactor we update nullabilities of Expand operator in place instead waiting on UpdateAttributeNullability to trigger.

Why are the changes needed?

To reuse grouping analytics related code in the single-pass implementation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 8, 2026

JIRA Issue Information

=== Improvement SPARK-54961 ===
Summary: Introduce GroupingAnalyticsTransformer
Assignee: None
Status: Open
Affected: ["4.1.0"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions Bot added the SQL label Jan 8, 2026
@mihailoale-db
Copy link
Copy Markdown
Contributor Author

@vladimirg-db @mihailotim-db Please review. Besides what's mentioned in the PR desc, we have few expression tree traversals e.g. in replaceGroupingFunction which are way easier to reuse instead of rewriting them in single-pass. Those traversals are going to be performed only in case there is a grouping analytics (not always) so I say that we are fine here.

@mihailoale-db mihailoale-db force-pushed the gatransformer branch 2 times, most recently from 84cd9b1 to 362d563 Compare January 9, 2026 14:43
}

val groupingSetsAttributes = selectedGroupByExpressions.map { groupingSetExprs =>
groupingSetExprs.map { expression =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation here is very slow. We iterate over groupingSetExprs, inside we iterate over groupByAliases, and every iteration does a semanticEquals tree traversal with a heavy comparison operator. Let's merge this as-is for the purposes of the refactor, but we should followup with a transition to SemanticComparator maybe? wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll followup.

Copy link
Copy Markdown
Contributor

@vladimirg-db vladimirg-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the refactor, but let's consider optimization followups

@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 81dea41 Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants