[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException by GoncaloCoutoDosSantos · Pull Request #4982 · apache/calcite

GoncaloCoutoDosSantos · 2026-06-01T16:11:55Z

[CALCITE-5101] LISTAGG(DISTINCT x) WITHIN GROUP (ORDER BY y) throws ArrayIndexOutOfBoundsException when the ORDER BY column differs from the GROUP BY and DISTINCT columns

Problem

A query such as

SELECT deptno, LISTAGG(DISTINCT ename) WITHIN GROUP (ORDER BY sal)
FROM emp
GROUP BY deptno

fails with an ArrayIndexOutOfBoundsException (surfaced as IllegalStateException: Unable to implement EnumerableAggregate(...), thrown from PhysTypeImpl.generateCollationKey). The failure occurs whenever the WITHIN GROUP (ORDER BY ...) column is neither the GROUP BY column nor the DISTINCT argument.

These variants already worked:

LISTAGG(DISTINCT ename) — no ORDER BY
LISTAGG(ename) WITHIN GROUP (ORDER BY sal) — no DISTINCT
LISTAGG(DISTINCT ename) WITHIN GROUP (ORDER BY ename) — ordering matches the aggregated column

Root cause

AggregateExpandDistinctAggregatesRule rewrites a distinct aggregate into an inner "distinct" aggregate feeding an outer aggregate. When building the outer AggregateCall it correctly remapped the argument indices to the post-expansion input shape, but left the WITHIN GROUP collation indices untouched. The collation then referenced a pre-expansion column that the inner aggregate had dropped, yielding an out-of-bounds (or simply wrong) field index.

Fix

The rule now carries every ORDER BY column through the inner aggregate and remaps the outer collation accordingly:

Added a remapCollation helper that rewrites a RelCollation's field indices through an index map, preserving direction and null-direction.
Grouping-sets path (rewriteUsingGroupingSets): for each distinct aggregate's collation column, if it is already a group key it is mapped to its position in the bottom group set; otherwise it is carried through via a MIN(column) marker ($mc_*) added to the bottom aggregate. The upper aggregate's collation is remapped through this map.
Monopole / join path (createSelectDistinct, doRewrite, rewriteAggCalls): collation columns that are not already projected are projected, genuine ORDER BY columns (neither a group key nor a distinct argument) are carried through via MIN(column) markers, and the outer collation is remapped via the sourceOf map (now recording each marker's output ordinal).

The result is that the ORDER BY value survives the distinct aggregate and the outer LISTAGG orders by the correct column.

Tests

RelOptRulesTest plan tests covering both expansion paths:
- testDistinctListAggWithinGroupOrderByOtherColumn / ...OrderByOtherColumn1 — ORDER BY a non-group, non-distinct column (single and multiple distinct aggregates).
- testDistinctListAggWithinGroupOrderByDistinctColumn — ORDER BY the distinct argument (no marker needed).
- testDistinctListAggWithinGroupMultipleOrderBy — multiple ORDER BY columns over the same argument (one marker each).
agg.iq execution tests that assert the actual LISTAGG output ordering (not just that the query completes), covering both the monopole and grouping-sets paths.

…ITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException

mihaibudiu · 2026-06-01T16:27:08Z

You have to use the same commit message you used in the JIRA issue, to help us match the two.
You can edit either one.
PR title should ideally be the same as well.

xuzifu666 · 2026-06-01T16:33:15Z

I actually take this task in Jira a few days ago and implemented the fixes; I’ve also already submitted the corresponding PR(#4983) . Your current approach to the fix is incorrect; the correction should not be implemented within the AggregateExpandDistinctAggregatesRule itself. Additionally, your commit message does not align with the details in Jira.

GoncaloCoutoDosSantos · 2026-06-01T16:35:53Z

I actually take this task in Jira a few days ago and implemented the fixes; I’ve also already submitted the corresponding PR(#4983) . Your current approach to the fix is incorrect; the correction should not be implemented within the AggregateExpandDistinctAggregatesRule itself. Additionally, your commit message does not align with the details in Jira.

OK, sorry, i did not seen it, i will close the PR

sonarqubecloud · 2026-06-01T16:49:34Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
96.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap W…

597fbd5

…ITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException

GoncaloCoutoDosSantos closed this Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException#4982

[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException#4982
GoncaloCoutoDosSantos wants to merge 1 commit into
apache:mainfrom
GoncaloCoutoDosSantos:CAL-5101

GoncaloCoutoDosSantos commented Jun 1, 2026 •

edited

Loading

Uh oh!

mihaibudiu commented Jun 1, 2026

Uh oh!

xuzifu666 commented Jun 1, 2026

Uh oh!

GoncaloCoutoDosSantos commented Jun 1, 2026

Uh oh!

sonarqubecloud Bot commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

GoncaloCoutoDosSantos commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

[CALCITE-5101] LISTAGG(DISTINCT x) WITHIN GROUP (ORDER BY y) throws ArrayIndexOutOfBoundsException when the ORDER BY column differs from the GROUP BY and DISTINCT columns

Problem

Root cause

Fix

Tests

Uh oh!

mihaibudiu commented Jun 1, 2026

Uh oh!

xuzifu666 commented Jun 1, 2026

Uh oh!

GoncaloCoutoDosSantos commented Jun 1, 2026

Uh oh!

sonarqubecloud Bot commented Jun 1, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

GoncaloCoutoDosSantos commented Jun 1, 2026 •

edited

Loading