Skip to content

[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException#4982

Closed
GoncaloCoutoDosSantos wants to merge 1 commit into
apache:mainfrom
GoncaloCoutoDosSantos:CAL-5101
Closed

[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException#4982
GoncaloCoutoDosSantos wants to merge 1 commit into
apache:mainfrom
GoncaloCoutoDosSantos:CAL-5101

Conversation

@GoncaloCoutoDosSantos
Copy link
Copy Markdown

@GoncaloCoutoDosSantos GoncaloCoutoDosSantos commented Jun 1, 2026

[CALCITE-5101] LISTAGG(DISTINCT x) WITHIN GROUP (ORDER BY y) throws ArrayIndexOutOfBoundsException when the ORDER BY column differs from the GROUP BY and DISTINCT columns

Problem

A query such as

SELECT deptno, LISTAGG(DISTINCT ename) WITHIN GROUP (ORDER BY sal)
FROM emp
GROUP BY deptno

fails with an ArrayIndexOutOfBoundsException (surfaced as IllegalStateException: Unable to implement EnumerableAggregate(...), thrown from PhysTypeImpl.generateCollationKey). The failure occurs whenever the WITHIN GROUP (ORDER BY ...) column is neither the GROUP BY column nor the DISTINCT argument.

These variants already worked:

  • LISTAGG(DISTINCT ename) — no ORDER BY
  • LISTAGG(ename) WITHIN GROUP (ORDER BY sal) — no DISTINCT
  • LISTAGG(DISTINCT ename) WITHIN GROUP (ORDER BY ename) — ordering matches the aggregated column

Root cause

AggregateExpandDistinctAggregatesRule rewrites a distinct aggregate into an inner "distinct" aggregate feeding an outer aggregate. When building the outer AggregateCall it correctly remapped the argument indices to the post-expansion input shape, but left the WITHIN GROUP collation indices untouched. The collation then referenced a pre-expansion column that the inner aggregate had dropped, yielding an out-of-bounds (or simply wrong) field index.

Fix

The rule now carries every ORDER BY column through the inner aggregate and remaps the outer collation accordingly:

  • Added a remapCollation helper that rewrites a RelCollation's field indices through an index map, preserving direction and null-direction.
  • Grouping-sets path (rewriteUsingGroupingSets): for each distinct aggregate's collation column, if it is already a group key it is mapped to its position in the bottom group set; otherwise it is carried through via a MIN(column) marker ($mc_*) added to the bottom aggregate. The upper aggregate's collation is remapped through this map.
  • Monopole / join path (createSelectDistinct, doRewrite, rewriteAggCalls): collation columns that are not already projected are projected, genuine ORDER BY columns (neither a group key nor a distinct argument) are carried through via MIN(column) markers, and the outer collation is remapped via the sourceOf map (now recording each marker's output ordinal).

The result is that the ORDER BY value survives the distinct aggregate and the outer LISTAGG orders by the correct column.

Tests

  • RelOptRulesTest plan tests covering both expansion paths:
    • testDistinctListAggWithinGroupOrderByOtherColumn / ...OrderByOtherColumn1 — ORDER BY a non-group, non-distinct column (single and multiple distinct aggregates).
    • testDistinctListAggWithinGroupOrderByDistinctColumn — ORDER BY the distinct argument (no marker needed).
    • testDistinctListAggWithinGroupMultipleOrderBy — multiple ORDER BY columns over the same argument (one marker each).
  • agg.iq execution tests that assert the actual LISTAGG output ordering (not just that the query completes), covering both the monopole and grouping-sets paths.

…ITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException
@mihaibudiu
Copy link
Copy Markdown
Contributor

You have to use the same commit message you used in the JIRA issue, to help us match the two.
You can edit either one.
PR title should ideally be the same as well.

@xuzifu666
Copy link
Copy Markdown
Member

I actually take this task in Jira a few days ago and implemented the fixes; I’ve also already submitted the corresponding PR(#4983) . Your current approach to the fix is ​​incorrect; the correction should not be implemented within the AggregateExpandDistinctAggregatesRule itself. Additionally, your commit message does not align with the details in Jira.

@GoncaloCoutoDosSantos
Copy link
Copy Markdown
Author

I actually take this task in Jira a few days ago and implemented the fixes; I’ve also already submitted the corresponding PR(#4983) . Your current approach to the fix is ​​incorrect; the correction should not be implemented within the AggregateExpandDistinctAggregatesRule itself. Additionally, your commit message does not align with the details in Jira.

OK, sorry, i did not seen it, i will close the PR

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 1, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants