[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException #4982
You have to use the same commit message you used in the JIRA issue, to help us match the two.
I took this task in Jira a few days ago and implemented the fix; I have also already submitted the corresponding PR (#4983). Your current approach to the fix is incorrect; the correction should not be implemented within the …
OK, sorry, I did not see it. I will close the PR.



[CALCITE-5101] LISTAGG(DISTINCT x) WITHIN GROUP (ORDER BY y) throws ArrayIndexOutOfBoundsException when the ORDER BY column differs from the GROUP BY and DISTINCT columns
Problem
A query such as
fails with an `ArrayIndexOutOfBoundsException` (surfaced as `IllegalStateException: Unable to implement EnumerableAggregate(...)`, thrown from `PhysTypeImpl.generateCollationKey`). The failure occurs whenever the `WITHIN GROUP (ORDER BY ...)` column is neither the `GROUP BY` column nor the `DISTINCT` argument.

These variants already worked:

- `LISTAGG(DISTINCT ename)`: no `ORDER BY`
- `LISTAGG(ename) WITHIN GROUP (ORDER BY sal)`: no `DISTINCT`
- `LISTAGG(DISTINCT ename) WITHIN GROUP (ORDER BY ename)`: ordering matches the aggregated column

Root cause
`AggregateExpandDistinctAggregatesRule` rewrites a distinct aggregate into an inner "distinct" aggregate feeding an outer aggregate. When building the outer `AggregateCall` it correctly remapped the argument indices to the post-expansion input shape, but left the `WITHIN GROUP` collation indices untouched. The collation then referenced a pre-expansion column that the inner aggregate had dropped, yielding an out-of-bounds (or simply wrong) field index.

Fix
The rule now carries every ORDER BY column through the inner aggregate and remaps the outer collation accordingly:
- A `remapCollation` helper that rewrites a `RelCollation`'s field indices through an index map, preserving direction and null-direction.
- In `rewriteUsingGroupingSets`: for each distinct aggregate's collation column, if it is already a group key it is mapped to its position in the bottom group set; otherwise it is carried through via a `MIN(column)` marker (`$mc_*`) added to the bottom aggregate. The upper aggregate's collation is remapped through this map.
- In `createSelectDistinct`, `doRewrite`, and `rewriteAggCalls`: collation columns that are not already projected are projected, genuine ORDER BY columns (neither a group key nor a distinct argument) are carried through via `MIN(column)` markers, and the outer collation is remapped via the `sourceOf` map (now recording each marker's output ordinal).

The result is that the ORDER BY value survives the distinct aggregate and the outer `LISTAGG` orders by the correct column.

Tests
- `RelOptRulesTest` plan tests covering both expansion paths:
  - `testDistinctListAggWithinGroupOrderByOtherColumn` / `...OrderByOtherColumn1`: ORDER BY a non-group, non-distinct column (single and multiple distinct aggregates).
  - `testDistinctListAggWithinGroupOrderByDistinctColumn`: ORDER BY the distinct argument (no marker needed).
  - `testDistinctListAggWithinGroupMultipleOrderBy`: multiple ORDER BY columns over the same argument (one marker each).
- `agg.iq` execution tests that assert the actual `LISTAGG` output ordering (not just that the query completes), covering both the monopole and grouping-sets paths.
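
For reviewers who want a concrete picture of the collation remapping described under Fix, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and does not use Calcite's classes: `SortKey`, `Direction`, `NullDirection`, and `remap` are hypothetical stand-ins, and the field ordinals are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollationRemapSketch {
  enum Direction { ASCENDING, DESCENDING }
  enum NullDirection { FIRST, LAST, UNSPECIFIED }

  /** Hypothetical stand-in for one ORDER BY key; not Calcite's own type. */
  static final class SortKey {
    final int fieldIndex;
    final Direction direction;
    final NullDirection nullDirection;

    SortKey(int fieldIndex, Direction direction, NullDirection nullDirection) {
      this.fieldIndex = fieldIndex;
      this.direction = direction;
      this.nullDirection = nullDirection;
    }
  }

  /** Rewrites each key's field index through indexMap, keeping direction and null direction. */
  static List<SortKey> remap(List<SortKey> collation, Map<Integer, Integer> indexMap) {
    List<SortKey> remapped = new ArrayList<>();
    for (SortKey key : collation) {
      Integer target = indexMap.get(key.fieldIndex);
      if (target == null) {
        // Fail loudly rather than keep a stale pre-expansion index.
        throw new IllegalArgumentException(
            "ORDER BY field " + key.fieldIndex + " is not carried through");
      }
      remapped.add(new SortKey(target, key.direction, key.nullDirection));
    }
    return remapped;
  }

  public static void main(String[] args) {
    // Assumed ordinals: the pre-expansion input has 8 fields and the ORDER BY
    // column is field 5. The post-expansion input has only 3 fields, so a
    // stale index of 5 would be out of bounds.
    Map<Integer, Integer> indexMap = new HashMap<>();
    indexMap.put(7, 0); // group key
    indexMap.put(1, 1); // distinct argument
    indexMap.put(5, 2); // ORDER BY column carried through, e.g. by a marker

    List<SortKey> before = new ArrayList<>();
    before.add(new SortKey(5, Direction.DESCENDING, NullDirection.LAST));

    SortKey after = remap(before, indexMap).get(0);
    System.out.println(after.fieldIndex + " " + after.direction + " " + after.nullDirection);
    // prints "2 DESCENDING LAST"
  }
}
```

In this sketch a missing mapping throws instead of silently keeping the old index, so a column that was not carried through fails at rewrite time rather than at code generation. Whether the actual helper behaves this way is not stated in the description above.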