[CALCITE-5101] LISTAGG function with DISTINCT and ORDER BY fails by xuzifu666 · Pull Request #4983 · apache/calcite

xuzifu666 · 2026-06-01T16:27:33Z

jira: https://issues.apache.org/jira/browse/CALCITE-5101

mihaibudiu · 2026-06-01T17:39:18Z

    if (collations.size() == 1) {
      RelFieldCollation collation = collations.get(0);
+      final int index = collation.getFieldIndex();
+      if (index < 0 || index >= rowType.getFieldList().size()) {


I guess these should never be reachable if the code generator is correct

Yes, this is just a defensive check, but it seems unnecessary now, so I deleted it.

mihaibudiu · 2026-06-01T17:41:33Z

+          }
+        }
+
+        if (!validCollation) {


I am a bit confused: there is a sort order specified, yet it is ignored? That cannot be right.

There are two options:

Complete fix: add the ORDER BY columns to the re-grouped input. This requires architectural changes to aggregate planning, changes query semantics, and may affect other parts of the planner. It is a large-scale refactoring.

Graceful degradation (the current approach): returns correct results without crashing, but loses the sorting information. This is a rare scenario: it only occurs with LISTAGG(DISTINCT ...) WITHIN GROUP (ORDER BY non-group-column).

Given these conditions, I chose the second option. Do you think my current plan is reasonable?

Why is the result correct if you ignore the order?

This is a compromise: it avoids crashes by downgrading the handling of such special statements (removing the ORDER BY clause). The main issue is that a thorough fix touches many areas, so its ROI looked low. If it becomes clear that a completely accurate ORDER BY result is required, I will attempt a fix at the lowest possible cost in the future.

I don't understand the difference between "less accurate" and "wrong".
Either the result is correct, or it's not. I don't think we should take a fix which avoids crashes and produces wrong results.

Okay, I will try to fix this issue completely later.

@mihaibudiu I have fixed the issue, PTAL.

FYI, if you're going to change AggregateExpandDistinctAggregatesRule to fix this bug, you should also update rewriteUsingGroupingSets, since the same issue can be triggered by a query like:

SELECT deptno, LISTAGG(DISTINCT ename) WITHIN GROUP (ORDER BY sal), LISTAGG(DISTINCT deptno) WITHIN GROUP (ORDER BY ename) FROM emp GROUP BY deptno;

At the very least, a test checking this should be added

sonarqubecloud · 2026-06-02T02:44:11Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mihaibudiu

Please reply to the other comment you have received as well.

mihaibudiu · 2026-06-03T17:48:59Z

      if (aggCall.isDistinct()) {
        bottomGroups.addAll(aggCall.getArgList());
+        // Also include ORDER BY columns from WITHIN GROUP
+        for (RelFieldCollation fc : aggCall.collation.getFieldCollations()) {


I think this is a better approach.
Can this add the same column twice? Is that a problem?
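On the duplicate-column question, a minimal sketch may help. The diff does not show how `bottomGroups` is declared, so the following assumes it is a sorted set of field indices; the class name, method name, and the field indices are hypothetical and chosen only for illustration. Under that assumption, an ORDER BY field that is also a DISTINCT argument is simply absorbed:

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for the rule's bookkeeping; not Calcite code.
class BottomGroupsDemo {
  /** Collects the field indices that the inner (bottom) aggregate groups by. */
  static NavigableSet<Integer> bottomGroups(
      List<Integer> groupKeys, List<Integer> distinctArgs, List<Integer> orderByFields) {
    // Assumption: a set-like collection, so repeated indices collapse into one.
    NavigableSet<Integer> result = new TreeSet<>(groupKeys);
    result.addAll(distinctArgs);
    result.addAll(orderByFields);
    return result;
  }

  public static void main(String[] args) {
    // Field 1 is both the DISTINCT argument and an ORDER BY field.
    NavigableSet<Integer> groups =
        bottomGroups(List.of(7), List.of(1), List.of(1, 5));
    System.out.println(groups); // prints [1, 5, 7]
  }
}
```

If `bottomGroups` were a plain list instead, the same index would appear twice, so the answer depends on the collection type actually used in the rule.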

Wouldn't this cause DISTINCT to be ignored in certain scenarios?

Consider the following example:

SELECT deptno, SUM(DISTINCT sal) WITHIN GROUP (ORDER BY bonus) FROM EMP GROUP BY deptno

With this modification, the rule would rewrite the query as:

SELECT deptno, SUM(sal) WITHIN GROUP (ORDER BY bonus) FROM ( SELECT deptno, sal, bonus FROM EMP GROUP BY deptno, sal, bonus ) GROUP BY deptno

This does not correctly enforce DISTINCT on sal: if two rows share the same sal value but have different bonus values, both survive the inner GROUP BY and sal ends up counted twice in the outer SUM, which violates the DISTINCT semantics.

PS: Please feel free to correct me or let me know if I am intervening inappropriately.
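The double counting described in this comment can be reproduced without Calcite. The sketch below is only a simulation in plain Java: the three sample rows, the class name, and the method names are made up for illustration, and each row is modeled as a list of (deptno, sal, bonus). It compares SUM(DISTINCT sal) with the result of grouping by deptno, sal, bonus first and then summing sal.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the two query shapes; not Calcite code.
class DistinctSumDemo {
  /** Sample EMP rows as (deptno, sal, bonus); two rows share sal = 1000. */
  static List<List<Integer>> sampleRows() {
    return List.of(
        List.of(10, 1000, 50),
        List.of(10, 1000, 80),
        List.of(10, 2000, 50));
  }

  /** Reference semantics: SUM(DISTINCT sal) for one department. */
  static int sumDistinctSal(List<List<Integer>> rows, int deptno) {
    Set<Integer> distinctSal = new LinkedHashSet<>();
    for (List<Integer> row : rows) {
      if (row.get(0) == deptno) {
        distinctSal.add(row.get(1));
      }
    }
    int sum = 0;
    for (int sal : distinctSal) {
      sum += sal;
    }
    return sum;
  }

  /** Rewritten shape: inner GROUP BY deptno, sal, bonus, then outer SUM(sal). */
  static int sumAfterRewrite(List<List<Integer>> rows, int deptno) {
    // A set of whole rows models the inner GROUP BY over all three columns.
    Set<List<Integer>> innerGroups = new LinkedHashSet<>(rows);
    int sum = 0;
    for (List<Integer> group : innerGroups) {
      if (group.get(0) == deptno) {
        sum += group.get(1);
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumDistinctSal(sampleRows(), 10));  // prints 3000
    System.out.println(sumAfterRewrite(sampleRows(), 10)); // prints 4000
  }
}
```

Because the two rows with sal = 1000 differ in bonus, both survive the inner grouping and the outer sum counts 1000 twice.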

[CALCITE-5101] LISTAGG function with DISTINCT and ORDER BY fails

2bd42fd

xuzifu666 mentioned this pull request Jun 1, 2026

[CALCITE-5101] AggregateExpandDistinctAggregatesRule does not remap WITHIN GROUP collation indices, causing ArrayIndexOutOfBoundsException #4982

Closed

mihaibudiu reviewed Jun 1, 2026


Addressed

32de196

xuzifu666 requested a review from mihaibudiu June 2, 2026 15:27

xuzifu666 added 2 commits June 3, 2026 12:03

Addressed

2886c8e

Addressed

c94e4c3

mihaibudiu reviewed Jun 3, 2026


GoncaloCoutoDosSantos commented Jun 3, 2026 (edited)


3 participants
