[CALCITE-6451] Improve Nullability Derivation for Intersect and Minus by xiedeyantu · Pull Request #4897 · apache/calcite

xiedeyantu · 2026-04-20T14:35:30Z

Jira Link

Changes Proposed

SetOp overrides deriveRowType() and computes the output row type as the least restrictive type across all inputs.

So for example given

Input 1: (I64, I64, I64?, I64?)
Input 2: (I64, I64?, I64, I64?)
where ? denotes nullable, the least restrictive derivation produces:

Output: (I64, I64?, I64?, I64?)
For UNION operations, these nullabilities are accurate.

However for MINUS and INTERSECT there is room for improvement.

MINUS only returns rows from the first input, so its output nullability should always match that of its first input:

Output: (I64, I64, I64?, I64?)
INTERSECT only returns rows that appear in all inputs. If a column is not nullable in at least one of the inputs, it is not nullable in the output, because no row in which that column is null can be emitted:

Output: (I64, I64, I64, I64?)
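A minimal sketch of the three derivation rules above, assuming each input row type is modelled as a boolean[] of per-column nullability flags (true = nullable). The class SetOpNullability and its Kind enum are hypothetical illustrations, not Calcite's SetOp.deriveRowType() code; Calcite itself works on RelDataType objects through the type factory.

```java
import java.util.List;

/**
 * Hypothetical model of SetOp output nullability (not Calcite code).
 * Each input is a boolean[] of per-column flags: true = nullable.
 */
final class SetOpNullability {
  enum Kind { UNION, INTERSECT, MINUS }

  private SetOpNullability() {}

  static boolean[] derive(Kind kind, List<boolean[]> inputs) {
    int width = inputs.get(0).length;
    for (boolean[] in : inputs) {
      if (in.length != width) {
        throw new IllegalArgumentException("inputs must have the same column count");
      }
    }
    boolean[] out = new boolean[width];
    for (int c = 0; c < width; c++) {
      switch (kind) {
        case UNION:
          // Least restrictive: nullable if ANY input is nullable.
          out[c] = false;
          for (boolean[] in : inputs) {
            out[c] |= in[c];
          }
          break;
        case INTERSECT:
          // A row must appear in EVERY input, so a single NOT NULL input
          // is enough to make the output column NOT NULL.
          out[c] = true;
          for (boolean[] in : inputs) {
            out[c] &= in[c];
          }
          break;
        case MINUS:
          // Rows are only ever emitted from the first input.
          out[c] = inputs.get(0)[c];
          break;
        default:
          throw new AssertionError(kind);
      }
    }
    return out;
  }
}
```

With the two example inputs above, this yields (I64, I64?, I64?, I64?) for UNION, (I64, I64, I64, I64?) for INTERSECT and (I64, I64, I64?, I64?) for MINUS.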

Co-authored-by: Victor Barua <victor.barua@datadoghq.com>

xiedeyantu · 2026-04-20T14:36:39Z

Related PR #3845.

mihaibudiu · 2026-04-20T20:19:03Z

you have some checker failures

mihaibudiu · 2026-04-20T20:19:16Z

Does this work around the problems in the other PR?

xiedeyantu · 2026-04-20T23:06:43Z

> Does this work around the problems in the other PR?

Are you referring to #3845? I noticed that you had approved this PR before, but there were some conflicts. Since it's been a long time, the CI status is no longer visible, and it's unclear if there were other issues back then. I think it's a good PR, so I’m trying to finish it up.

mihaibudiu · 2026-04-20T23:07:50Z

Yes, the discussion in JIRA was about causing problems with other rules.

xiedeyantu · 2026-04-21T00:06:19Z

> Yes, the discussion in JIRA was about causing problems with other rules.

I didn’t see any discussion in the Jira. Are you referring to the discussion in the original PR? I have resolved the rule conflicts.

mihaibudiu · 2026-04-21T00:08:05Z

yes, the original PR

xiedeyantu · 2026-04-21T01:05:53Z

According to this discussion: #3845 (comment).
I think we don't need to dwell on this issue too much. If a rule transforms INTERSECT into a UNION, the result can simply use UNION's type inference. It is similar to LEFT JOIN (though this analogy may not be entirely appropriate), which can also change the nullability of the right table's columns. I'm not sure if my understanding is correct.
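A minimal in-memory sketch of such a union-based rewrite (UNION ALL, then GROUP BY the whole row, then keep groups seen in every input). The class IntersectViaUnion and its row representation (List<Integer>, with null as SQL NULL) are hypothetical, not Calcite code. The union-based plan derives the wider UNION row type, which is always safe; the sketch also suggests why the narrower INTERSECT type still holds for the rows it produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of INTERSECT (distinct) evaluated as
 * UNION ALL + GROUP BY + filter. Rows are List<Integer>; null = SQL NULL.
 */
final class IntersectViaUnion {
  private IntersectViaUnion() {}

  static List<List<Integer>> intersect(List<List<List<Integer>>> inputs) {
    int n = inputs.size();
    // UNION ALL + GROUP BY: for each distinct row, remember which inputs it came from.
    // List.equals/hashCode treat two nulls as equal, matching GROUP BY semantics.
    Map<List<Integer>, boolean[]> sources = new LinkedHashMap<>();
    for (int i = 0; i < n; i++) {
      for (List<Integer> row : inputs.get(i)) {
        sources.computeIfAbsent(row, r -> new boolean[n])[i] = true;
      }
    }
    // Filter: keep only groups that were seen in every input.
    List<List<Integer>> result = new ArrayList<>();
    for (Map.Entry<List<Integer>, boolean[]> e : sources.entrySet()) {
      boolean inAll = true;
      for (boolean seen : e.getValue()) {
        inAll &= seen;
      }
      if (inAll) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

For example, intersecting {(1, NULL), (2, 20), (3, 30)} with {(1, 10), (2, 20), (NULL, 30)} keeps only (2, 20): a row with NULL in a column cannot match any row of an input in which that column is NOT NULL.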

Co-authored-by: Victor Barua <victor.barua@datadoghq.com>

xiedeyantu · 2026-04-22T00:47:49Z

@mihaibudiu I'm not sure if you agree with the current simplified processing logic. If you have time, please review this PR to see if there are any other concerns.

mihaibudiu · 2026-04-22T00:53:15Z

@vbarua what do you think of this approach?

silundong · 2026-04-22T03:47:13Z

 !ok

-EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2])
+EnumerableAggregate(group=[{0}])


Do you know why this plan changed?

I haven't been able to pinpoint how the old plan ended up with the aggregate inside the join, but the new plan seems fine. I may need to investigate this difference in more detail later, when I have more time.

silundong · 2026-04-22T05:48:57Z

+      }
+
+      if (!nullFilters.isEmpty()) {
+        relBuilder.filter(nullFilters);


There is a relBuilder.convert(intersect.getRowType(), false) call at the end; is this logic necessary?
Calcite cannot refine type inference from predicates, so although this rewrite is semantically equivalent, it is unrelated to CALCITE-6451.

silundong · 2026-04-22T05:49:58Z

    relBuilder.project(Util.skipLast(relBuilder.fields()));

+    // ensure the nullabilities of columns in the new relation match those of the input relation
+    relBuilder.convert(intersect.getRowType(), false);


I'm not sure whether onMatchAggregateOnUnion needs this.

I agree, it would be safer that way.

sonarqubecloud · 2026-05-01T05:21:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
93.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mihaibudiu · 2026-06-01T18:25:59Z

What is the status of this PR?

xiedeyantu · 2026-06-02T00:59:26Z

> What is the status of this PR?

There may be only one problem left: I haven't yet found the root cause of the plan change that silun mentioned above.

xiedeyantu · 2026-06-03T15:13:42Z

How the plan is produced with the current PR, rule by rule:

MinusMergeRule
Merges the nested binary Minus operators into a single multi-input LogicalMinus:

LogicalMinus(all=[false])
  t1
  t2
  t3
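A sketch of the flattening step, using hypothetical Node/Scan/Minus records (Java 16+) rather than Calcite's RelNode classes. Only a Minus in the first input position is absorbed, because (t1 - t2) - t3 equals t1 - t2 - t3 while t1 - (t2 - t3) does not; requiring matching `all` flags is a conservative assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plan nodes (not Calcite's RelNode hierarchy). */
interface Node {}

record Scan(String table) implements Node {}

record Minus(boolean all, List<Node> inputs) implements Node {}

final class MinusMerge {
  private MinusMerge() {}

  /** Bottom-up: flattens Minus(Minus(t1, t2), t3) into Minus(t1, t2, t3). */
  static Node merge(Node node) {
    if (!(node instanceof Minus top)) {
      return node; // leaves such as Scan are unchanged
    }
    List<Node> inputs = new ArrayList<>();
    for (Node child : top.inputs()) {
      inputs.add(merge(child));
    }
    // Only the FIRST input may be absorbed: (t1 - t2) - t3 == t1 - t2 - t3,
    // but t1 - (t2 - t3) is a different query.
    if (inputs.get(0) instanceof Minus bottom && bottom.all() == top.all()) {
      List<Node> merged = new ArrayList<>(bottom.inputs());
      merged.addAll(inputs.subList(1, inputs.size()));
      return new Minus(top.all(), merged);
    }
    return new Minus(top.all(), inputs);
  }
}
```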

MinusToAntiJoinRule
Rewrites the multi-input Minus into a chain of anti joins with a top-level distinct aggregate:

LogicalAggregate(group=[{0}])
  LogicalJoin(joinType=[anti])
    LogicalJoin(joinType=[anti])
      t1
      t2
    t3
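A minimal in-memory model of what the rewritten plan computes, assuming rows are List<Integer> values and that NULLs compare as equal (SQL set operations treat NULLs as not distinct). MinusAsAntiJoins is a hypothetical name; this is not the code of Calcite's MinusToAntiJoinRule.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical in-memory model of MINUS rewritten as chained anti joins + DISTINCT. */
final class MinusAsAntiJoins {
  private MinusAsAntiJoins() {}

  static List<List<Integer>> minus(List<List<Integer>> first, List<List<List<Integer>>> others) {
    List<List<Integer>> current = first;
    // One anti join per remaining input: keep rows of `current` with no match on the right.
    for (List<List<Integer>> right : others) {
      List<List<Integer>> kept = new ArrayList<>();
      for (List<Integer> row : current) {
        // List.contains uses List.equals, which compares elements null-safely,
        // i.e. like IS NOT DISTINCT FROM rather than plain "=".
        if (!right.contains(row)) {
          kept.add(row);
        }
      }
      current = kept;
    }
    // Top-level distinct Aggregate: MINUS (all=false) removes duplicates.
    return new ArrayList<>(new LinkedHashSet<>(current));
  }
}
```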

JoinPushExpressionsRule
Extracts the CAST expressions from the join condition into projection columns (A0), allowing the join to use direct column references:

LogicalProject(A=[$0])
  LogicalJoin(condition=[=($1, $3)], joinType=[anti])
    LogicalProject(A=[$0], A0=[CAST($0)])
    LogicalProject(A=[$0], A0=[CAST($0)])

EnumerableJoinRule
Converts the logical anti joins into physical join implementations. The inner anti join becomes a nested-loop join, while the outer anti join is implemented as a hash join:

EnumerableHashJoin(joinType=[anti])
  EnumerableProject(A=[$0], A0=[CAST($0)])
    EnumerableNestedLoopJoin(joinType=[anti])
      ...
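The two physical operators compute the same anti join and differ only in how they find matches. The sketch below uses hypothetical names (AntiJoins, nestedLoop, hash) and a key function standing in for the projected A0 = CAST(A) column; NULL keys are treated as equal in both variants, as the set-operation semantics require. It is an illustration, not Calcite's EnumerableHashJoin or EnumerableNestedLoopJoin.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical in-memory anti joins on a precomputed join key (not Calcite code). */
final class AntiJoins {
  private AntiJoins() {}

  /** Nested loop: compares each left key with every right key, O(|left| * |right|). */
  static <T, K> List<T> nestedLoop(List<T> left, List<T> right, Function<T, K> key) {
    List<K> rightKeys = new ArrayList<>();
    for (T r : right) {
      rightKeys.add(key.apply(r)); // computed once per row, like the projected A0 column
    }
    List<T> out = new ArrayList<>();
    for (T l : left) {
      K k = key.apply(l);
      boolean matched = false;
      for (K rk : rightKeys) {
        if (Objects.equals(k, rk)) { // null-safe comparison
          matched = true;
          break;
        }
      }
      if (!matched) {
        out.add(l);
      }
    }
    return out;
  }

  /** Hash: builds a set of right keys once, then probes it, O(|left| + |right|) expected. */
  static <T, K> List<T> hash(List<T> left, List<T> right, Function<T, K> key) {
    Set<K> build = new HashSet<>(); // HashSet accepts a null key, so NULL matches NULL
    for (T r : right) {
      build.add(key.apply(r));
    }
    List<T> out = new ArrayList<>();
    for (T l : left) {
      if (!build.contains(key.apply(l))) {
        out.add(l);
      }
    }
    return out;
  }
}
```

Computing the key once per row is roughly what the pushed-down projection buys: the join condition becomes a plain comparison of column references, which an equi-join implementation such as a hash join can use directly.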

EnumerableAggregateRule
Converts the top-level logical aggregate into a physical enumerable aggregate:

EnumerableAggregate(group=[{0}])
  EnumerableHashJoin(joinType=[anti])
    ...

The final selected plan is:

EnumerableAggregate
  EnumerableHashJoin (anti)
    EnumerableProject(A0=CAST(...))
      EnumerableNestedLoopJoin (anti)
        ...

The original plan took this specific form because, within Volcano, every rule generates a corresponding candidate RelNode. A particular scenario arises with nested Minus operations: for each level of nesting, an equivalent Aggregation-plus-Join structure is generated. Subsequently, during the final cost-based selection phase, this specific structure was deemed to have the lowest cost and was therefore selected. Please refer to the plan diagram below.
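To make the cost-based selection step more concrete, here is a heavily simplified sketch of choosing the cheapest alternative per equivalence group. The Alt and Memo types are hypothetical, and any cost numbers fed to them are made up; Calcite's VolcanoPlanner works with RelSet/RelSubset, physical traits and its own cost model, and its memo can contain cycles, all of which this sketch ignores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical memo entry: an operator with its own cost and child equivalence groups. */
record Alt(String name, double selfCost, List<String> childGroups) {}

/** Simplified cost-based selection over equivalence groups (assumes no cycles, no traits). */
final class Memo {
  private final Map<String, List<Alt>> groups;
  private final Map<String, Double> bestCost = new HashMap<>();
  private final Map<String, Alt> winner = new HashMap<>();

  Memo(Map<String, List<Alt>> groups) {
    this.groups = groups;
  }

  /** Cheapest total cost of any alternative in the group (memoized). */
  double cost(String group) {
    Double cached = bestCost.get(group);
    if (cached != null) {
      return cached;
    }
    double min = Double.POSITIVE_INFINITY;
    Alt best = null;
    for (Alt alt : groups.get(group)) {
      // Total = own cost + cheapest plan for each child group.
      double total = alt.selfCost();
      for (String child : alt.childGroups()) {
        total += cost(child);
      }
      if (total < min) {
        min = total;
        best = alt;
      }
    }
    bestCost.put(group, min);
    winner.put(group, best);
    return min;
  }

  /** The alternative selected for a group. */
  Alt chosen(String group) {
    cost(group);
    return winner.get(group);
  }
}
```

Because every equivalent form registered by a rule lands in the same group, the final choice depends only on which alternative has the lowest total estimated cost, not on the order in which rules fired.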

This is my current understanding; please take another look to see if it is clearly explained. @silundong @mihaibudiu

mihaibudiu · 2026-06-03T17:46:37Z

If the plan is semantically equivalent, it's not a problem from my pov.
If @silundong cannot complete the review I can try to take a look.

[CALCITE-6451] Improve Nullability Derivation for Intersect and Minus

8803f1a

Co-authored-by: Victor Barua <victor.barua@datadoghq.com>

xiedeyantu and others added 2 commits April 21, 2026 09:22

Fix CI

3aa4602

Co-authored-by: Victor Barua <victor.barua@datadoghq.com>

Fix CI

623720a

Co-authored-by: Victor Barua <victor.barua@datadoghq.com>

silundong reviewed Apr 22, 2026


xiedeyantu added 8 commits April 30, 2026 23:39

Addressed

0566daf

Addressed

f55065b

Addressed

051caa2

Addressed

46141d1

Addressed

611c7b5

Addressed

4f69274

Addressed

66b93f2

Addressed

e096f40
