[CALCITE-7551] Project/Filter/Join transpose and merge rules should not duplicate non-deterministic expressions (e.g. RAND())#4970
Conversation
…ot duplicate non-deterministic expressions (e.g. RAND())
| return null; | ||
| } | ||
| }.visitEach(nodes); | ||
| for (int i = 0; i < refs.length; i++) { |
There was a problem hiding this comment.
As discussed in Jira, this is a bit too conservative, since it will not distinguish CURRENT_TIMESTAMP from RAND. But fixing that may be in the scope of a separate PR - we need really two separate notions of nondeterminism.
There was a problem hiding this comment.
Agreed. We should complete the fine-grained judgment of the deterministic function first before completing this PR.
There was a problem hiding this comment.
As far as I understand, CURRENT_TIMESTAMP is actually different than non-deterministic functions like RAND().
-
Non-deterministic function: it may return different values for every evaluations.
1.1 Returns false toisDeterministic().
1.2 Returns false toisDynamicFunction(). -
Dynamic Function: It will return same value at every call site within one statement; can differ across executions
2.1 Returns true toisDeterministic().
2.2 Returns true toisDynamicFunction().
And this path only blocks when we get isDeterministic() method response as false. Dynamic functions like CURRENT_TIMESTAMP can be duplicated and actually it is safe to duplicate them.
There was a problem hiding this comment.
The JavaDoc of isDynamicFunction says:
/**
* Returns whether it is unsafe to cache query plans referencing this
* operator; false is assumed by default.
*/
public boolean isDynamicFunction() {
which is not exactly what you seem to imply.
There was a problem hiding this comment.
You're right, that comment mixed up the two flags. The "same value at every call site within one statement" property I attached to isDynamicFunction() is not in its JavaDoc; I generalized from CURRENT_TIMESTAMP's behavior to isDynamicFunction as a whole, which is wrong, isDynamicFunction is overloaded (e.g. SqlRandIntegerFunction also returns true), and its actual contract is just plan-cache invalidation.
Scoping back to this PR: the bug only affects operators that can return different values across call sites within a single statement, i.e. operators with isDeterministic() == false, like RAND and RAND_INTEGER. The other operators that override isDynamicFunction() to true (SqlAbstractTimeFunction, SqlCurrentDateFunction, SqlBaseContextVariable) are per-statement-stable: duplicating them in the plan is observationally a no-op, because every occurrence resolves to the same value within an execution. So they are not affected by this bug and don't need to be blocked.
The guard in the fix uses RexUtil.isDeterministic(...), which is exactly the right discriminator for the affected set. I'll update the PR description to drop the misleading framing of isDynamicFunction().
There was a problem hiding this comment.
That is fine, but I think we have no method to discriminate between RAND and CURRENT_TIMESTAMP. Maybe we need to introduce such a tool?
There was a problem hiding this comment.
That should probably be a separate JIRA issue and PR. Coming up with a good name for this property of CURRENT_TIMESTAMP may be difficult.
| sql(sql).withConformance(SqlConformanceEnum.PRESTO).ok(); | ||
| } | ||
|
|
||
| /** Test case of |
There was a problem hiding this comment.
why is SqlToRel connected to these rules?
There was a problem hiding this comment.
Because SqlToRelConverter builds its RelNode tree through RelBuilder, and RelBuilder eagerly merges adjacent projects at construction time using the same helper that the planner rules use.
Concretely, RelBuilder.project_ method uses the method RelOptUtil.pushPastProjectUnlessBloat(nodeList, project, config.bloat()) which we are fixing as part of this PR, so I thought of putting a test here as well.
| // non-deterministic evaluation (e.g. RAND()) into two: one consumed by | ||
| // the new filter condition, and the original still produced by the | ||
| // project above. Refuse to transpose in that case. | ||
| if (!project.getProjects().stream().allMatch(RexUtil::isDeterministic)) { |
There was a problem hiding this comment.
Blocks transpose (because RAND() exists in project)? eg:
SELECT * FROM (
SELECT RAND() as r, col1 as b FROM emp
)
WHERE col1 > 0
…ferences a non-deterministic projected column
…ondition references a non-deterministic projected column
|



Jira Link
CALCITE-7551
Changes Proposed
Several optimizer and
RelBuilder-level transformations inline a lower-projection expression into multiple positions of an upper expression without checking whether the inlined expression is non-deterministic. When the lower expression is non-deterministic (e.g.RAND()), this silently turns one evaluation into many, changing query semantics.Reproduction
Before the fix,
SqlToRelConverterproduces:The two
RAND()calls are evaluated independently, soB - Ais no longer always1. After the fix the inner projection is preserved andA,Bshare a single evaluation:Scope
The bug only affects operators that can return different values across call sites within a single statement — i.e. operators with
isDeterministic() == false(SqlRandFunction,SqlRandIntegerFunction). Per-statement-stable operators such asCURRENT_TIMESTAMP,CURRENT_DATE,CURRENT_USER, etc. resolve to the same value at every occurrence in one execution, so duplicating them in the plan is observationally a no-op; they don't need to be blocked.The guard uses
RexUtil.isDeterministic(...), which consultsSqlOperator.isDeterministic()and correctly partitions the two sets.Affected code paths and fixes
The guard only kicks in when the transform would actually duplicate a non-deterministic evaluation — not whenever a non-deterministic column merely exists. The precise condition depends on whether the lower project is consumed by the transform or re-emitted above the new node:
RelOptUtil.pushPastProjectUnlessBloatpushShuttle(Project)substitutes everyRexInputRefwith the underlying projection. OnlyRexOverwas guarded; determinism was not. The bottom project is consumed (inlined away).RexInputRefuses per bottom slot; returnnullonly if a non-deterministic bottom expression is referenced more than once (a single reference yields a single inlined copy, which is safe). CoversProjectMergeRuleandFilterProjectTransposeRule's helper call, plus theRelBuilderbloat merge thatSqlToRelConverterrelies on.FilterProjectTransposeRuleRelOptUtil.InputFinder.bits). Filters on deterministic columns are pushed even if the project computes an unrelatedRAND().JoinProjectTransposeRuleRexProgramBuilder.mergePrograms+mergedProgram.expandLocalRef, inlining projected expressions into the new join condition while re-emitting the project above.nLeftFields).SemiJoinProjectTransposeRulemergePrograms+expandLocalRefpattern asJoinProjectTransposeRule.The "consumed vs re-emitted" distinction is why the threshold differs:
ProjectMergeRuleblocks at> 1references (the bottom project disappears, so one reference = one surviving copy), while the transpose rules block at>= 1(the project survives on top, so a single referenced occurrence already yields two evaluations).Tests
Blocked cases (transform must not fire / not duplicate):
RelOptRulesTest.testProjectMergeShouldIgnoreNonDeterministicRelOptRulesTest.testFilterProjectTransposeShouldIgnoreNonDeterministicRelOptRulesTest.testJoinProjectTransposeShouldIgnoreNonDeterministicRelOptRulesTest.testSemiJoinProjectTransposeShouldIgnoreNonDeterministicSqlToRelConverterTest.testRandNotDuplicatedInProjectionMergeAllowed cases (transform still fires when the non-deterministic column is unrelated to the condition):
RelOptRulesTest.testFilterProjectTransposeWithUnrelatedNonDeterministicRelOptRulesTest.testJoinProjectTransposeWithUnrelatedNonDeterministicRelOptRulesTest.testSemiJoinProjectTransposeWithUnrelatedNonDeterministicConfirmed unaffected (verified by additional probing during investigation)
CalcMergeRule(RexProgram preserves CSE via local refs),ProjectReduceExpressionsRule,ProjectValuesMergeRule,ProjectFilterTransposeRule,ProjectJoinTransposeRule,ProjectCorrelateTransposeRule,AggregateProjectMergeRule,FilterSetOpTransposeRule(textual duplication only — each row flows through one branch of aUNION ALL, so per-row evaluation count is unchanged),ProjectToWindowRule.