
[CALCITE-7422] Support large plan optimization mode for HepPlanner#4803

Open
zhuwenzhuang wants to merge 2 commits into apache:main from zhuwenzhuang:large_plan_mode

Conversation

@zhuwenzhuang
Contributor

@zhuwenzhuang zhuwenzhuang commented Feb 24, 2026

Motivation and details: https://issues.apache.org/jira/browse/CALCITE-7422
Before:
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 9s
LargePlanBenchmark:10000 : too slow to measure.

CPU Profiler Result (fired rules cache enabled, large plan mode disabled):
fireRule(RelOptRuleCall ruleCall) takes 2% of CPU time.
Mem Profiler Result (fired rules cache enabled, large plan mode disabled): (see attached screenshot)

After (fired rules cache and large plan mode enabled):
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 2s
LargePlanBenchmark:10000 : about 60s

Benchmark                               (unionNum)  Mode  Cnt  Score      Error  Units
LargePlanBenchmark.testLargeUnionPlan          100  avgt         256.561         ms/op
LargePlanBenchmark.testLargeUnionPlan         1000  avgt        1616.421         ms/op
LargePlanBenchmark.testLargeUnionPlan        10000  avgt       53393.727         ms/op
CPU Profiler Result:
fireRule(RelOptRuleCall ruleCall) takes 11% of CPU time. There is still plenty of room for CPU optimization.

Mem Profiler Result:
(avoids buildListRecurse/collectGarbage; smaller peak memory size)

@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 6 times, most recently from d87d57a to d4ae34e on February 24, 2026 15:35
@zhuwenzhuang zhuwenzhuang marked this pull request as draft February 24, 2026 15:38
@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 2 times, most recently from 6369ba1 to 6d9ab05 on February 25, 2026 06:12
@zhuwenzhuang
Contributor Author

A better iterator implementation of DFS/BFS is needed. I will optimize this later (unavailable for the next two weeks).

@zhuwenzhuang zhuwenzhuang changed the title [CALCITE-7422] Support large plan optimizaion mode for HepPlanner [CALCITE-7422] Support large plan optimization mode for HepPlanner Mar 1, 2026
@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 7 times, most recently from 4e77147 to 3870056 on March 12, 2026 09:46
@zhuwenzhuang
Contributor Author

zhuwenzhuang commented Mar 12, 2026

After the default depth-first iterator was replaced by HepVertexIterator:

LargePlanBenchmark:10000 takes 3.6s [10000 unions (40000 rel nodes), large plan mode + rule cache + ARBITRARY match order]:

1. CPU Profiler Result
   1.1. fireRule(RelOptRuleCall ruleCall) takes 70% CPU.
   1.2. garbageCollection's removeEdge(V source, V target) takes 23% CPU.

2. Memory Profiler Result
   Primarily used by rules, with rare use by the planner/iterator itself.

LargePlanBenchmark:100000 takes 68.741 s [400k rel nodes, same configuration]

All perf results of LargePlanBenchmark:
NOTE: NodeCount/RuleTransforms are estimates based on the test's scale.

MatchOrder    UnionNum  NodeCount  RuleTransforms  Time (ms)
ARBITRARY        1,000      4,000           6,006      1,043
ARBITRARY        3,000     12,000          18,006      1,306
ARBITRARY       10,000     40,000          60,006      3,655
ARBITRARY       30,000    120,000         180,006     13,040
DEPTH_FIRST      1,000      4,000           6,006        347
DEPTH_FIRST      3,000     12,000          18,006      1,068
DEPTH_FIRST     10,000     40,000          60,006      4,165
DEPTH_FIRST     30,000    120,000         180,006     12,898
BOTTOM_UP        1,000      4,000           6,006      1,145
BOTTOM_UP        3,000     12,000          18,006     10,152
TOP_DOWN         1,000      4,000           6,006      1,193
TOP_DOWN         3,000     12,000          18,006      8,074

@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 2 times, most recently from 90c7cd5 to 9ed2932 on March 12, 2026 10:31
@zhuwenzhuang zhuwenzhuang marked this pull request as ready for review March 12, 2026 11:02
Key optimizations of large plan mode:

1. Reusable graph: avoids re-initialization.
2. Efficient traversal: skips stable subtrees.
3. Fine-grained GC.

Usage: see the comments on HepPlanner()

Perf result of LargePlanBenchmark:
Match Order     Union Num  Node Count      Rule Transforms Time (ms)
--------------------------------------------------------------------
ARBITRARY       1000       4000            6006            1043
ARBITRARY       3000       12000           18006           1306
ARBITRARY       10000      40000           60006           3655
ARBITRARY       30000      120000          180006          13040
DEPTH_FIRST     1000       4000            6006            347
DEPTH_FIRST     3000       12000           18006           1068
DEPTH_FIRST     10000      40000           60006           4165
DEPTH_FIRST     30000      120000          180006          12898
BOTTOM_UP       1000       4000            6006            1145
BOTTOM_UP       3000       12000           18006           10152
TOP_DOWN        1000       4000            6006            1193
TOP_DOWN        3000       12000           18006           8074

private boolean enableFiredRulesCache = false;

private boolean largePlanMode = false;
Contributor

Is the optimization somehow worse for small plans?
If it's always faster, there should not be such a flag, so you should document it as "To be removed in the future".
If small plans are worse in this mode, the flag should remain, but it still should be documented somehow (e.g., how to choose whether to set it).

Contributor Author

The optimization is always better for plans at any scale.
I will add the comment "To be removed in the future".

* planner.setRoot(initPlanRoot);
* planner.executeProgram(phase1Program);
* planner.dumpRuleAttemptsInfo(); // optional
* planner.clear(); // clear the rules and rule match caches, the graph is reused
Contributor

I don't understand the comment about "the graph is reused".
Maybe the graph is deallocated?

Contributor Author

@zhuwenzhuang zhuwenzhuang Mar 18, 2026

The graph is not deallocated.
The clear() API isn't a good fit for the purpose of clearing the rule caches while keeping the graph in the HepPlanner (for other HepPrograms).
Maybe clearRules is better?

HepPlanner planner = new HepPlanner();
planner.setRoot(initPlanRoot);
planner.executeProgram(phase1Program);
planner.clearRules();
planner.executeProgram(phase2Program);
planner.clearRules();
...
RelNode optimized = planner.buildFinalPlan();

}

@Override public RelNode findBestExp() {
if (isLargePlanMode()) {
Contributor

what is the reason for this choice?

Contributor Author

@zhuwenzhuang zhuwenzhuang Mar 18, 2026

Originally, I thought large plan mode would mostly be used to run multiple programs, so the main program wasn't necessary in this mode.
Considering that the optimization can be used for plans of any scale, there is another choice:
findBestExp uses mainProgram to do the optimization, and if mainProgram is empty, it just skips the mainProgram and directly builds the final plan.
I will remove this exception.

iter = getGraphIterator(programState, newVertex);
} else {
// Continue from newVertex and keep previous iterator status.
// It prevents revisiting the large plan's stable subgraph from root.
Contributor

what is a stable subgraph?

Contributor Author

A stable subgraph is a part of the DAG to which no rules will be applied.

For a plan like this, every node replacement in subgraph2 may reset the iterator to the root, so subgraph1, although stable, will be visited repeatedly.

root
<- subgraph1_root (stable)
<- ... other nodes in subgraph1 (stable)
<- subgraph2_root
<- ... other nodes in subgraph2 (with many node replacements)
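As a toy illustration of the idea above (this is not Calcite's actual HepVertexIterator; the graph and class names here are made up), resuming a depth-first walk at the replaced vertex instead of restarting at the root avoids re-walking the stable subgraph:

```java
import java.util.*;

// Toy DAG: root -> stable1 -> stable2 (no rule matches) and
//          root -> sub2 -> leaf (where replacements happen).
public class ResumeDemo {
    static final Map<String, List<String>> g = new LinkedHashMap<>();
    static {
        g.put("root", List.of("stable1", "sub2"));
        g.put("stable1", List.of("stable2"));
        g.put("sub2", List.of("leaf"));
    }

    // Iterative depth-first visit starting at `start`; returns visit order.
    static List<String> dfs(String start) {
        List<String> order = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String v = stack.pop();
            if (!seen.add(v)) continue;
            order.add(v);
            List<String> kids = g.getOrDefault(v, List.of());
            for (int i = kids.size() - 1; i >= 0; i--) stack.push(kids.get(i));
        }
        return order;
    }

    public static void main(String[] args) {
        // Restarting from root after each replacement inside sub2 keeps
        // revisiting the stable subgraph:
        System.out.println(dfs("root")); // [root, stable1, stable2, sub2, leaf]
        // Resuming from the replaced vertex skips the stable part:
        System.out.println(dfs("sub2")); // [sub2, leaf]
    }
}
```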

Contributor

You should add this information to a comment, it seems useful

assert start == root;
collectGarbage();
if (!isLargePlanMode()) {
// NOTE: Planner already runs GC for every subtree removed by transformation
Contributor

I don't understand what this comment is saying.

Contributor Author

I will change it to:

We do not need to run garbage collection for the whole graph here. tryCleanVertices already cleans up potentially removed vertices.


HepRelVertex newVertex = addRelToGraph(bestRel);
HepRelVertex newVertex = addRelToGraph(bestRel, null);
Set<HepRelVertex> garbageVertexSet = new LinkedHashSet<>();
Contributor

why does it have to be a LinkedHashSet?

Contributor Author

LinkedHashSet preserves insertion order during iteration, which is debugging-friendly.
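A minimal standalone sketch of the difference (the element names are made up, not Calcite vertices): a LinkedHashSet replays insertion order, so a debug dump of the garbage set mirrors the order in which vertices were discarded, whereas HashSet order depends on hash buckets.

```java
import java.util.*;

public class OrderDemo {
    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order, making debug dumps
        // deterministic regardless of the elements' hash codes.
        Set<String> garbage = new LinkedHashSet<>();
        Collections.addAll(garbage, "vertex#9", "vertex#2", "vertex#5");
        System.out.println(garbage); // [vertex#9, vertex#2, vertex#5]
    }
}
```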

Contributor

you should probably add a comment about this too

return rel;
}

/** Try to remove discarded vertices recursively. */
Contributor

"recursively" means that vertices without in-edges are removed, and after each removal the operation is repeated on the remaining edges?

Contributor Author

Yes.
transformTo will remove the in-edges of the matched nodes. We just tryClean the nodes of this subtree.
It requires that rules do not change nodes outside the matchOperands list.
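A self-contained sketch of this recursive cleanup under simplified assumptions (a string-keyed toy graph; the names here are illustrative, not Calcite's internals): a vertex is garbage once its last in-edge is gone, and removing it may in turn orphan its children, so the check repeats down the subtree.

```java
import java.util.*;

public class TryCleanDemo {
    static final Map<String, List<String>> children = new HashMap<>();
    static final Map<String, Integer> inDegree = new HashMap<>();

    static void addEdge(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        inDegree.merge(child, 1, Integer::sum);
    }

    // Remove `v` if nothing references it, then retry on its children.
    static void tryClean(String v) {
        if (inDegree.getOrDefault(v, 0) > 0) return; // still referenced
        List<String> kids = children.remove(v);
        inDegree.remove(v);
        if (kids == null) return;
        for (String c : kids) {
            inDegree.merge(c, -1, Integer::sum);
            tryClean(c); // child may have become garbage too
        }
    }

    public static void main(String[] args) {
        addEdge("discarded", "a");
        addEdge("a", "shared");
        addEdge("live", "shared"); // `shared` is still used by `live`
        tryClean("discarded");
        // `discarded` and `a` are gone; `shared` survives via `live`.
        System.out.println(children.keySet()); // [live]
        System.out.println(inDegree); // {shared=1}
    }
}
```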

if (graph.getOutwardEdges(vertex).size()
!= Sets.newHashSet(requireNonNull(vertex, "vertex").getCurrentRel().getInputs()).size()) {
throw new AssertionError("HepPlanner:outward edge num is different "
+ "with input node num, " + vertex);
Contributor

different from the input node count

Contributor Author

thanks

if (liveNum == validSet.size()) {
return;
}
throw new AssertionError("HepPlanner:Query graph live node num is different with root"
Contributor

same comment

@mihaibudiu
Contributor

If you make changes, please use new commits for now.

- Add Javadoc for largePlanMode field with "To be removed in the future" note
- Rename clear() to clearRules() for clarity in multiphase optimization
- Update usage example to use clearRules() and fix "graph is reused" to "graph is preserved"
- Remove UnsupportedOperationException in findBestExp() for large plan mode
- Fix comment about garbage collection in getGraphIterator()
- Add comment about caching fired rule before constructing HepRuleCall
- Fix "different with" to "different from" in assertion error messages
Contributor

@mihaibudiu mihaibudiu left a comment


I have approved, but please consider adding the two extra comments I have suggested.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Mar 18, 2026