
[CALCITE-7422] Support large plan optimization mode for HepPlanner#4803

Open
zhuwenzhuang wants to merge 2 commits into apache:main from zhuwenzhuang:large_plan_mode

Conversation

@zhuwenzhuang
Contributor

@zhuwenzhuang zhuwenzhuang commented Feb 24, 2026

Motivation and details: https://issues.apache.org/jira/browse/CALCITE-7422
Before:
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 9s
LargePlanBenchmark:10000 : too slow to measure.

CPU Profiler Result (fired rules cache enabled, large plan mode disabled):
fireRule(RelOptRuleCall ruleCall) takes 2% of CPU time.
Mem Profiler Result (fired rules cache enabled, large plan mode disabled): (see attached screenshot)

After (fired rules cache and large plan mode enabled):
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 2s
LargePlanBenchmark:10000 : about 60s

Benchmark                               (unionNum)  Mode  Cnt  Score      Error  Units
LargePlanBenchmark.testLargeUnionPlan          100  avgt         256.561         ms/op
LargePlanBenchmark.testLargeUnionPlan         1000  avgt        1616.421         ms/op
LargePlanBenchmark.testLargeUnionPlan        10000  avgt       53393.727         ms/op
CPU Profiler Result:
fireRule(RelOptRuleCall ruleCall) takes 11% of CPU time. There is still plenty of room for CPU optimization.

Mem Profiler Result:
(avoids buildListRecurse/collectGarbage; smaller peak memory size)

@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 6 times, most recently from d87d57a to d4ae34e on February 24, 2026 15:35
@zhuwenzhuang zhuwenzhuang marked this pull request as draft February 24, 2026 15:38
@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 2 times, most recently from 6369ba1 to 6d9ab05 on February 25, 2026 06:12
@zhuwenzhuang
Contributor Author

A better iterator implementation of DFS/BFS is needed. I will optimize this later (unavailable for the next two weeks).

@zhuwenzhuang zhuwenzhuang changed the title [CALCITE-7422] Support large plan optimizaion mode for HepPlanner [CALCITE-7422] Support large plan optimization mode for HepPlanner Mar 1, 2026
@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 7 times, most recently from 4e77147 to 3870056 on March 12, 2026 09:46
@zhuwenzhuang
Contributor Author

zhuwenzhuang commented Mar 12, 2026

After the default depth-first iterator was replaced by HepVertexIterator:

LargePlanBenchmark:10000 takes 3.6s [10000 unions (40000 rel nodes), large plan mode + rule cache + ARBITRARY match order]:

1. CPU Profiler Result
   1.1. fireRule(RelOptRuleCall ruleCall) takes 70% CPU.
   1.2. garbageCollection's removeEdge(V source, V target) takes 23% CPU.

2. Memory Profiler Result
   Primarily used by rules, with rare use by the planner/iterator itself.

LargePlanBenchmark:100000 takes 68.741 s [400k rel nodes, same configuration]

All perf results of LargePlanBenchmark:
NOTE: NodeCount/RuleTransforms are estimates based on the test's scale.

MatchOrder    UnionNum  NodeCount  RuleTransforms  Time (ms)
ARBITRARY        1,000      4,000           6,006      1,043
ARBITRARY        3,000     12,000          18,006      1,306
ARBITRARY       10,000     40,000          60,006      3,655
ARBITRARY       30,000    120,000         180,006     13,040
DEPTH_FIRST      1,000      4,000           6,006        347
DEPTH_FIRST      3,000     12,000          18,006      1,068
DEPTH_FIRST     10,000     40,000          60,006      4,165
DEPTH_FIRST     30,000    120,000         180,006     12,898
BOTTOM_UP        1,000      4,000           6,006      1,145
BOTTOM_UP        3,000     12,000          18,006     10,152
TOP_DOWN         1,000      4,000           6,006      1,193
TOP_DOWN         3,000     12,000          18,006      8,074

@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 2 times, most recently from 90c7cd5 to 9ed2932 on March 12, 2026 10:31
@zhuwenzhuang zhuwenzhuang marked this pull request as ready for review March 12, 2026 11:02
Key optimizations of large plan mode:

1. Reusable graph: avoids re-initialization.
2. Efficient traversal: skips stable subtrees.
3. Fine-grained GC.

Usage: see the comments on HepPlanner()

Perf result of LargePlanBenchmark:
Match Order     Union Num  Node Count      Rule Transforms Time (ms)
--------------------------------------------------------------------
ARBITRARY       1000       4000            6006            1043
ARBITRARY       3000       12000           18006           1306
ARBITRARY       10000      40000           60006           3655
ARBITRARY       30000      120000          180006          13040
DEPTH_FIRST     1000       4000            6006            347
DEPTH_FIRST     3000       12000           18006           1068
DEPTH_FIRST     10000      40000           60006           4165
DEPTH_FIRST     30000      120000          180006          12898
BOTTOM_UP       1000       4000            6006            1145
BOTTOM_UP       3000       12000           18006           10152
TOP_DOWN        1000       4000            6006            1193
TOP_DOWN        3000       12000           18006           8074

private boolean enableFiredRulesCache = false;

private boolean largePlanMode = false;
Contributor

Is the optimization somehow worse for small plans?
If it's always faster, there should not be such a flag, so you should document it as "To be removed in the future".
If small plans are worse in this mode, the flag should remain, but it still should be documented somehow (e.g., how to choose whether to set it).

Contributor Author

The optimization is always better for plans at any scale.
I will add the comment "To be removed in the future".

* planner.setRoot(initPlanRoot);
* planner.executeProgram(phase1Program);
* planner.dumpRuleAttemptsInfo(); // optional
* planner.clear(); // clear the rules and rule match caches, the graph is reused
Contributor

I don't understand the comment about "the graph is reused".
Maybe the graph is deallocated?

Contributor Author

@zhuwenzhuang zhuwenzhuang Mar 18, 2026

The graph is not deallocated.
The clear() API isn't a good fit for the purpose of clearing the rule caches while keeping the graph in the HepPlanner (for other HepPrograms).
Maybe clearRules is better?

HepPlanner planner = new HepPlanner();
planner.setRoot(initPlanRoot);
planner.executeProgram(phase1Program);
planner.clearRules();
planner.executeProgram(phase2Program);
planner.clearRules();
...
RelNode optimized = planner.buildFinalPlan();

}

@Override public RelNode findBestExp() {
if (isLargePlanMode()) {
Contributor

what is the reason for this choice?

Contributor Author

@zhuwenzhuang zhuwenzhuang Mar 18, 2026

Originally, I thought large plan mode would mostly be used to run multiple programs, so the main program wasn't necessary in this mode.
Considering that the optimization can be used for plans of any scale, there is another choice:
findBestExp uses mainProgram to do the optimization, and if mainProgram is empty, it just skips the mainProgram and directly builds the final plan.
I will remove this exception.

iter = getGraphIterator(programState, newVertex);
} else {
// Continue from newVertex and keep previous iterator status.
// It prevents revisiting the large plan's stable subgraph from root.
Contributor

what is a stable subgraph?

Contributor Author

A stable subgraph is a part of the DAG to which no rules will be applied.

For a plan like this, every node replacement in subgraph2 may reset the iterator to the root, so subgraph1, although stable, will be visited repeatedly.

root
<- subgraph1_root (stable)
<- ... other nodes in subgraph1 (stable)
<- subgraph2_root
<- ... other nodes in subgraph2 (with many node replacements)
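As a toy illustration of the idea above (this is not Calcite's actual HepVertexIterator; the graph and class names here are made up), resuming a depth-first walk at the replaced vertex instead of restarting at the root avoids re-walking the stable subgraph:

```java
import java.util.*;

// Toy DAG: root -> stable1 -> stable2 (no rule matches) and
//          root -> sub2 -> leaf (where replacements happen).
public class ResumeDemo {
    static final Map<String, List<String>> g = new LinkedHashMap<>();
    static {
        g.put("root", List.of("stable1", "sub2"));
        g.put("stable1", List.of("stable2"));
        g.put("sub2", List.of("leaf"));
    }

    // Iterative depth-first visit starting at `start`; returns visit order.
    static List<String> dfs(String start) {
        List<String> order = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String v = stack.pop();
            if (!seen.add(v)) continue;
            order.add(v);
            List<String> kids = g.getOrDefault(v, List.of());
            for (int i = kids.size() - 1; i >= 0; i--) stack.push(kids.get(i));
        }
        return order;
    }

    public static void main(String[] args) {
        // Restarting from root after each replacement inside sub2 keeps
        // revisiting the stable subgraph:
        System.out.println(dfs("root")); // [root, stable1, stable2, sub2, leaf]
        // Resuming from the replaced vertex skips the stable part:
        System.out.println(dfs("sub2")); // [sub2, leaf]
    }
}
```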

Contributor

You should add this information to a comment, it seems useful

assert start == root;
collectGarbage();
if (!isLargePlanMode()) {
// NOTE: Planner already runs GC for every subtree removed by transformation
Contributor

I don't understand what this comment is saying.

Contributor Author

I will change it to:

We do not need to run garbage collection for the whole graph here. tryCleanVertices already cleans up potentially removed vertices.


HepRelVertex newVertex = addRelToGraph(bestRel);
HepRelVertex newVertex = addRelToGraph(bestRel, null);
Set<HepRelVertex> garbageVertexSet = new LinkedHashSet<>();
Contributor

why does it have to be a LinkedHashSet?

Contributor Author

LinkedHashSet preserves insertion order during iteration, which is debugging-friendly.
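A minimal standalone sketch of the difference (the element names are made up, not Calcite vertices): a LinkedHashSet replays insertion order, so a debug dump of the garbage set mirrors the order in which vertices were discarded, whereas HashSet order depends on hash buckets.

```java
import java.util.*;

public class OrderDemo {
    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order, making debug dumps
        // deterministic regardless of the elements' hash codes.
        Set<String> garbage = new LinkedHashSet<>();
        Collections.addAll(garbage, "vertex#9", "vertex#2", "vertex#5");
        System.out.println(garbage); // [vertex#9, vertex#2, vertex#5]
    }
}
```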

Contributor

you should probably add a comment about this too

return rel;
}

/** Try to remove discarded vertices recursively. */
Contributor

"recursively" means that vertices without in-edges are removed, and after each removal the operation is repeated on the remaining edges?

Contributor Author

Yes.
transformTo will remove the in-edges of the matched nodes. We just tryClean the nodes of this subtree.
It requires that rules do not change nodes outside the matchOperands list.
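A self-contained sketch of this recursive cleanup under simplified assumptions (a string-keyed toy graph; the names here are illustrative, not Calcite's internals): a vertex is garbage once its last in-edge is gone, and removing it may in turn orphan its children, so the check repeats down the subtree.

```java
import java.util.*;

public class TryCleanDemo {
    static final Map<String, List<String>> children = new HashMap<>();
    static final Map<String, Integer> inDegree = new HashMap<>();

    static void addEdge(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        inDegree.merge(child, 1, Integer::sum);
    }

    // Remove `v` if nothing references it, then retry on its children.
    static void tryClean(String v) {
        if (inDegree.getOrDefault(v, 0) > 0) return; // still referenced
        List<String> kids = children.remove(v);
        inDegree.remove(v);
        if (kids == null) return;
        for (String c : kids) {
            inDegree.merge(c, -1, Integer::sum);
            tryClean(c); // child may have become garbage too
        }
    }

    public static void main(String[] args) {
        addEdge("discarded", "a");
        addEdge("a", "shared");
        addEdge("live", "shared"); // `shared` is still used by `live`
        tryClean("discarded");
        // `discarded` and `a` are gone; `shared` survives via `live`.
        System.out.println(children.keySet()); // [live]
        System.out.println(inDegree); // {shared=1}
    }
}
```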

if (graph.getOutwardEdges(vertex).size()
!= Sets.newHashSet(requireNonNull(vertex, "vertex").getCurrentRel().getInputs()).size()) {
throw new AssertionError("HepPlanner:outward edge num is different "
+ "with input node num, " + vertex);
Contributor

different from the input node count

Contributor Author

thanks

if (liveNum == validSet.size()) {
return;
}
throw new AssertionError("HepPlanner:Query graph live node num is different with root"
Contributor

same comment

@mihaibudiu
Contributor

If you make changes, please use new commits for now.

- Add Javadoc for largePlanMode field with "To be removed in the future" note
- Rename clear() to clearRules() for clarity in multiphase optimization
- Update usage example to use clearRules() and fix "graph is reused" to "graph is preserved"
- Remove UnsupportedOperationException in findBestExp() for large plan mode
- Fix comment about garbage collection in getGraphIterator()
- Add comment about caching fired rule before constructing HepRuleCall
- Fix "different with" to "different from" in assertion error messages
Contributor

@mihaibudiu mihaibudiu left a comment


I have approved, but please consider adding the two extra comments I have suggested.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Mar 18, 2026