Missed optimization: NULL contradiction disjunct adds Apache AGE filter rows
I found a missed optimization opportunity: adding the semantically redundant predicate base OR (expr IS NULL AND expr IS NOT NULL) changes the execution plan and raises PROFILE rows, although the base and transformed queries return the same results.
Apache AGE Version: Apache AGE 1.7.0 on PostgreSQL 18.1, Docker image apache/age:latest, digest sha256:4241e2d8bb86a6b2ea44e9ad06c73856e12b209de295124603a599dd7feb70eb
Operating System: macOS Darwin 25.3.0 arm64
Installation Method: Docker image apache/age:latest, port 127.0.0.1:5455 -> 5432
API/Driver: psycopg Python client through Apache AGE cypher(graph, query) wrapper
Steps to reproduce
- Load the attached graph setup file into an empty disposable database:
BUG003_reproducer_graph.cypher.txt
# Execute each Cypher statement from BUG003_reproducer_graph.cypher.txt through Apache AGE's cypher(graph, query) wrapper.
- Verify the graph size:
MATCH (n) RETURN count(n) AS nodes;
-- nodes: 1280
MATCH ()-[r]->() RETURN count(r) AS relationships;
-- relationships: 2450
- Run the base query with the native profile command:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
MATCH (n0 :L2)-[r0 :T3]->(n1)<-[r1 :T4]-(n2 :L1), (n4 :L0)<-[r3 :T1]-(n5 :L1), (n6 :L3)-[r4 :T1]->(n7 :L3)-[r5 :T2]->(n3)
WHERE ((((((((((true AND ((r0.id) <> (r1.id))) AND ((r0.id) <> (r3.id))) AND ((r0.id) <> (r4.id))) AND ((r0.id) <> (r5.id))) AND ((r1.id) <> (r3.id))) AND ((r1.id) <> (r4.id))) AND ((r1.id) <> (r5.id))) AND ((r3.id) <> (r4.id))) AND ((r3.id) <> (r5.id))) AND ((r4.id) <> (r5.id)))
UNWIND [-2082148190, -1254759639, (r1.k62)] AS a0
RETURN (r0.k55) AS a1
ORDER BY a1 DESC;
- Run the transformed query with the native profile command:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
MATCH (n0 :L2)-[r0 :T3]->(n1)<-[r1 :T4]-(n2 :L1), (n4 :L0)<-[r3 :T1]-(n5 :L1), (n6 :L3)-[r4 :T1]->(n7 :L3)-[r5 :T2]->(n3)
WHERE (((((((((((true AND ((r0.id) <> (r1.id))) AND ((r0.id) <> (r3.id))) AND ((r0.id) <> (r4.id))) AND ((r0.id) <> (r5.id))) AND ((r1.id) <> (r3.id))) AND ((r1.id) <> (r4.id))) AND ((r1.id) <> (r5.id))) AND ((r3.id) <> (r4.id))) AND ((r3.id) <> (r5.id))) AND ((r4.id) <> (r5.id)))) OR (((n0.id IS NULL OR n0.id IS NOT NULL)) IS NULL AND ((n0.id IS NULL OR n0.id IS NOT NULL)) IS NOT NULL)
UNWIND [-2082148190, -1254759639, (r1.k62)] AS a0
RETURN (r0.k55) AS a1
ORDER BY a1 DESC;
The transformed query is built by rewriting the target WHERE predicate as:
base OR (expr IS NULL AND expr IS NOT NULL)
where base is the original predicate:
((((((((((true AND ((r0.id) <> (r1.id))) AND ((r0.id) <> (r3.id))) AND ((r0.id) <> (r4.id))) AND ((r0.id) <> (r5.id))) AND ((r1.id) <> (r3.id))) AND ((r1.id) <> (r4.id))) AND ((r1.id) <> (r5.id))) AND ((r3.id) <> (r4.id))) AND ((r3.id) <> (r5.id))) AND ((r4.id) <> (r5.id)))
and expr is:
(n0.id IS NULL OR n0.id IS NOT NULL)
The injected boolean item is always false because the same value cannot be both NULL and NOT NULL. The transformed predicate reduces to base OR false, so it is equivalent to the original base predicate.
Expected behavior
The planner should simplify the redundant boolean item before plan construction because the transformed predicate is semantically equivalent to the base predicate. It should avoid retaining predicate residue or introducing extra filters, joins, barriers, scans, row production, memory, or buffer work.
Actual behavior
Both queries return the same result set on the attached graph.
Primary PROFILE metric: rows changed from 1405.83 to 3032.0; ratio 2.156, delta 1626.17.
Base PROFILE excerpt:
Sort (cost=21282514.73..21304326.56 rows=8724732 width=32) (actual time=355.785..356.895 rows=12000.00 loops=1)
Sort Key: (agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause.r0, '"k55"'::agtype])) DESC
Sort Method: quicksort Memory: 385kB
Buffers: shared hit=5783, temp read=921 written=936
-> Subquery Scan on _age_default_alias_previous_cypher_clause (cost=53984.43..19859203.99 rows=8724732 width=32) (actual time=259.301..356.092 rows=12000.00 loops=1)
Buffers: shared hit=5783, temp read=921 written=936
-> Gather (cost=53984.43..19750144.84 rows=8724732 width=448) (actual time=259.299..349.231 rows=12000.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5783, temp read=921 written=936
-> ProjectSet (cost=52984.43..18876671.64 rows=3635305000 width=448) (actual time=250.327..337.343 rows=4000.00 loops=3)
Buffers: shared hit=5783, temp read=921 written=936
-> Parallel Hash Join (cost=52984.43..618352.28 rows=3635305 width=112) (actual time=250.219..333.744 rows=1333.33 loops=3)
Hash Cond: (r0.end_id = n1.id)
Buffers: shared hit=5541, temp read=921 written=936
-> Hash Join (cost=52829.56..564932.27 rows=100967 width=112) (actual time=29.072..112.428 rows=1333.33 loops=3)
Hash Cond: (r3.end_id = n4.id)
Buffers: shared hit=5513, temp read=921 written=936
-> Hash Join (cost=52792.56..564629.29 rows=100967 width=120) (actual time=28.989..111.952 rows=9333.33 loops=3)
Hash Cond: (r3.start_id = n5.id)
Buffers: shared hit=5504, temp read=921 written=936
-> Nested Loop (cost=52755.56..564326.32 rows=100967 width=128) (actual time=28.956..109.908 rows=46533.33 loops=3)
Join Filter: (((agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype])))::agtype)
Rows Removed by Join Filter: 133
Buffers: shared hit=5492, temp read=921 written=936
-> Hash Join (cost=52755.56..53174.82 rows=8431 width=176) (actual time=28.940..29.004 rows=133.33 loops=3)
Hash Cond: (n3.id = r5.end_id)
Buffers: shared hit=5471, temp read=921 written=936
-> Parallel Append (cost=0.00..117.36 rows=3001 width=8) (actual time=0.003..0.049 rows=426.67 loops=3)
Buffers: shared hit=28
-> Parallel Seq Scan on "L2" n3_2 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.008 rows=70.00 loops=3)
Buffers: shared hit=6
-> Parallel Seq Scan on "L0" n3_3 (cost=0.00..17.06 rows=706 width=8) (actual time=0.002..0.010 rows=170.00 loops=1)
Buffers: shared hit=3
-> Parallel Seq Scan on "L4" n3_4 (cost=0.00..17.06 rows=706 width=8) (actual time=0.004..0.018 rows=310.00 loops=1)
Buffers: shared hit=7
-> Parallel Seq Scan on "GdsmithNode" n3_5 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.008 rows=150.00 loops=1)
Buffers: shared hit=2
-> Parallel Seq Scan on "L3" n3_6 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.017 rows=240.00 loops=1)
Buffers: shared hit=6
-> Parallel Seq Scan on "L1" n3_7 (cost=0.00..17.06 rows=706 width=8) (actual time=0.001..0.008 rows=100.00 loops=2)
Buffers: shared hit=4
-> Parallel Seq Scan on _ag_label_vertex n3_1 (cost=0.00..0.00 rows=1 width=8) (actual time=0.001..0.001 rows=0.00 loops=1)
-> Hash (cost=52748.54..52748.54 rows=562 width=184) (actual time=28.902..28.906 rows=400.00 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 244kB
Buffers: shared hit=5443, temp read=921 written=936
-> Nested Loop (cost=50055.55..52748.54 rows=562 width=184) (actual time=27.366..28.852 rows=400.00 loops=3)
Buffers: shared hit=5443, temp read=921 written=936
-> Merge Join (cost=50055.40..52625.19 rows=562 width=184) (actual time=26.637..28.438 rows=1200.00 loops=3)
Merge Cond: (r1.end_id = r0.end_id)
Join Filter: (_ag_enforce_edge_uniqueness2(r0.id, r1.id) AND ((agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype])))::agtype)
Buffers: shared hit=635, temp read=921 written=936
-> Index Scan using "T4_end_id_idx" on "T4" r1 (cost=0.15..62.70 rows=970 width=56) (actual time=0.011..0.051 rows=310.00 loops=3)
Index Searches: 3
Buffers: shared hit=299
-> Sort (cost=50055.25..50078.73 rows=9390 width=128) (actual time=26.492..26.632 rows=3200.00 loops=3)
Sort Key: r0.end_id
Sort Method: quicksort Memory: 1436kB
Buffers: shared hit=336, temp read=921 written=936
Worker 0: Sort Method: quicksort Memory: 1436kB
Worker 1: Sort Method: quicksort Memory: 1436kB
-> Hash Join (cost=34031.33..49435.66 rows=9390 width=128) (actual time=20.232..26.074 rows=3200.00 loops=3)
Hash Cond: (r4.start_id = n6.id)
Buffers: shared hit=330, temp read=921 written=936
-> Hash Join (cost=33994.33..49373.92 rows=9390 width=136) (actual time=20.053..25.745 rows=6400.00 loops=3)
Hash Cond: (r4.end_id = n7.id)
Join Filter: (_ag_enforce_edge_uniqueness2(r4.id, r5.id) AND ((agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype])))::agtype)
Buffers: shared hit=312, temp read=921 written=936
-> Seq Scan on "T1" r4 (cost=0.00..19.70 rows=970 width=56) (actual time=0.004..0.025 rows=350.00 loops=3)
Buffers: shared hit=21
-> Hash (cost=24559.92..24559.92 rows=313633 width=120) (actual time=19.158..19.161 rows=7200.00 loops=3)
Buckets: 65536 Batches: 8 Memory Usage: 2519kB
Buffers: shared hit=229, temp written=912
-> Hash Join (cost=37.43..24559.92 rows=313633 width=120) (actual time=0.130..17.149 rows=7200.00 loops=3)
Hash Cond: (r0.start_id = n0.id)
Buffers: shared hit=229
-> Nested Loop (cost=0.43..23696.73 rows=313633 width=120) (actual time=0.091..15.741 rows=31500.00 loops=3)
Join Filter: ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype
Buffers: shared hit=211
-> Merge Join (cost=0.43..152.10 rows=970 width=64) (actual time=0.073..0.149 rows=90.00 loops=3)
Merge Cond: (n7.id = r5.start_id)
Buffers: shared hit=187
-> Index Only Scan using "L3_pkey" on "L3" n7 (cost=0.15..66.15 rows=1200 width=8) (actual time=0.014..0.039 rows=240.00 loops=3)
Heap Fetches: 720
Index Searches: 3
Buffers: shared hit=23
-> Index Scan using "T2_start_id_idx" on "T2" r5 (cost=0.28..70.83 rows=970 width=56) (actual time=0.006..0.070 rows=381.00 loops=3)
Index Searches: 3
Buffers: shared hit=164
-> Materialize (cost=0.00..24.55 rows=970 width=56) (actual time=0.000..0.008 rows=350.00 loops=270)
Storage: Memory Maximum Storage: 75kB
Buffers: shared hit=24
-> Seq Scan on "T3" r0 (cost=0.00..19.70 rows=970 width=56) (actual time=0.004..0.024 rows=350.00 loops=3)
Buffers: shared hit=24
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.024..0.024 rows=210.00 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 25kB
Buffers: shared hit=18
-> Seq Scan on "L2" n0 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.004..0.013 rows=210.00 loops=3)
Buffers: shared hit=18
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.026..0.026 rows=240.00 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 26kB
Buffers: shared hit=18
-> Seq Scan on "L3" n6 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.005..0.015 rows=240.00 loops=3)
Buffers: shared hit=18
-> Index Only Scan using "L1_pkey" on "L1" n2 (cost=0.15..0.22 rows=1 width=8) (actual time=0.000..0.000 rows=0.33 loops=3600)
Index Cond: (id = r1.start_id)
Heap Fetches: 1200
Index Searches: 3600
Buffers: shared hit=4808
-> Materialize (cost=0.00..24.55 rows=970 width=48) (actual time=0.000..0.011 rows=350.00 loops=400)
Storage: Memory Maximum Storage: 66kB
Buffers: shared hit=21
-> Seq Scan on "T1" r3 (cost=0.00..19.70 rows=970 width=48) (actual time=0.005..0.022 rows=350.00 loops=3)
Buffers: shared hit=21
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.025..0.025 rows=200.00 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=12
-> Seq Scan on "L1" n5 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.006..0.016 rows=200.00 loops=3)
Buffers: shared hit=12
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.034..0.035 rows=170.00 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 23kB
Buffers: shared hit=9
-> Seq Scan on "L0" n4 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.017..0.025 rows=170.00 loops=3)
Buffers: shared hit=9
-> Parallel Hash (cost=117.36..117.36 rows=3001 width=8) (actual time=221.089..221.091 rows=426.67 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 192kB
Buffers: shared hit=28
-> Parallel Append (cost=0.00..117.36 rows=3001 width=8) (actual time=208.615..208.664 rows=426.67 loops=3)
Buffers: shared hit=28
-> Parallel Seq Scan on "L2" n1_2 (cost=0.00..17.06 rows=706 width=8) (actual time=106.891..106.899 rows=105.00 loops=2)
Buffers: shared hit=6
-> Parallel Seq Scan on "L0" n1_3 (cost=0.00..17.06 rows=706 width=8) (actual time=108.276..108.281 rows=85.00 loops=2)
Buffers: shared hit=3
-> Parallel Seq Scan on "L4" n1_4 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.017 rows=310.00 loops=1)
Buffers: shared hit=7
-> Parallel Seq Scan on "GdsmithNode" n1_5 (cost=0.00..17.06 rows=706 width=8) (actual time=0.005..0.011 rows=150.00 loops=1)
Buffers: shared hit=2
-> Parallel Seq Scan on "L3" n1_6 (cost=0.00..17.06 rows=706 width=8) (actual time=0.005..0.018 rows=240.00 loops=1)
Buffers: shared hit=6
-> Parallel Seq Scan on "L1" n1_7 (cost=0.00..17.06 rows=706 width=8) (actual time=195.514..195.529 rows=200.00 loops=1)
Buffers: shared hit=4
-> Parallel Seq Scan on _ag_label_vertex n1_1 (cost=0.00..0.00 rows=1 width=8) (actual time=0.001..0.002 rows=0.00 loops=1)
Planning:
Buffers: shared hit=24
Planning Time: 22.477 ms
JIT:
Functions: 359
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 6.236 ms (Deform 2.235 ms), Inlining 72.404 ms, Optimization 282.046 ms, Emission 271.598 ms, Total 632.284 ms
Execution Time: 358.945 ms
Transformed PROFILE excerpt:
Sort (cost=6360501050.93..6366923957.41 rows=2569162591 width=32) (actual time=2228.241..2228.717 rows=12000.00 loops=1)
Sort Key: (agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause.r0, '"k55"'::agtype])) DESC
Sort Method: quicksort Memory: 385kB
Buffers: shared hit=2780, temp written=25773
-> Subquery Scan on _age_default_alias_previous_cypher_clause (cost=28586766.50..5713081376.33 rows=2569162591 width=32) (actual time=2207.761..2227.782 rows=12000.00 loops=1)
Buffers: shared hit=2780, temp written=25773
-> Gather (cost=28586766.50..5680966843.95 rows=2569162591 width=448) (actual time=2207.759..2221.255 rows=12000.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2780, temp written=25773
-> ProjectSet (cost=28585766.50..5424049584.85 rows=1070484413000 width=448) (actual time=886.021..890.234 rows=4000.00 loops=3)
Buffers: shared hit=2780, temp written=25773
-> Hash Join (cost=28585766.50..47541620.55 rows=1070484413 width=112) (actual time=885.997..887.013 rows=1333.33 loops=3)
Hash Cond: (r4.start_id = n6.id)
Buffers: shared hit=2780, temp written=25773
-> Parallel Hash Join (cost=28585729.50..44721637.69 rows=1070484413 width=120) (actual time=668.595..669.440 rows=2666.67 loops=3)
Hash Cond: (r5.end_id = n3.id)
Buffers: shared hit=2762, temp written=25773
-> Merge Join (cost=28585574.63..29036604.14 rows=29731549 width=128) (actual time=668.478..669.012 rows=2666.67 loops=3)
Merge Cond: (n1.id = r0.end_id)
Buffers: shared hit=2734, temp written=25773
-> Sort (cost=290.68..298.19 rows=3001 width=8) (actual time=0.074..0.084 rows=278.67 loops=3)
Sort Key: n1.id
Sort Method: quicksort Memory: 49kB
Buffers: shared hit=34
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Append (cost=0.00..117.36 rows=3001 width=8) (actual time=0.001..0.034 rows=426.67 loops=3)
Buffers: shared hit=28
-> Parallel Seq Scan on "L2" n1_2 (cost=0.00..17.06 rows=706 width=8) (actual time=0.002..0.009 rows=210.00 loops=1)
Buffers: shared hit=6
-> Parallel Seq Scan on "L0" n1_3 (cost=0.00..17.06 rows=706 width=8) (actual time=0.002..0.007 rows=170.00 loops=1)
Buffers: shared hit=3
-> Parallel Seq Scan on "L4" n1_4 (cost=0.00..17.06 rows=706 width=8) (actual time=0.002..0.012 rows=310.00 loops=1)
Buffers: shared hit=7
-> Parallel Seq Scan on "GdsmithNode" n1_5 (cost=0.00..17.06 rows=706 width=8) (actual time=0.002..0.006 rows=150.00 loops=1)
Buffers: shared hit=2
-> Parallel Seq Scan on "L3" n1_6 (cost=0.00..17.06 rows=706 width=8) (actual time=0.001..0.010 rows=240.00 loops=1)
Buffers: shared hit=6
-> Parallel Seq Scan on "L1" n1_7 (cost=0.00..17.06 rows=706 width=8) (actual time=0.002..0.009 rows=200.00 loops=1)
Buffers: shared hit=4
-> Parallel Seq Scan on _ag_label_vertex n1_1 (cost=0.00..0.00 rows=1 width=8) (actual time=0.000..0.000 rows=0.00 loops=1)
-> Materialize (cost=28585283.95..28595193.09 rows=1981828 width=128) (actual time=2005.130..2006.106 rows=8000.00 loops=1)
Storage: Memory Maximum Storage: 292kB
Buffers: shared hit=2700, temp written=25773
-> Sort (cost=28585283.95..28590238.52 rows=1981828 width=128) (actual time=2005.128..2005.334 rows=8000.00 loops=1)
Sort Key: r0.end_id
Sort Method: quicksort Memory: 2880kB
Buffers: shared hit=2700, temp written=25773
-> Hash Join (cost=219.68..28120589.59 rows=1981828 width=128) (actual time=1061.307..2004.429 rows=8000.00 loops=1)
Hash Cond: (r3.start_id = n5.id)
Buffers: shared hit=2700, temp written=25773
-> Hash Join (cost=182.68..28115331.91 rows=1981828 width=136) (actual time=1060.058..2002.860 rows=40000.00 loops=1)
Hash Cond: (r1.start_id = n2.id)
Buffers: shared hit=2696, temp written=25773
-> Hash Join (cost=145.68..28110074.24 rows=1981828 width=136) (actual time=518.268..1997.557 rows=120000.00 loops=1)
Hash Cond: (r0.start_id = n0.id)
Join Filter: ((((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r0.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r1.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r3.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype AND ((agtype_access_operator(VARIADIC ARRAY[r4.properties, '"id"'::agtype]) <> agtype_access_operator(VARIADIC ARRAY[r5.properties, '"id"'::agtype])))::agtype) OR ((((agtype_access_operator(VARIADIC ARRAY[n0.properties, '"id"'::agtype]) IS NULL) OR (agtype_access_operator(VARIADIC ARRAY[n0.properties, '"id"'::agtype]) IS NOT NULL)) IS NULL) AND (((agtype_access_operator(VARIADIC ARRAY[n0.properties, '"id"'::agtype]) IS NULL) OR (agtype_access_operator(VARIADIC ARRAY[n0.properties, '"id"'::agtype]) IS NOT NULL)) IS NOT NULL)))
Buffers: shared hit=2692, temp written=25773
-> Merge Join (cost=108.68..27067637.46 rows=397012715 width=232) (actual time=407.044..1508.330 rows=440000.00 loops=1)
Merge Cond: (r1.end_id = r0.end_id)
Join Filter: _ag_enforce_edge_uniqueness2(r0.id, r1.id)
Buffers: shared hit=2686, temp written=25773
-> Index Scan using "T4_end_id_idx" on "T4" r1 (cost=0.15..62.70 rows=970 width=56) (actual time=0.010..0.145 rows=310.00 loops=1)
Index Searches: 1
Buffers: shared hit=99
-> Materialize (cost=108.53..6838339.95 rows=245574900 width=176) (actual time=0.181..1435.236 rows=1400000.00 loops=1)
Storage: Disk Maximum Storage: 206184kB
Buffers: shared hit=2587, temp written=25773
-> Nested Loop (cost=108.53..6224402.70 rows=245574900 width=176) (actual time=0.180..1109.826 rows=1400000.00 loops=1)
Buffers: shared hit=2587
-> Index Scan using "T3_end_id_idx" on "T3" r0 (cost=0.15..62.70 rows=970 width=56) (actual time=0.005..0.190 rows=350.00 loops=1)
Index Searches: 1
Buffers: shared hit=111
-> Hash Join (cost=108.38..3993.42 rows=253170 width=120) (actual time=0.026..2.991 rows=4000.00 loops=350)
Hash Cond: (r3.end_id = n4.id)
Buffers: shared hit=2476
-> Nested Loop (cost=71.38..3289.50 rows=253170 width=128) (actual time=0.001..2.154 rows=28000.00 loops=350)
Buffers: shared hit=2473
-> Seq Scan on "T1" r3 (cost=0.00..19.70 rows=970 width=48) (actual time=0.001..0.013 rows=350.00 loops=350)
Buffers: shared hit=2450
-> Materialize (cost=71.38..105.82 rows=261 width=80) (actual time=0.000..0.002 rows=80.00 loops=122500)
Storage: Memory Maximum Storage: 40kB
Buffers: shared hit=23
-> Hash Join (cost=71.38..104.52 rows=261 width=80) (actual time=0.087..0.117 rows=80.00 loops=1)
Hash Cond: (r4.end_id = n7.id)
Join Filter: _ag_enforce_edge_uniqueness2(r4.id, r5.id)
Buffers: shared hit=23
-> Seq Scan on "T1" r4 (cost=0.00..19.70 rows=970 width=56) (actual time=0.001..0.013 rows=350.00 loops=1)
Buffers: shared hit=7
-> Hash (cost=59.26..59.26 rows=970 width=64) (actual time=0.081..0.082 rows=90.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 27kB
Buffers: shared hit=16
-> Hash Join (cost=37.00..59.26 rows=970 width=64) (actual time=0.039..0.074 rows=90.00 loops=1)
Hash Cond: (r5.start_id = n7.id)
Buffers: shared hit=16
-> Seq Scan on "T2" r5 (cost=0.00..19.70 rows=970 width=56) (actual time=0.003..0.021 rows=410.00 loops=1)
Buffers: shared hit=10
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.031..0.032 rows=240.00 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 26kB
Buffers: shared hit=6
-> Seq Scan on "L3" n7 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.011..0.021 rows=240.00 loops=1)
Buffers: shared hit=6
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.018..0.018 rows=170.00 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 23kB
Buffers: shared hit=3
-> Seq Scan on "L0" n4 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.004..0.010 rows=170.00 loops=1)
Buffers: shared hit=3
-> Hash (cost=22.00..22.00 rows=1200 width=40) (actual time=0.018..0.019 rows=210.00 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 60kB
Buffers: shared hit=6
-> Seq Scan on "L2" n0 (cost=0.00..22.00 rows=1200 width=40) (actual time=0.002..0.008 rows=210.00 loops=1)
Buffers: shared hit=6
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.017..0.017 rows=200.00 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=4
-> Seq Scan on "L1" n2 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.002..0.008 rows=200.00 loops=1)
Buffers: shared hit=4
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=0.022..0.023 rows=200.00 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=4
-> Seq Scan on "L1" n5 (cost=0.00..22.00 rows=1200 width=8) (actual time=0.005..0.013 rows=200.00 loops=1)
Buffers: shared hit=4
-> Parallel Hash (cost=117.36..117.36 rows=3001 width=8) (actual time=0.074..0.076 rows=426.67 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 128kB
Buffers: shared hit=28
-> Parallel Append (cost=0.00..117.36 rows=3001 width=8) (actual time=0.009..0.126 rows=1280.00 loops=1)
Buffers: shared hit=28
-> Parallel Seq Scan on "L2" n3_2 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.012 rows=210.00 loops=1)
Buffers: shared hit=6
-> Parallel Seq Scan on "L0" n3_3 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.009 rows=170.00 loops=1)
Buffers: shared hit=3
-> Parallel Seq Scan on "L4" n3_4 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.016 rows=310.00 loops=1)
Buffers: shared hit=7
-> Parallel Seq Scan on "GdsmithNode" n3_5 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.008 rows=150.00 loops=1)
Buffers: shared hit=2
-> Parallel Seq Scan on "L3" n3_6 (cost=0.00..17.06 rows=706 width=8) (actual time=0.003..0.011 rows=240.00 loops=1)
Buffers: shared hit=6
-> Parallel Seq Scan on "L1" n3_7 (cost=0.00..17.06 rows=706 width=8) (actual time=0.006..0.016 rows=200.00 loops=1)
Buffers: shared hit=4
-> Parallel Seq Scan on _ag_label_vertex n3_1 (cost=0.00..0.00 rows=1 width=8) (actual time=0.002..0.002 rows=0.00 loops=1)
-> Hash (cost=22.00..22.00 rows=1200 width=8) (actual time=217.393..217.393 rows=240.00 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 26kB
Buffers: shared hit=18
-> Seq Scan on "L3" n6 (cost=0.00..22.00 rows=1200 width=8) (actual time=217.352..217.375 rows=240.00 loops=3)
Buffers: shared hit=18
Planning:
Buffers: shared hit=6
Planning Time: 20.595 ms
JIT:
Functions: 383
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 5.759 ms (Deform 2.084 ms), Inlining 69.621 ms, Optimization 290.668 ms, Emission 291.912 ms, Total 657.961 ms
Execution Time: 2238.285 ms
Missed optimization summary: the planner keeps the NULL-contradiction branch in a filter instead of reducing it to false, raising rows from 1405.83 to 3032.0.
Missed optimization: NULL contradiction disjunct adds Apache AGE filter rows
I found a missed optimization opportunity: adding the semantically redundant predicate
base OR (expr IS NULL AND expr IS NOT NULL)changes the execution plan and raises PROFILE rows, although the base and transformed queries return the same results.Apache AGE Version: Apache AGE 1.7.0 on PostgreSQL 18.1, Docker image
apache/age:latest, digestsha256:4241e2d8bb86a6b2ea44e9ad06c73856e12b209de295124603a599dd7feb70ebOperating System: macOS Darwin 25.3.0 arm64
Installation Method: Docker image
apache/age:latest, port127.0.0.1:5455 -> 5432API/Driver: psycopg Python client through Apache AGE
cypher(graph, query)wrapperSteps to reproduce
BUG003_reproducer_graph.cypher.txt
# Execute each Cypher statement from BUG003_reproducer_graph.cypher.txt through Apache AGE's cypher(graph, query) wrapper.The transformed query is built by rewriting the target
WHEREpredicate as:where
baseis the original predicate:and
expris:The injected boolean item is always false because the same value cannot be both
NULLandNOT NULL. The transformed predicate reduces tobase OR false, so it is equivalent to the original base predicate.Expected behavior
The planner should simplify the redundant boolean item before plan construction because the transformed predicate is semantically equivalent to the base predicate. It should avoid retaining predicate residue or introducing extra filters, joins, barriers, scans, row production, memory, or buffer work.
Actual behavior
Both queries return the same result set on the attached graph.
Primary PROFILE metric:
rowschanged from1405.83to3032.0; ratio2.156, delta1626.17.Base PROFILE excerpt:
Transformed PROFILE excerpt:
Missed optimization summary: the planner keeps the NULL-contradiction branch in a filter instead of reducing it to
false, raisingrowsfrom1405.83to3032.0.