Skip to content

Add query EXPLAIN capture to the database adapters#883

Open
premtsd-code wants to merge 14 commits into
mainfrom
feat/explain
Open

Add query EXPLAIN capture to the database adapters#883
premtsd-code wants to merge 14 commits into
mainfrom
feat/explain

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

@premtsd-code premtsd-code commented Jun 1, 2026

What

Adds an opt-in query-plan ("explain") capability to the database adapters. A caller wraps a read in Database::withExplain(callable); every find/count/sum issued inside the scope captures a normalized query plan, and the call returns a Document with one entry per physical query.

Design

A capture-buffer on Adapter, not a standalone explain() method:

  • startExplainCapture() / stopExplainCapture() / isExplainCapturing() toggle a nullable buffer (nesting throws).
  • Each find/count/sum checks explainBuffer !== null and calls capturePlan(), which runs an engine-native EXPLAIN and normalizes it.
  • Pool::delegate() forwards the scope to the inner adapter and aggregates entries back, covering the pinned-transaction and pooled paths.

The common (capture-off) path pays only a single null check.

Normalized plan

Every adapter returns the same fixed shape so a consumer DTO can stay typed:

engine, rowsScanned, indexUsed, estimatedCost, rowsReturned, executionTime, and the raw tree.

  • Real execution stats (rowsReturned, executionTime) are measured from the read that already runs inside the explain scope — no second EXPLAIN ANALYZE pass.
  • Precise engine label per adapter (mysql / mariadb / postgres / sqlite / mongo) via SQL::getExplainEngine().
  • Postgres recurses into child Plans so an index scan under a Limit (e.g. pgvector ORDER BY emb <-> $1 LIMIT k) is reported instead of just the Limit node.
  • Mongo reads nReturned / executionTimeMillis straight from executionStats.
  • Internal storage identifiers (_uid, _perms, __metadata, …) are stripped from the plan for display.

Tests

  • Unit: capture toggling, sanitizer, recordPlanActuals (fill / error-skip / no-op), Postgres child-plan recursion.
  • E2E: per-engine plan shape, count/sum capture, pooled transactions, nested-scope guard.

Summary by CodeRabbit

  • New Features

    • Scoped explain capture across SQL and NoSQL engines (MySQL, MariaDB, Postgres, SQLite, MongoDB) to collect normalized, sanitized execution plans with estimated stats, actual execution time and row counts; captures are safe to serialize and cannot be nested.
    • Public helper to run a callback and return captured query plans.
  • Tests

    • New end-to-end and unit tests covering capture, sanitization, parsing, transaction interaction, error handling, and nesting guards.

- Add rowsReturned/executionTime to the normalized plan, measured from the
  real find/count/sum the explain scope already runs (no extra EXPLAIN pass)
- Report a precise per-adapter engine label (mysql/mariadb/postgres/sqlite)
  via SQL::getExplainEngine() instead of a generic 'sql'
- Fix Postgres plan extraction to recurse into child Plans so pgvector
  index scans under a Limit are reported (indexUsed/rowsScanned)
- Capture actual nReturned/executionTimeMillis from Mongo executionStats
- Document the display-only identifier substring rewrite caveat
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • ✅ Review completed - (🔄 Check again to review again)
📝 Walkthrough

Walkthrough

Adds explain-capture lifecycle and sanitization to Adapter, implements engine-specific explain execution (SQL, Postgres, MariaDB, MySQL, SQLite, Mongo), wires capture into find/count/sum and Pool, exposes Database::withExplain(), and adds unit and e2e tests.

Changes

Query Explain Capture System

Layer / File(s) Summary
Explain Capture Lifecycle
src/Database/Adapter.php
Introduces $explainBuffer, startExplainCapture()/stopExplainCapture(), capturePlan()/recordPlanActuals(), and sanitizePlan()/renaming utilities.
SQL Adapter Plan Execution
src/Database/Adapter/SQL.php
Adds explainSQL() (EXPLAIN FORMAT=JSON), JSON parsing helpers, metric extractors, and wires capture/timing into find()/count()/sum().
SQL Dialect Implementations
src/Database/Adapter/MySQL.php, src/Database/Adapter/MariaDB.php, src/Database/Adapter/Postgres.php, src/Database/Adapter/SQLite.php
Dialect helpers and getExplainEngine() implementations; Postgres parses JSON plan tree and extracts scanned rows/index; MariaDB strips timeout wrappers; SQLite uses EXPLAIN QUERY PLAN.
MongoDB Explain Integration
src/Database/Adapter/Mongo.php
Captures command payloads for operations and implements MongoDB explain execution + extractors returning winning-plan index and decoded tree.
Pool Adapter Coordination
src/Database/Adapter/Pool.php
Centralizes invocation via shared closure that propagates explain capture for both pinned and pooled paths and blocks Pool::explainSQL().
Public API
src/Database/Database.php
Adds withExplain(callable) that starts/stops capture around a callback and returns captured plans in a Document.
E2E Test Integration + Scopes
tests/e2e/Adapter/Base.php, tests/e2e/Adapter/Scopes/ExplainTests.php
Includes trait in e2e base and adds six e2e tests covering capture, cleanup, pooled-transaction capture, and nesting guard.
Unit Test Coverage
tests/unit/ExplainTest.php
Eight unit tests for lifecycle, sanitizePlan behavior, recordPlanActuals semantics, and Postgres extraction recursion/fallbacks.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Database
  participant Adapter
  participant Engine
  Client->>Database: withExplain(callback)
  Database->>Adapter: startExplainCapture()
  Adapter->>Adapter: init $explainBuffer
  Client->>Adapter: find/count/sum (captured)
  Adapter->>Adapter: capturePlan(payload)
  Adapter->>Engine: EXPLAIN / explain command
  Engine-->>Adapter: explain result
  Adapter->>Adapter: sanitizePlan()
  Adapter->>Adapter: buffer.push(plan)
  Adapter->>Adapter: execute real query
  Adapter->>Adapter: recordPlanActuals(rows, time)
  Adapter->>Adapter: buffer[last].actuals set
  Database->>Adapter: stopExplainCapture()
  Adapter-->>Database: captured queries
  Database-->>Client: Document(queries)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • utopia-php/database#721: Related Mongo adapter work introducing explain handling used by this explain-capture integration.

Suggested reviewers

  • abnegate
  • ArnabChatterjee20k
  • fogelito

Poem

🐰
I nibbled plans in buffered rows,
Renamed hidden columns none oppose.
From Mongo trees to Postgres lanes,
I hop and hide the internal names.
Hooray — queries bloom, sanitized and bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add query EXPLAIN capture to the database adapters' accurately summarizes the main objective of the PR—introducing opt-in query-plan capture functionality across database adapters.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/explain

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR adds an opt-in scoped explain-capture feature to all supported adapters (MySQL, MariaDB, Postgres, SQLite, MongoDB). Callers wrap reads in Database::withExplain(callable, ?Document &$plan), which returns both the query result and a normalized plan document containing per-query entries with engine label, estimated stats, actual timing, and the raw plan tree.

  • Core buffer management lives on Adapter via startExplainCapture / stopExplainCapture / recordPlanActuals; each adapter overrides explainSQL to produce a normalized map, and SQL adapters hook find/count/sum with a single null-check overhead on the common (non-explain) path.
  • Pool propagation was refactored to forward the explain scope to the inner adapter via an $invoke closure, with a re-entrancy guard (! $adapter->isExplainCapturing()) that prevents double-starting capture on a pinned transaction adapter.
  • MongoDB correctly uses queryPlanner verbosity (not executionStats) so the query is not double-executed; actual timing and row counts come from recordPlanActuals after the real operation runs.

Confidence Score: 5/5

Safe to merge; all three previously-flagged issues (return-value discarded, Pool re-entrancy throw, MongoDB double-execution) have been correctly addressed in this revision.

The implementation is well-structured and the common non-explain path is protected by a single null check. The Pool delegation correctly guards against re-entrant capture starts, MongoDB switched from executionStats to queryPlanner avoiding double query execution, and withExplain now properly returns the callback result. The only remaining item is the SQLite indexUsed field always being null despite available data in the plan steps, which is a minor quality gap rather than a correctness issue.

No files require special attention.

Important Files Changed

Filename Overview
src/Database/Adapter.php Adds the core explainBuffer state machine with start/stop/isCapturing, capturePlan, recordPlanActuals, and sanitizePlan; logic is sound and well-guarded against concurrent nesting.
src/Database/Adapter/SQL.php Adds explain capture points in find, count, and sum; the walkExplainTables depth-limit uses > instead of >=, so a permission-subquery index can surface as indexUsed when the main table has no index (previously flagged P2).
src/Database/Adapter/SQLite.php Implements explainSQL via EXPLAIN QUERY PLAN; always returns null for indexUsed, rowsScanned, and estimatedCost even though the detail column in the returned steps contains index usage information.
src/Database/Adapter/Postgres.php Implements explainSQL with EXPLAIN (FORMAT JSON), recursing into child Plans to extract the leaf scan node's index and row estimates; correctly handles Limit→Index Scan pgvector patterns.
src/Database/Adapter/Mongo.php Uses queryPlanner verbosity (not executionStats) to avoid double-execution; correctly separates plan extraction from actuals timing; winningPlan-only traversal avoids false-positive index reports from rejected plans.
src/Database/Adapter/Pool.php Re-entrancy guard (! $adapter->isExplainCapturing()) correctly prevents double-starting capture on a pinned transaction adapter; captures drain cleanly into the pool buffer in the finally block.
src/Database/Database.php Adds withExplain(callable, ?Document &$plan) that correctly returns the callback result and populates the plan via a by-reference out-parameter; cleanup always runs in finally.
tests/unit/ExplainTest.php Covers buffer toggle, sanitizer, recordPlanActuals (fill / error-skip / no-op), Postgres child-plan recursion, and Pool re-entrancy guard; good unit-level coverage.
tests/e2e/Adapter/Scopes/ExplainTests.php E2E suite covers find, count/sum capture, exception safety, nested-scope guard, and pooled-transaction path; broad engine-level validation.

Reviews (9): Last reviewed commit: "Sanitize embedded perms/metadata table t..." | Re-trigger Greptile

Comment thread src/Database/Database.php Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/ExplainTest.php (1)

98-116: ⚡ Quick win

Exercise the “last entry” contract with more than one buffered plan.

This fixture only has one buffered query, so it still passes if recordPlanActuals() updates the first entry or every entry. Seed two plans and assert only the tail entry gets rowsReturned / executionTime.

Suggested test tightening
-        $buffer->setValue($adapter, [[
-            'purpose' => 'find',
-            'context' => ['collection' => 'movies'],
-            'plan'    => ['engine' => 'mariadb', 'rowsScanned' => 10],
-        ]]);
+        $buffer->setValue($adapter, [
+            [
+                'purpose' => 'find',
+                'context' => ['collection' => 'movies'],
+                'plan'    => ['engine' => 'mariadb', 'rowsScanned' => 10],
+            ],
+            [
+                'purpose' => 'count',
+                'context' => ['collection' => 'movies'],
+                'plan'    => ['engine' => 'mariadb', 'rowsScanned' => 3],
+            ],
+        ]);
@@
-        $this->assertSame(7, $captured[0]['plan']['rowsReturned']);
-        $this->assertSame(1.5, $captured[0]['plan']['executionTime']);
+        $this->assertArrayNotHasKey('rowsReturned', $captured[0]['plan']);
+        $this->assertArrayNotHasKey('executionTime', $captured[0]['plan']);
+        $this->assertSame(7, $captured[1]['plan']['rowsReturned']);
+        $this->assertSame(1.5, $captured[1]['plan']['executionTime']);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/ExplainTest.php` around lines 98 - 116, The test
testRecordPlanActualsFillsLastEntry should seed Adapter::explainBuffer with two
plan entries (instead of one) and then call Adapter::recordPlanActuals(...) to
verify it only updates the tail entry: set two arrays into the
ReflectionProperty('explainBuffer') for Adapter, invoke the
ReflectionMethod('recordPlanActuals') with the rowsReturned and executionTime
values, then assert that the first buffered plan's plan['rowsReturned'] and
plan['executionTime'] remain unchanged while the second (last) plan's fields
equal the provided values; this ensures recordPlanActuals updates only the last
entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 3782-3799: PHPStan cannot prove mutation of $found through the
recursive anonymous closure $walk; replace the closure with a pure-returning
recursive routine so control flow is explicit: either convert the closure into a
named private/static method (e.g., findIndexNameInPlan(array $node): ?string) or
change $walk to accept/return ?string (return the discovered indexName rather
than mutating $found) and then set $found = $walk($tree); ensure references to
$walk, $found and $tree are updated accordingly so the function return type lets
PHPStan infer the mutation.

In `@src/Database/Adapter/Postgres.php`:
- Around line 1590-1602: The explain block currently prepares and runs the
statement directly via $this->getPDO()->prepare(...)->execute(), bypassing
Postgres::execute() and thus skipping statement_timeout and wrapped error
handling; replace the direct prepare/execute flow so the EXPLAIN SQL is executed
through the adapter's execution path (call Postgres::execute() or the adapter's
equivalent method) passing the same $explainSql and $binds (convert double
values with getFloatPrecision where needed before passing binds), then obtain
the result (e.g. fetchColumn from the returned statement/result) and close the
cursor via the adapter-returned statement; update references to $stmt,
getPDO()->prepare, ->execute(), ->fetchColumn() and ->closeCursor() accordingly
so the EXPLAIN inherits the same timeout and error behavior as normal
statements.

In `@tests/e2e/Adapter/Scopes/ExplainTests.php`:
- Around line 84-85: json_encode() can return false on failure, causing a type
error when passed to assertStringNotContainsString; update the test to ensure
encoding succeeded before asserting: call json_encode($entry['plan']['tree'])
into $rawTree, assert that $rawTree is not false (or use
$this->assertNotFalse(json_encode(...))) and/or check json_last_error() for
JSON_ERROR_NONE, then call $this->assertStringNotContainsString('_perms',
(string)$rawTree); reference json_encode(), $rawTree, $entry['plan']['tree'],
and assertStringNotContainsString when making the change.

---

Nitpick comments:
In `@tests/unit/ExplainTest.php`:
- Around line 98-116: The test testRecordPlanActualsFillsLastEntry should seed
Adapter::explainBuffer with two plan entries (instead of one) and then call
Adapter::recordPlanActuals(...) to verify it only updates the tail entry: set
two arrays into the ReflectionProperty('explainBuffer') for Adapter, invoke the
ReflectionMethod('recordPlanActuals') with the rowsReturned and executionTime
values, then assert that the first buffered plan's plan['rowsReturned'] and
plan['executionTime'] remain unchanged while the second (last) plan's fields
equal the provided values; this ensures recordPlanActuals updates only the last
entry.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e29d09ca-e116-4d51-b618-cdbca58d2ab3

📥 Commits

Reviewing files that changed from the base of the PR and between 5679019 and 939a4d3.

📒 Files selected for processing (12)
  • src/Database/Adapter.php
  • src/Database/Adapter/MariaDB.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/MySQL.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • tests/e2e/Adapter/Base.php
  • tests/e2e/Adapter/Scopes/ExplainTests.php
  • tests/unit/ExplainTest.php

Comment thread src/Database/Adapter/Mongo.php Outdated
Comment thread src/Database/Adapter/Postgres.php Outdated
Comment thread tests/e2e/Adapter/Scopes/ExplainTests.php Outdated
- Mongo explainSQL: guard json_encode result against false before decode
- Rewrite extractMongoIndexUsed as a returning recursion so the index walk
  has no dead null-comparison and is easier to follow
- Guard json_encode in the explain e2e assertion
- Document that withExplain returns the plan, not the callback result
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 3786-3794: The current traversal over $tree in
extractMongoIndexUsed() walks into all nested plans (including
queryPlanner.rejectedPlans) and can pick up IXSCAN stages from non-winning
branches; change the logic so it only follows the winning plan/execution path
instead of a blind foreach of $tree: locate the loop that iterates $tree and
instead walk the queryPlanner.winningPlan and the corresponding
executionStats.executionStages path (avoid traversing rejectedPlans or other
sibling branches) so that extractMongoIndexUsed() reports indexes only from the
winning plan (and therefore will not report IXSCAN when the winning plan is
COLLSCAN).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8a0bba38-80ad-448f-90d6-08cdf3e62c68

📥 Commits

Reviewing files that changed from the base of the PR and between 939a4d3 and 4176cef.

📒 Files selected for processing (3)
  • src/Database/Adapter/Mongo.php
  • src/Database/Database.php
  • tests/e2e/Adapter/Scopes/ExplainTests.php
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/e2e/Adapter/Scopes/ExplainTests.php
  • src/Database/Database.php

Comment thread src/Database/Adapter/Mongo.php Outdated
Drop the DTO field-table duplicated in the explainSQL docblock and the
verbose recordPlanActuals/rename caveats; keep the design rationale and
gotchas.
- withExplain now returns the callback's own result (matching withTenant/
  withPreserveSequence); the plan is exposed via a by-ref out-param, so one
  call yields both rows and plan
- Mongo explain uses queryPlanner verbosity instead of executionStats so it
  no longer executes every captured find/count/sum a second time; real
  rowsReturned/executionTime come from timing the actual read, as in SQL
- Restrict Mongo index extraction to the winning plan so a rejected-plan
  IXSCAN is never reported when the query runs a COLLSCAN
- Route the Postgres EXPLAIN through execute() so it inherits statement_timeout
  and the adapter's PDO error handling
…andling

Align the base SQL and SQLite explainSQL cursor handling with the Postgres
override: run the EXPLAIN via execute() so it inherits the timeout wrapper and
processException mapping, and close the cursor in a finally.
Keep the result-processing inside the try and just add the finally for
recordPlanActuals, instead of restructuring the method — minimises the diff.
Explain capture is wired in both SQL and Mongo, but the e2e tests gated on
instanceof SQL with a stale 'only SQL' comment, so the engine-agnostic tests
never ran on Mongo. Add getSupportForExplain() (false on base/Memory/Redis,
true on SQL and Mongo, delegated by Pool) and gate the engine-agnostic tests
on it. The find test keeps an instanceof SQL guard since it asserts the
SQL-specific engine label and plan tree.
Comment thread src/Database/Adapter/Pool.php
While the pool is capturing, a nested delegate() on the pinned transaction
adapter — e.g. a before(find) transformation issuing its own query — re-called
startExplainCapture() on the already-capturing adapter and threw 'cannot be
nested', breaking withExplain + withTransaction for apps with such listeners.
Only start (and own the stop/drain of) capture when the adapter isn't already
capturing; nested calls accumulate into the same buffer. Adds a unit test that
fails on the old behavior.
The sanitizer renamed standalone _perms/__metadata identifiers but missed them
when embedded inside plan strings (e.g. a MariaDB attached_condition like
"db_x_collection_y_perms.`_permission` = 'read'"), leaking the internal
permission/metadata table names once the raw tree is exposed. Substring-rewrite
those embedded physical-table tokens too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant