Skip to content

fix(passthrough): drain per-statement queues to stop memory leak#395

Merged
coderdan merged 3 commits into
mainfrom
dan/bug-300-passthrough-memory-leak
Jun 1, 2026
Merged

fix(passthrough): drain per-statement queues to stop memory leak#395
coderdan merged 3 commits into
mainfrom
dan/bug-300-passthrough-memory-leak

Conversation

@coderdan
Copy link
Copy Markdown
Contributor

@coderdan coderdan commented May 29, 2026

Closes #400

Summary

Fixes a memory leak (#400) that affects proxies running in passthrough with an empty encrypt config (no encryption configured). RSS climbs ~1 KB per statement until the pod is OOM-killed. Configured deployments are unaffected. Present in 2.1.23 and 2.2.1; reproduced locally and root-caused.

Root cause

The frontend starts a session and enqueues an execute for every statement (start_session / set_execute). Those per-connection execute and session_metrics queues are only drained by complete_execution() / finish_session(). The passthrough branch in backend.rs returned before reaching those calls, so in passthrough the queues grew by one entry per statement and were never reclaimed.

Only manifests with an empty config because is_passthrough() = encrypt_config.is_empty() || mapping_disabled() — with a config present, even plaintext statements take the mapped path (which drains).

Fix

Drain the execute and session_metrics queues on execute-terminating messages (CommandComplete / EmptyQueryResponse / PortalSuspended / ErrorResponse) in the passthrough branch too, mirroring the mapped path.

Verification

Reproduced in a Linux/glibc container, 6×50k = 300k cached extended inserts, empty config:

before fix after fix
RSS, baseline → 300k inserts 36 → 317 MiB (climbing, no release) 22 → 49 MiB (plateau)

Now matches the configured (non-leaking) case. Adds regression test statement_lifecycle_does_not_grow_queues. cargo fmt + clippy --all-features -D warnings clean.

Follow-up

The memory reproduction harness (scripts/mem-repro/) and analysis write-up will land in a separate PR.

Workaround (for customers on existing releases)

Set CS_DEVELOPMENT__DISABLE_MAPPING=true on unconfigured/passthrough proxies — confirmed leak-free.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented unbounded queue growth in the PostgreSQL proxy passthrough path, avoiding potential memory exhaustion when passthrough behavior is used.
  • Tests

    • Added a regression test that repeatedly exercises passthrough execution and verifies execute/session queues are drained to prevent future leaks.

In passthrough the backend returned before calling complete_execution() /
finish_session() -- the only drains for the per-connection execute and
session_metrics queues. The frontend enqueues a session and an execute for
every statement (mapped or not), so in passthrough those queues grew by one
entry per statement and were never reclaimed, leaking ~1KB/statement until
the process was OOM-killed.

This only manifested with an empty encrypt config, since
is_passthrough() = encrypt_config.is_empty() || mapping_disabled(); with a
config present even plaintext statements take the mapped path and are drained.

Drain the execute and session_metrics queues on execute-terminating messages
in the passthrough branch too, mirroring the mapped path. Adds a regression
test asserting a full statement lifecycle leaves both queues empty.

Fixes BUG-300
Copilot AI review requested due to automatic review settings May 29, 2026 13:40
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 087c0ad7-dfc0-43a2-bb0a-a709e04e800b

📥 Commits

Reviewing files that changed from the base of the PR and between 15a4684 and 8f610ab.

📒 Files selected for processing (1)
  • packages/cipherstash-proxy/src/postgresql/backend.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/cipherstash-proxy/src/postgresql/backend.rs

📝 Walkthrough

Walkthrough

This change drains per-connection execute and session_metrics queues in passthrough by matching execute/session-terminating backend messages and invoking context.complete_execution() and context.finish_session() before the passthrough early return; adds tests and test-only accessors asserting queues stay empty.

Changes

Queue Draining in Passthrough Mode

Layer / File(s) Summary
Backend passthrough handling
packages/cipherstash-proxy/src/postgresql/backend.rs
In Backend::rewrite passthrough branch, match PostgreSQL backend message codes (CommandComplete, EmptyQueryResponse, PortalSuspended, ErrorResponse) and call context.complete_execution() and context.finish_session() before returning.
Tests and test-only accessors
packages/cipherstash-proxy/src/postgresql/context/mod.rs, packages/cipherstash-proxy/src/postgresql/backend.rs
Adds #[cfg(test)] accessors execute_queue_len() and session_metrics_queue_len() and regression tests that feed passthrough-mode Backend::rewrite() with execute-terminating messages repeatedly, asserting both queues remain zero across iterations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • tobyhede

Poem

🐰 I watched the queues stack high each day,
In passthrough paths they’d quietly stay.
Now messages tell Context when to end,
I drain and finish, the queues descend.
A thousand hops later — no leak in play.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: draining per-statement queues to resolve a memory leak in passthrough mode, which aligns with the changeset's core objective.
Linked Issues check ✅ Passed The PR addresses all coding requirements from #400: drains execute and session_metrics queues in the passthrough branch before returning, covers all execute-terminating message codes, and includes comprehensive regression tests.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fix the memory leak: modifications to backend.rs for queue draining, test-only accessors in context/mod.rs, and regression tests covering the bug scenario.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dan/bug-300-passthrough-memory-leak

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes BUG-300: in passthrough mode (notably when EncryptConfig is empty), the backend was returning early without draining per-statement execute and session_metrics queues, causing unbounded growth and eventual OOM.

Changes:

  • Drain execute + session_metrics queues on execute-terminating backend messages even when passthrough is enabled.
  • Add a new unit test intended to prevent regressions around queue growth.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
packages/cipherstash-proxy/src/postgresql/backend.rs Adds queue draining in the passthrough branch for execute-terminating backend message types.
packages/cipherstash-proxy/src/postgresql/context/mod.rs Adds a test that asserts complete_execution()/finish_session() drain the queues across many iterations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/cipherstash-proxy/src/postgresql/context/mod.rs Outdated
@coderdan coderdan changed the title fix(passthrough): drain per-statement queues to stop memory leak (BUG-300) fix(passthrough): drain per-statement queues to stop memory leak Jun 1, 2026
The existing test called complete_execution()/finish_session() directly,
so it passed even against the pre-fix backend (which early-returned in the
passthrough branch without draining). It exercised the drain primitives,
not the regression.

Add passthrough_drains_queues_on_execute_terminating_message in backend.rs,
which drives Backend::rewrite() through the passthrough branch with an
execute-terminating CommandComplete and asserts the per-connection execute
and session_metrics queues stay empty across many statements. This fails
against the pre-fix backend, so it actually guards the bug.

Rename the original test to complete_execution_and_finish_session_drain_queues
and reframe its docs as a primitives-level unit test that points at the new
backend test. Add #[cfg(test)] execute_queue_len()/session_metrics_queue_len()
accessors so tests assert queue state without reaching into private fields.

/// Encodes a `CommandComplete` backend message on the wire:
/// `'C'` + Int32 length (body + 4) + null-terminated command tag.
fn command_complete_bytes() -> BytesMut {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth expanding coverage for ALL message types

Could use a table-driven test that runs the same assertion for each terminating code:

  for msg_bytes in [
      command_complete_bytes(),
      error_response_bytes(),
      portal_suspended_bytes(),
      empty_query_response_bytes(),
  ] {
      // enqueue session + execute, run rewrite(), assert both queues == 0
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — done in 8f610ab. Made it table-driven over all four execute-terminating codes (CommandComplete, EmptyQueryResponse, PortalSuspended, ErrorResponse), factoring a backend_message() wire-framing helper plus one encoder per code.

Confirmed it earns its keep: removing any single arm from the passthrough drain match now fails the test (e.g. dropping PortalSuspendedPortalSuspended: execute queue not drained at statement 0). Previously only CommandComplete was exercised, so three arms could regress silently.

Per review feedback: the regression test only fed CommandComplete, but the
passthrough drain matches CommandComplete | EmptyQueryResponse |
PortalSuspended | ErrorResponse. Three arms were uncovered — dropping any of
them would have left the test green while the leak returned for that code.

Make the test table-driven over all four terminating codes. Factor a
backend_message() wire-framing helper and one encoder per code. Verified the
test fails when any single arm is removed from the drain match.
@coderdan coderdan merged commit cd5469d into main Jun 1, 2026
6 checks passed
@coderdan coderdan deleted the dan/bug-300-passthrough-memory-leak branch June 1, 2026 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory leak in passthrough mode (empty encrypt config) — per-statement leak causes OOM

3 participants