fix(passthrough): drain per-statement queues to stop memory leak#395
Conversation
In passthrough the backend returned before calling complete_execution() / finish_session() -- the only drains for the per-connection execute and session_metrics queues. The frontend enqueues a session and an execute for every statement (mapped or not), so in passthrough those queues grew by one entry per statement and were never reclaimed, leaking ~1KB/statement until the process was OOM-killed. This only manifested with an empty encrypt config, since is_passthrough() = encrypt_config.is_empty() || mapping_disabled(); with a config present even plaintext statements take the mapped path and are drained. Drain the execute and session_metrics queues on execute-terminating messages in the passthrough branch too, mirroring the mapped path. Adds a regression test asserting a full statement lifecycle leaves both queues empty. Fixes BUG-300
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis change drains per-connection execute and session_metrics queues in passthrough by matching execute/session-terminating backend messages and invoking ChangesQueue Draining in Passthrough Mode
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Fixes BUG-300: in passthrough mode (notably when EncryptConfig is empty), the backend was returning early without draining per-statement execute and session_metrics queues, causing unbounded growth and eventual OOM.
Changes:
- Drain
execute+session_metricsqueues on execute-terminating backend messages even when passthrough is enabled. - Add a new unit test intended to prevent regressions around queue growth.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
packages/cipherstash-proxy/src/postgresql/backend.rs |
Adds queue draining in the passthrough branch for execute-terminating backend message types. |
packages/cipherstash-proxy/src/postgresql/context/mod.rs |
Adds a test that asserts complete_execution()/finish_session() drain the queues across many iterations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The existing test called complete_execution()/finish_session() directly, so it passed even against the pre-fix backend (which early-returned in the passthrough branch without draining). It exercised the drain primitives, not the regression. Add passthrough_drains_queues_on_execute_terminating_message in backend.rs, which drives Backend::rewrite() through the passthrough branch with an execute-terminating CommandComplete and asserts the per-connection execute and session_metrics queues stay empty across many statements. This fails against the pre-fix backend, so it actually guards the bug. Rename the original test to complete_execution_and_finish_session_drain_queues and reframe its docs as a primitives-level unit test that points at the new backend test. Add #[cfg(test)] execute_queue_len()/session_metrics_queue_len() accessors so tests assert queue state without reaching into private fields.
|
|
||
| /// Encodes a `CommandComplete` backend message on the wire: | ||
| /// `'C'` + Int32 length (body + 4) + null-terminated command tag. | ||
| fn command_complete_bytes() -> BytesMut { |
There was a problem hiding this comment.
Might be worth expanding coverage for ALL message types
Could use a table-driven test that runs the same assertion for each terminating code:
for msg_bytes in [
command_complete_bytes(),
error_response_bytes(),
portal_suspended_bytes(),
empty_query_response_bytes(),
] {
// enqueue session + execute, run rewrite(), assert both queues == 0
}
There was a problem hiding this comment.
Good call — done in 8f610ab. Made it table-driven over all four execute-terminating codes (CommandComplete, EmptyQueryResponse, PortalSuspended, ErrorResponse), factoring a backend_message() wire-framing helper plus one encoder per code.
Confirmed it earns its keep: removing any single arm from the passthrough drain match now fails the test (e.g. dropping PortalSuspended → PortalSuspended: execute queue not drained at statement 0). Previously only CommandComplete was exercised, so three arms could regress silently.
Per review feedback: the regression test only fed CommandComplete, but the passthrough drain matches CommandComplete | EmptyQueryResponse | PortalSuspended | ErrorResponse. Three arms were uncovered — dropping any of them would have left the test green while the leak returned for that code. Make the test table-driven over all four terminating codes. Factor a backend_message() wire-framing helper and one encoder per code. Verified the test fails when any single arm is removed from the drain match.
Closes #400
Summary
Fixes a memory leak (#400) that affects proxies running in passthrough with an empty encrypt config (no encryption configured). RSS climbs ~1 KB per statement until the pod is OOM-killed. Configured deployments are unaffected. Present in 2.1.23 and 2.2.1; reproduced locally and root-caused.
Root cause
The frontend starts a session and enqueues an execute for every statement (
start_session/set_execute). Those per-connectionexecuteandsession_metricsqueues are only drained bycomplete_execution()/finish_session(). The passthrough branch inbackend.rsreturned before reaching those calls, so in passthrough the queues grew by one entry per statement and were never reclaimed.Only manifests with an empty config because
is_passthrough() = encrypt_config.is_empty() || mapping_disabled()— with a config present, even plaintext statements take the mapped path (which drains).Fix
Drain the
executeandsession_metricsqueues on execute-terminating messages (CommandComplete/EmptyQueryResponse/PortalSuspended/ErrorResponse) in the passthrough branch too, mirroring the mapped path.Verification
Reproduced in a Linux/glibc container, 6×50k = 300k cached extended inserts, empty config:
Now matches the configured (non-leaking) case. Adds regression test
statement_lifecycle_does_not_grow_queues.cargo fmt+clippy --all-features -D warningsclean.Follow-up
The memory reproduction harness (
scripts/mem-repro/) and analysis write-up will land in a separate PR.Workaround (for customers on existing releases)
Set
CS_DEVELOPMENT__DISABLE_MAPPING=trueon unconfigured/passthrough proxies — confirmed leak-free.Summary by CodeRabbit
Bug Fixes
Tests