Contain std::bad_alloc on OTel and OTel+Arrow export paths#86
Open
iskakaushik wants to merge 2 commits intomainfrom
Open
Contain std::bad_alloc on OTel and OTel+Arrow export paths#86iskakaushik wants to merge 2 commits intomainfrom
iskakaushik wants to merge 2 commits intomainfrom
Conversation
The OTel and Arrow export builders allocate via protobuf::Arena, Arrow buffers, ZSTD, and std::make_shared. On OOM these throw std::bad_alloc, which would unwind through the bgworker's PG_TRY/longjmp frame and corrupt PostgreSQL's error stack. Wrap ExportEventStats and ExportEventsAsArrow with thin try/catch shims that delegate to *Internal helpers, so OOM degrades to "drop the batch and record a failure" instead of taking down the bgworker.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds top-level C++ exception barriers around the bgworker’s two main export entry points (event-stats export and Arrow passthrough export) to prevent std::bad_alloc/std::exception from unwinding across PostgreSQL PG_TRY/longjmp frames during OOM or other failures in OTel/Arrow/protobuf/ZSTD paths.
Changes:
- Refactors
ExportEventsAsArrowintoExportEventsAsArrowInternalplus a thintry/catchwrapper. - Refactors
ExportEventStatsintoExportEventStatsInternalplus a thintry/catchwrapper. - On exceptions, logs a warning and records exporter failure instead of letting the bgworker unwind through PG frames.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+306
to
+314
| void ExportEventStats(const std::vector<PschEvent>& events, StatsExporter* exporter) { | ||
| try { | ||
| ExportEventStatsInternal(events, exporter); | ||
| } catch (const std::bad_alloc&) { | ||
| LogExporterWarning("event stats export", "out of memory"); | ||
| RecordExporterFailure("event stats OOM"); | ||
| } catch (const std::exception& e) { | ||
| LogExporterWarning("event stats exception", e.what()); | ||
| RecordExporterFailure(e.what()); |
| // Build and export stats (records, metrics, ClickHouse rows) from events | ||
| void ExportEventStats(const std::vector<PschEvent>& events, StatsExporter* exporter) { | ||
| void ExportEventStatsInternal(const std::vector<PschEvent>& events, StatsExporter* exporter) { | ||
| elog(DEBUG1, "pg_stat_ch: ExportEventStats() called with %zu events", events.size()); |
Address Copilot review on PR #86: - ExportEventStats now returns bool. PschExportBatch skips CommitBatch on false so we never flush a partially-built batch, and never re-flush stale state when BeginBatch itself threw before resetting the exporter's per-batch buffers. - Fix the renamed-helper DEBUG log to match its current name.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ExportEventStatsandExportEventsAsArrowwith thin try/catch shims that delegate to*Internalhelpers.std::bad_alloc(and otherstd::exception) fromprotobuf::Arena, Arrow builders, ZSTD compression, IPC writer construction, andstd::make_sharedcolumn factories.PG_TRY/longjmp frame and corrupt the error stack.The OTel SDK exposes
try { FlushChunk; }only inCommitBatch/SendArrowBatch. The build phase (BeginBatch→ per-row arena allocs →Append→Finish) was unprotected. This PR adds a single barrier per top-level entry point instats_exporter.cc, leaving the existing internal try/catch in OTel and ClickHouse exporters untouched.On catch the wrapper calls
LogExporterWarning+RecordExporterFailureand returns; the partial batch is abandoned and the next bgworker iteration starts fresh.Test plan
mise run buildcleanmise run test:regressmise run test:tap(against../postgres/install_tap)