Skip to content

Contain std::bad_alloc on OTel and OTel+Arrow export paths#86

Open
iskakaushik wants to merge 2 commits intomainfrom
alloc-boundaries
Open

Contain std::bad_alloc on OTel and OTel+Arrow export paths#86
iskakaushik wants to merge 2 commits intomainfrom
alloc-boundaries

Conversation

@iskakaushik
Copy link
Copy Markdown
Collaborator

Summary

  • Wrap ExportEventStats and ExportEventsAsArrow with thin try/catch shims that delegate to *Internal helpers.
  • Catches std::bad_alloc (and other std::exception) from protobuf::Arena, Arrow builders, ZSTD compression, IPC writer construction, and std::make_shared column factories.
  • Without this, an OOM in the bgworker would unwind through PostgreSQL's PG_TRY/longjmp frame and corrupt the error stack.

The OTel SDK exposes try { FlushChunk; } only in CommitBatch/SendArrowBatch. The build phase (BeginBatch → per-row arena allocs → AppendFinish) was unprotected. This PR adds a single barrier per top-level entry point in stats_exporter.cc, leaving the existing internal try/catch in OTel and ClickHouse exporters untouched.

On catch the wrapper calls LogExporterWarning + RecordExporterFailure and returns; the partial batch is abandoned and the next bgworker iteration starts fresh.

Test plan

  • mise run build clean
  • mise run test:regress
  • mise run test:tap (against ../postgres/install_tap)
  • Manual: stress test with constrained memory to verify graceful failure

The OTel and Arrow export builders allocate via protobuf::Arena, Arrow
buffers, ZSTD, and std::make_shared. On OOM these throw std::bad_alloc,
which would unwind through the bgworker's PG_TRY/longjmp frame and
corrupt PostgreSQL's error stack.

Wrap ExportEventStats and ExportEventsAsArrow with thin try/catch
shims that delegate to *Internal helpers, so OOM degrades to "drop
the batch and record a failure" instead of taking down the bgworker.
Copilot AI review requested due to automatic review settings May 6, 2026 01:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds top-level C++ exception barriers around the bgworker’s two main export entry points (event-stats export and Arrow passthrough export) to prevent std::bad_alloc/std::exception from unwinding across PostgreSQL PG_TRY/longjmp frames during OOM or other failures in OTel/Arrow/protobuf/ZSTD paths.

Changes:

  • Refactors ExportEventsAsArrow into ExportEventsAsArrowInternal plus a thin try/catch wrapper.
  • Refactors ExportEventStats into ExportEventStatsInternal plus a thin try/catch wrapper.
  • On exceptions, logs a warning and records exporter failure instead of letting the bgworker unwind through PG frames.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/export/stats_exporter.cc Outdated
Comment on lines +306 to +314
void ExportEventStats(const std::vector<PschEvent>& events, StatsExporter* exporter) {
try {
ExportEventStatsInternal(events, exporter);
} catch (const std::bad_alloc&) {
LogExporterWarning("event stats export", "out of memory");
RecordExporterFailure("event stats OOM");
} catch (const std::exception& e) {
LogExporterWarning("event stats exception", e.what());
RecordExporterFailure(e.what());
Comment thread src/export/stats_exporter.cc Outdated
// Build and export stats (records, metrics, ClickHouse rows) from events
void ExportEventStats(const std::vector<PschEvent>& events, StatsExporter* exporter) {
void ExportEventStatsInternal(const std::vector<PschEvent>& events, StatsExporter* exporter) {
elog(DEBUG1, "pg_stat_ch: ExportEventStats() called with %zu events", events.size());
Address Copilot review on PR #86:

- ExportEventStats now returns bool. PschExportBatch skips CommitBatch
  on false so we never flush a partially-built batch, and never
  re-flush stale state when BeginBatch itself threw before resetting
  the exporter's per-batch buffers.
- Fix the renamed-helper DEBUG log to match its current name.
@serprex serprex requested review from JoshDreamland and serprex May 6, 2026 02:57
@serprex serprex mentioned this pull request May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants