PG doc: add note on statistics and snapshotting in PG considerations#35804

Open
martykulma wants to merge 2 commits into MaterializeInc:main from martykulma:maz-pg-src-doc-parallel-snapshot

@martykulma
Contributor

Adds a section to the PostgreSQL source considerations that highlights the relationship between parallel snapshotting, snapshot progress reporting in the Console, and up-to-date PostgreSQL table statistics.

@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@martykulma martykulma force-pushed the maz-pg-src-doc-parallel-snapshot branch from 5aa9b79 to 2e72144 Compare March 31, 2026 13:49
@martykulma martykulma marked this pull request as ready for review March 31, 2026 14:00
@martykulma martykulma requested a review from a team as a code owner March 31, 2026 14:00
Contributor

@def- def- left a comment

Thanks!

Contributor

@kay-kim kay-kim left a comment

Left a suggestion (feel free to ignore if I missed the point) and a question.

using ranges of
[`CTID`](https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-CTID).
Materialize uses [estimates](https://www.postgresql.org/docs/current/row-estimation-examples.html)
for the amount of data and rows that will be read. Missing or stale statistics will result in
Contributor

Not sure but maybe?

The PostgreSQL source performs parallel snapshotting of tables by distributing rows among workers using ranges of [`CTID`](https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-CTID). Materialize uses [PostgreSQL statistics to estimate](https://www.postgresql.org/docs/current/row-estimation-examples.html) the amount of data and number of rows to read. Missing or stale statistics can result in uneven work distribution, reducing snapshot performance. They can also cause incorrect snapshot progress reporting in the Console.

To avoid this situation, ensure statistics are up to date by running the PostgreSQL `ANALYZE` command before creating the source in Materialize.

Also, do you think we should mention this `ANALYZE` step in the actual PostgreSQL ingest tutorials?

Contributor Author

Are the tutorials you're referencing the vendor-specific steps (e.g. https://preview.materialize.com/materialize/35804/ingest-data/postgres/alloydb/)? I put it in considerations because that section appears for each of the vendors. If there's another tutorials area, it seems like we may want to put it there as well!
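
For readers following the thread, here is a rough sketch of why stale statistics can skew the work distribution described in the suggested wording above. It is a toy model, not Materialize's implementation: the function names, the equal-width page ranges, and the open-ended last range are all assumptions made for illustration. The only ideas taken from the discussion are that rows are split among workers by `CTID` ranges and that the split relies on PostgreSQL's size estimates.

```python
from math import ceil
from typing import List, Optional, Tuple

# Hypothetical model: a CTID range is a half-open range of heap page numbers.
# An end of None means "read to the end of the table".
PageRange = Tuple[int, Optional[int]]


def split_ctid_page_ranges(estimated_pages: int, workers: int) -> List[PageRange]:
    """Split the estimated page count into one contiguous range per worker.

    Assumption: the last range is open-ended so that pages beyond the
    estimate are still read by some worker.
    """
    step = max(1, ceil(estimated_pages / workers))
    ranges: List[PageRange] = []
    for w in range(workers):
        start = w * step
        end = None if w == workers - 1 else start + step
        ranges.append((start, end))
    return ranges


def pages_per_worker(ranges: List[PageRange], actual_pages: int) -> List[int]:
    """Count how many real pages each worker ends up reading."""
    counts = []
    for start, end in ranges:
        upper = actual_pages if end is None else min(end, actual_pages)
        counts.append(max(0, upper - start))
    return counts


ACTUAL_PAGES = 1000  # true table size in pages

# Fresh statistics: the estimate matches the real size.
fresh = pages_per_worker(split_ctid_page_ranges(1000, 4), ACTUAL_PAGES)

# Stale statistics: the table was last analyzed when it had 100 pages.
stale = pages_per_worker(split_ctid_page_ranges(100, 4), ACTUAL_PAGES)

print(fresh)  # [250, 250, 250, 250]
print(stale)  # [25, 25, 25, 925]
```

Under these assumptions, a tenfold underestimate leaves one worker reading 925 of 1000 pages, which is the kind of imbalance that running `ANALYZE` on the upstream tables before creating the source is meant to avoid.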

3 participants