Add PostgreSQL HA cluster guide by sadohert · Pull Request #8919 · mattermost/docs

sadohert · 2026-04-27T13:36:35Z

Summary

Adds source/administration-guide/scale/postgres-ha-cluster.rst — a new guide for deploying a 3-node PostgreSQL HA cluster using repmgr, HAProxy, and Keepalived
Updates scaling-for-enterprise.rst toctree and adds a prose entry
Updates high-availability-cluster-based-deployment.rst to cross-link to the new page

Why

The existing HA doc covers Mattermost app-layer clustering but explicitly does not cover database HA. Many self-hosted customers on bare-metal or VMs need a database-level HA guide. This fills that gap.

Page structure

The guide follows Mattermost documentation conventions:

Requirements consolidated upfront (Before you begin section)
Decision guidance table to help admins choose the right architecture
5 numbered phases with explicit pass/fail checkpoint commands
Day-2 operations and troubleshooting sections

Validation

All setup steps and checkpoint commands have been validated on Ubuntu 24.04 LTS, PostgreSQL 17, repmgr 5.5, HAProxy 2.8.

Walkthrough

Updates documentation to clarify that the deployment guide excludes database-level HA and disaster recovery, introduces a new PostgreSQL HA cluster admin guide covering repmgr, HAProxy, and Keepalived with ordered phases, validation, and troubleshooting, and adds a corresponding TOC entry and high-availability subsection.

Changes

PostgreSQL HA documentation

Layer / File(s)	Summary
Deployment guide intro `source/administration-guide/scale/high-availability-cluster-based-deployment.rst`	Reworded introduction to exclude database-level HA and disaster recovery from scope, and redirect self-hosted deployments to the new PostgreSQL HA cluster guide.
PostgreSQL HA cluster architecture and planning `source/administration-guide/scale/postgres-ha-cluster.rst`	New guide introduction: components (PostgreSQL, repmgr, HAProxy, Keepalived, pgchk.py), port routing (5000 writes, 5001 reads), validated versions, hardware, required open ports, pre-deployment planning worksheet, and phased setup overview.
PostgreSQL HA deployment Phases 1–3 `source/administration-guide/scale/postgres-ha-cluster.rst`	Base OS and package installation; PostgreSQL replication configuration (postgresql.conf, pg_hba.conf, .pgpass) with restart and verification; repmgr superuser/database creation, per-node configuration files, primary registration, standby cloning/registration, and systemd startup with cluster-health checkpoints.
PostgreSQL HA Phase 4: HAProxy and Keepalived VIP `source/administration-guide/scale/postgres-ha-cluster.rst`	HAProxy configuration for TCP routing (primary on 5000, standbys on 5001); pgchk.py HTTP health-check service (port 8008) for primary/replica detection; Keepalived VRRP with floating VIP, node priorities, and failover verification.
PostgreSQL HA Phase 5: validation and production readiness `source/administration-guide/scale/postgres-ha-cluster.rst`	End-to-end validation: healthy cluster baseline, primary failure simulation and repmgr promotion confirmation, HAProxy/VIP routing to new primary, old primary re-registration as standby, and production-readiness sign-off.
PostgreSQL HA day-2 operations and troubleshooting `source/administration-guide/scale/postgres-ha-cluster.rst`	Operational commands (cluster status, replication lag, controlled switchover); procedures for adding and rejoining standby nodes; troubleshooting playbooks (repmgrd startup, standby replication, VIP movement, HAProxy routing, split-brain prevention).
Scaling reference TOC and HA subsection `source/administration-guide/scale/scaling-for-enterprise.rst`	Added PostgreSQL HA cluster subsection in "High availability" section with description of repmgr/HAProxy/Keepalived failover approach; updated "Available reference architectures" toctree to include PostgreSQL HA cluster entry; removed trailing whitespace.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Admin as Self-hosted Admin
    participant Keepalived
    participant HAProxy
    participant pgchk as pgchk.py (health service)
    participant repmgr
    participant Primary as PostgreSQL<br/>(Primary)
    participant Standby1 as PostgreSQL<br/>(Standby A)
    participant Standby2 as PostgreSQL<br/>(Standby B)

    Admin->>Primary: Phase 1–2: install & configure replication
    Admin->>Standby1: Phase 1–2: install & configure replication
    Admin->>Standby2: Phase 1–2: install & configure replication
    
    Admin->>repmgr: Phase 3: register primary and standbys
    repmgr->>Primary: monitor & manage primary
    repmgr->>Standby1: monitor & manage standby
    repmgr->>Standby2: monitor & manage standby
    
    Admin->>HAProxy: Phase 4: configure TCP frontends (5000 write, 5001 read)
    Admin->>pgchk: Phase 4: deploy health check service (8008)
    Admin->>Keepalived: Phase 4: configure VRRP & floating VIP
    
    HAProxy->>pgchk: periodic health check (HTTP 8008)
    pgchk->>repmgr: query current node role/state
    repmgr-->>pgchk: primary or replica status
    
    Note over repmgr,Standby2: Phase 5: On primary failure
    Primary--xrepmgr: primary stops responding
    repmgr->>Standby1: promote to primary
    repmgr->>HAProxy: state change detected in next health check
    pgchk->>repmgr: re-query on Standby1
    repmgr-->>pgchk: now primary
    HAProxy->>Keepalived: routing updated to Standby1 node
    Keepalived->>Keepalived: VIP moves to Standby1 host
    HAProxy->>Standby1: write traffic (5000) routed to new primary

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

Guidance

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and clearly summarizes the primary change: adding a new PostgreSQL HA cluster deployment guide.
Description check	✅ Passed	The description is well-related to the changeset, providing context about why the guide was added, its structure, and validation performed.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

source/administration-guide/scale/high-availability-cluster-based-deployment.rst (1)

33-35: Clarify HA vs DR scope in this redirect sentence.

This reads a bit awkwardly and mixes concepts; make the scope explicit as “database-level HA/DR design is out of scope here” before linking to the PostgreSQL HA page.

Suggested minimal wording update

-Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover the configuration of databases in terms of
-disaster recovery. For self-hosted deployments requiring database-level HA,
+Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover database high availability or disaster recovery design.
+For self-hosted deployments requiring database-level HA,
 see :doc:`PostgreSQL high availability cluster </administration-guide/scale/postgres-ha-cluster>`.

As per coding guidelines, "Flag awkward phrasing, punctuation mistakes, tense shifts, and terminology inconsistencies in documentation."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@source/administration-guide/scale/high-availability-cluster-based-deployment.rst`
around lines 33 - 35, Revise the redirect sentence that begins "For self-hosted
deployments requiring database-level HA," to explicitly state that
database-level HA and disaster recovery (DR) design are out of scope for this
document, then follow with the existing link to the PostgreSQL page (the
:doc:`PostgreSQL high availability cluster
</administration-guide/scale/postgres-ha-cluster>` reference) so readers are
clearly directed for DB-level HA/DR details; replace the awkward phrasing with a
concise sentence such as "Database-level HA and DR design are out of scope for
this document; for PostgreSQL-specific guidance, see :doc:`PostgreSQL high
availability cluster </administration-guide/scale/postgres-ha-cluster>`."

source/administration-guide/scale/postgres-ha-cluster.rst (1)

331-334: Tighten replication user privileges (or justify superuser explicitly).

createuser --superuser repmgr is broader than necessary for many environments. Prefer least privilege by default, or add a strong warning explaining why elevated rights are required in this design.

As per coding guidelines, "When reviewing ... through the lens of Veteran Vince ... flag content that is ... security-unsafe."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@source/administration-guide/scale/postgres-ha-cluster.rst` around lines 331 -
334, The current step creates a replication user with --superuser which is
broader than needed; replace it with a least-privilege creation and explicit
grants (create the user with REPLICATION and LOGIN, set a password, and grant
only the specific DB ownership/SEARCH_PATH needed) and update the ALTER USER
repmgr SET search_path line to match the non-superuser account; if you must keep
--superuser for this design, add a prominent security justification/warning
explaining why elevated rights are required and the risks. Refer to the existing
commands (createuser --superuser repmgr and ALTER USER repmgr SET search_path TO
repmgr, public) when making the change so the docs show the least-privilege
variant and the optional justified-superuser warning.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/administration-guide/scale/postgres-ha-cluster.rst`:
- Around line 295-305: Change the documented pg_hba entries so that
``scram-sha-256`` is shown as the default production authentication method
instead of ``trust`` (update the example host lines and note text that currently
reference ``trust``), and move the ``trust`` entries into a clearly labeled
"lab/testing only" fallback section; also wrap the lab-only fallback in a
warning admonition to highlight the security risk (refer to the example host
entries for repmgr and replication and the existing note block that mentions
``.pgpass`` to update placement and wording accordingly).
- Around line 520-534: The Keepalived snippet hardcodes "interface eth0" which
will fail on systems with predictable NIC names; update the docs around creating
/etc/keepalived/keepalived.conf and the vrrp_instance VI_1 block to require
selecting the correct network interface (do not assume eth0), add a brief step
telling the user to identify the active interface (e.g., via ip link or ip addr)
and substitute that interface name into the "interface" field of the
vrrp_instance configuration before applying Keepalived, and include a short
example note instructing readers to replace <CLUSTER_VIP> and the placeholder
interface with their actual values.
- Around line 483-490: The doc currently references a non-existent repo for
pgchk.py; replace that broken external link by either embedding the full
pgchk.py script directly in this document (with a short explanation of its
purpose and usage) and instructing readers to save it to /usr/local/bin/pgchk.py
and chmod +x, or point to a verified alternate source and include a pinned
commit/tag and SHA256 checksum plus a one-line curl/wget + sha256sum
verification step; update the text around the symbol pgchk.py to include the
chosen solution and add verification instructions so admins can validate the
file before placing it at /usr/local/bin/pgchk.py.
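For the Keepalived point above, here is a minimal sketch of listing the interface names a host actually has, so the right one can be substituted for `eth0`. The helper name `candidate_interfaces` is hypothetical, and the sketch assumes a Linux host where the loopback device is named `lo`. It only narrows the choice; picking the interface that carries the cluster subnet is still up to the admin.

```python
import socket


def candidate_interfaces():
    """Return the names of non-loopback network interfaces on this host."""
    # socket.if_nameindex() yields (index, name) pairs for every interface
    # the kernel knows about. "lo" is assumed to be the loopback device,
    # which cannot carry a floating VIP.
    return [name for _index, name in socket.if_nameindex() if name != "lo"]


if __name__ == "__main__":
    for name in candidate_interfaces():
        print(name)
```

Whichever name is chosen would then go into the `interface` field of the `vrrp_instance` block.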

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4e4d9ba5-82a6-486e-8263-05d6d2475d02

📥 Commits

Reviewing files that changed from the base of the PR and between 67012b5 and 0f144d7.

📒 Files selected for processing (3)

source/administration-guide/scale/high-availability-cluster-based-deployment.rst
source/administration-guide/scale/postgres-ha-cluster.rst
source/administration-guide/scale/scaling-for-enterprise.rst

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- pg_hba.conf: default to scram-sha-256; trust moved to lab-only warning
- pgchk.py: embed full script inline instead of linking to external repo
- Keepalived: add ip link step to identify interface before hardcoding
- repmgr createuser: add note explaining why superuser is required
- high-availability-cluster-based-deployment.rst: clarify HA vs DR scope

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sadohert · 2026-04-27T13:58:18Z

@coderabbitai review

coderabbitai · 2026-04-27T13:58:24Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

sadohert · 2026-04-28T14:08:42Z

@esethna @wiersgallak - Would one of you mind adding the preview-environment label to this? I'd like to share it with a customer for an early preview.

github-actions · 2026-04-28T14:18:31Z

Newest code from sadohert has been published to preview environment for Git SHA faa3b49

esethna · 2026-04-28T14:19:48Z

Done @sadohert

mattermost-build · 2026-05-09T01:00:36Z

This PR has been automatically labelled "stale" because it hasn't had recent activity.
A core team member will check in on the status of the PR to help with questions.
Thank you for your contribution!

Combs7th · 2026-05-23T03:37:02Z

Heya @sadohert - Is there a dev you'd recommend we tag on this for the technical review?

sadohert · 2026-05-23T12:54:41Z

Hey @Combs7th - You could raise with Jesse, Alejandro, or Doug?

agarciamontoro

Amazing guide! The only thing that made me think "huh, weird" is the need for the Python script. Isn't there a way to automate that so that we can avoid using that mini-server? If not, it's ok, but I'm curious.

My comments are mostly nits, so treat them like that, but I wanted to make sure I understood everything :)

agarciamontoro · 2026-06-04T17:39:42Z

+   This guide has been validated on: **Ubuntu 24.04 LTS**, **PostgreSQL 17**,
+   **repmgr 5.5**, **HAProxy 2.8**, **Keepalived**.


Are we missing the version in Keepalived?

Good catch — added Keepalived 2.2.8 here (the version that ships with Ubuntu 24.04 LTS). Fixed in 910be71.

agarciamontoro · 2026-06-04T17:53:51Z

+   backend pg_primary
+       option tcp-check
+       server pg1 <PG1_IP>:5432 check port 8008
+       server pg2 <PG2_IP>:5432 check port 8008 backup
+       server pg3 <PG3_IP>:5432 check port 8008 backup
+
+   backend pg_replicas
+       balance roundrobin
+       option tcp-check
+       server pg2 <PG2_IP>:5432 check port 8008
+       server pg3 <PG3_IP>:5432 check port 8008
+       server pg1 <PG1_IP>:5432 check port 8008 backup


I guess the order in each of these blocks is important. Should we make a note about that? Or would that be too much?

Good catch — this pointed at a real issue. The config was using option tcp-check, which only confirms port 8008 is reachable. Since pgchk.py listens on 8008 on every node regardless of role, that check passed everywhere, so routing depended on the static server order rather than on which node is actually the primary — and it would not follow the primary after a repmgr failover.

I've switched both backends to option httpchk GET /master / GET /replica with http-check expect status 200, so HAProxy now routes on the health-check result (only the primary answers 200 on /master, only standbys on /replica). Order is no longer significant, and the now-redundant backup markers are removed. Fixed in 8d86f14.
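To illustrate why the HTTP status separates roles where a plain TCP connect cannot, here is a minimal sketch of the same decision made from a client's point of view. The function names `probe` and `node_role` are hypothetical; the `/master` and `/replica` paths and port 8008 are the ones discussed in this thread. It is not HAProxy's implementation, only the same rule expressed in Python.

```python
import urllib.error
import urllib.request

# Bypass any proxy environment variables: health checks should go direct.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(host, path, port=8008, timeout=2.0):
    """Return the HTTP status of GET http://host:port/path, or None if unreachable."""
    url = f"http://{host}:{port}{path}"
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example 503) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def node_role(host, port=8008):
    """Classify a node by which endpoint answers 200."""
    if probe(host, "/master", port) == 200:
        return "primary"
    if probe(host, "/replica", port) == 200:
        return "standby"
    return "unavailable"
```

A TCP-only check corresponds to `probe(...) is not None`, which is true on every node running the health service. Only the comparison against 200 follows the primary after a failover.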

agarciamontoro · 2026-06-04T17:55:17Z

+On each node, create ``/usr/local/bin/pgchk.py`` with the following content:
+
+.. code-block:: python
+
+   #!/usr/bin/env python3
+   import subprocess
+   from http.server import BaseHTTPRequestHandler, HTTPServer
+   import argparse
+
+   DEFAULT_PORT = 8008
+   PG_USER = "postgres"
+   PG_DB = "postgres"
+   PG_PORT = "5432"
+
+   class PostgresHealthCheckHandler(BaseHTTPRequestHandler):
+       def safe_write(self, data):
+           try:
+               self.wfile.write(data)
+           except (BrokenPipeError, ConnectionResetError):
+               pass
+
+       def check_postgres_status(self):
+           try:
+               cmd = ["psql", "-U", PG_USER, "-d", PG_DB, "-p", PG_PORT,
+                      "-t", "-c", "SELECT pg_is_in_recovery();"]
+               result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
+               if result.returncode != 0:
+                   return None
+               output = result.stdout.strip()
+               if output == 't':
+                   return True   # Standby
+               elif output == 'f':
+                   return False  # Primary
+               return None
+           except Exception:
+               return None
+
+       def do_GET(self):
+           status = self.check_postgres_status()
+           if status is None:
+               self.send_response(503)
+               self.end_headers()
+               self.safe_write(b"PostgreSQL Unreachable\n")
+               return
+           if self.path in ('/', '/master'):
+               if not status:
+                   self.send_response(200); self.end_headers()
+                   self.safe_write(b"OK - Primary\n")
+               else:
+                   self.send_response(503); self.end_headers()
+                   self.safe_write(b"Service Unavailable - Not Primary\n")
+           elif self.path == '/replica':
+               if status:
+                   self.send_response(200); self.end_headers()
+                   self.safe_write(b"OK - Replica\n")
+               else:
+                   self.send_response(503); self.end_headers()
+                   self.safe_write(b"Service Unavailable - Not Replica\n")
+           else:
+               self.send_response(404); self.end_headers()
+               self.safe_write(b"Not Found\n")
+
+       def log_message(self, format, *args):
+           pass
+
+   def run(port=DEFAULT_PORT):
+       httpd = HTTPServer(('', port), PostgresHealthCheckHandler)
+       print(f"Starting PostgreSQL Health Check on port {port}...")
+       try:
+           httpd.serve_forever()
+       except KeyboardInterrupt:
+           pass
+       httpd.server_close()
+
+   if __name__ == '__main__':
+       parser = argparse.ArgumentParser(description='PostgreSQL Health Check for HAProxy')
+       parser.add_argument('--port', type=int, default=DEFAULT_PORT)
+       args = parser.parse_args()
+       run(port=args.port)


Should we host this in a repo or in a gist so that we can track history here? Just an idea.

We're keeping pgchk.py embedded inline on purpose — it keeps the script versioned alongside the docs with no external dependency to fetch or trust at deploy time. (This also resolved a separate review note about an external repo link that no longer existed.) Happy to revisit hosting it separately if the script grows.

Hi @agarciamontoro - It's me :). The answer above was not. I will resolve this another way.

agarciamontoro · 2026-06-04T17:58:44Z

+   sudo systemctl enable keepalived
+   sudo systemctl start keepalived
+
+✅ **Phase 4 checkpoint** — run on any node:


every again? Maybe this is just me, sorry 😂

Suggested change

-✅ **Phase 4 checkpoint** — run on any node:
+✅ **Phase 4 checkpoint** — run on every node:

Same reasoning as the Phase 3 checkpoint — these commands report cluster-wide/VIP-level state, so running them once on any node is sufficient. Kept "any node" here; the per-node service checks in Phases 1–2 use "every node".

agarciamontoro · 2026-06-04T18:01:54Z

+Add a standby node
+~~~~~~~~~~~~~~~~~~
+
+1. Provision a new server and complete Phases 1–2 of the setup guide.
+2. Create ``/etc/repmgr.conf`` with the next available ``node_id``.
+3. On the new node:
+
+   .. code-block:: bash
+
+      sudo systemctl stop postgresql
+      sudo -u postgres repmgr -h <PRIMARY_IP> -U repmgr -d repmgr \
+          -f /etc/repmgr.conf standby clone --delete-existing-pgdata
+      sudo systemctl start postgresql
+      sudo -u postgres repmgr -f /etc/repmgr.conf standby register
+
+4. Add the new node to ``/etc/haproxy/haproxy.cfg`` on all existing nodes and
+   reload HAProxy: ``sudo systemctl reload haproxy``.


Do we need to change the /etc/hosts file of the pre-existing nodes to add the new pg4 with its IP?

Yes — added an explicit step for it: when adding a standby, add the new node's hostname/IP to /etc/hosts on the new node and on every existing node so all nodes can resolve the new hostname. Fixed in 910be71.
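A quick way to confirm the `/etc/hosts` step took effect on a given node is to check that every cluster hostname resolves there. Below is a minimal sketch; the helper name `unresolved_hosts` and the example hostnames `pg1` to `pg4` are assumptions for illustration, not names taken from the guide.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the hostnames this node cannot resolve."""
    missing = []
    for name in hostnames:
        try:
            # Uses the system resolver, which normally consults /etc/hosts
            # before DNS.
            socket.gethostbyname(name)
        except socket.gaierror:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical node names; replace with the cluster's real hostnames.
    print(unresolved_hosts(["pg1", "pg2", "pg3", "pg4"]))
```

Running it on the new node and on each existing node should print an empty list once the entries are in place.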

mattermost-build · 2026-06-15T01:00:41Z

This PR has been automatically labelled "stale" because it hasn't had recent activity.
A core team member will check in on the status of the PR to help with questions.
Thank you for your contribution!

- Add Keepalived 2.2.8 to validated-versions note and components table
- Clarify three nodes is the recommended minimum (and you can scale out)
- Note that /etc/hosts entries use real IPs (not localhost) on every node
- Explain the repmgr password is chosen now but set on the DB in Step 3.1
- Point host adjustment at the conninfo setting in repmgr.conf
- Comment why pgchk.py overrides log_message (it is used, by the framework)
- Add an /etc/hosts step when adding a new standby node

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace 'option tcp-check' with 'option httpchk GET /master' (and GET /replica) plus 'http-check expect status 200' on both backends.

tcp-check only confirmed port 8008 was reachable (which it is on every node), so write/read routing depended on static server order rather than actual primary/standby status, and would not follow the primary after a repmgr failover. HTTP checks against pgchk.py's /master and /replica endpoints fix this; the now-redundant 'backup' markers are removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sadohert · 2026-06-15T14:10:57Z

@wiersgallak would you mind adding the preview label here so I can review before re-requesting a review from Alejandro?

github-actions · 2026-06-15T15:13:27Z

Newest code from sadohert has been published to preview environment for Git SHA a7a53bb

When 'repmgr cluster show' lists fewer than three nodes, run it on every node and compare (views can diverge when a node is isolated) and collect all three outputs for support. Addresses review feedback on the checkpoint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
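The comparison step in this commit message can be sketched as a small set difference. The function name `diverging_views` and the input shape (reporting node mapped to the node names it listed) are assumptions for illustration. Extracting those names from the `repmgr cluster show` output is left out, since the sketch does not parse the real command output.

```python
def diverging_views(views):
    """Map each reporting node to the nodes missing from its view.

    `views` maps a reporting node name to the node names it listed.
    Nodes whose view is complete are omitted from the result.
    """
    # Every node name mentioned by at least one reporter.
    all_nodes = set().union(*views.values()) if views else set()
    result = {}
    for reporter, seen in views.items():
        missing = all_nodes - set(seen)
        if missing:
            result[reporter] = sorted(missing)
    return result


if __name__ == "__main__":
    # Hypothetical example: pg3 is isolated and only sees itself.
    print(diverging_views({
        "pg1": ["pg1", "pg2", "pg3"],
        "pg2": ["pg1", "pg2", "pg3"],
        "pg3": ["pg3"],
    }))
```

An empty result means all nodes agree on membership; a non-empty one points at the node whose view should be collected first.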

mattermost-build added the Contributor label Apr 27, 2026

coderabbitai Bot requested changes Apr 27, 2026

sadohert and others added 12 commits April 27, 2026 09:40

docs: add postgres-ha-cluster.rst skeleton

c1a292d

docs: postgres-ha-cluster — architecture overview section

c11a8fc

docs: postgres-ha-cluster — before you begin / decision guidance section

cbb742a

docs: postgres-ha-cluster — setup Phase 1 (base install)

3caf951

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: postgres-ha-cluster — setup Phase 2 (PostgreSQL config)

ec40b00

docs: postgres-ha-cluster — setup Phase 3 (repmgr + standby clone)

755c771

docs: postgres-ha-cluster — setup Phase 4 (HAProxy, pgchk, Keepalived)

30a6955

docs: postgres-ha-cluster — setup Phase 5 (end-to-end validation)

63688bb

docs: postgres-ha-cluster — day-2 operations section

738ceef

docs: postgres-ha-cluster — troubleshooting section

9a95ced

docs: add postgres-ha-cluster to scaling-for-enterprise toctree

aa66119

docs: cross-link to postgres-ha-cluster from HA cluster deployment page

1e7d060

sadohert force-pushed the postgres-ha-cluster-guide branch from 0f144d7 to 1e7d060 Compare April 27, 2026 13:40

chore: trigger CodeRabbit re-review

1fb0f5e

coderabbitai Bot approved these changes Apr 27, 2026

Merge branch 'master' into postgres-ha-cluster-guide

faa3b49

esethna added the preview-environment label Apr 28, 2026

mattermost-build added the Lifecycle/1:stale label May 9, 2026

Merge branch 'master' into postgres-ha-cluster-guide

ab3f249

wiersgallak requested a review from agarciamontoro June 2, 2026 13:56

wiersgallak added the 1: Dev Review label and removed the Lifecycle/1:stale label Jun 2, 2026

Merge branch 'master' into postgres-ha-cluster-guide

dc4f4a2

agarciamontoro requested changes Jun 4, 2026

mattermost-build added the Lifecycle/1:stale label Jun 15, 2026

sadohert and others added 3 commits June 15, 2026 09:54

Merge branch 'master' into postgres-ha-cluster-guide

a7a53bb

wiersgallak added the preview-environment label and removed the Lifecycle/1:stale and preview-environment labels Jun 15, 2026

sadohert and others added 2 commits June 15, 2026 21:47

Merge remote-tracking branch 'fork/postgres-ha-cluster-guide' into po…

690be1d

…stgres-ha-cluster-guide
