RDSC-4633 How to perform HA failover #2818

Open
ilianiliev-redis wants to merge 1 commit into redis:main from ilianiliev-redis:RDSC-4633-ha-failover-setup

@ilianiliev-redis (Contributor)

Ticket: https://redislabs.atlassian.net/browse/RDSC-4633

Document how to perform an HA failover test.

@andy-stark-redis (Contributor) left a comment

Just a few style suggestions, but otherwise LGTM.

rdi-reloader-77df5f7854-lwmvz 1/1 Running 0 71m
```

2. Identify the leader node - this is the one that has a running `collector-source` pod

Suggested change
- 2. Identify the leader node - this is the one that has a running `collector-source` pod
+ 2. Identify the leader node - this is the one that has a running `collector-source` pod.
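
As a rough illustration of the check in step 2, the sketch below filters a pod listing for a running `collector-source` pod. It assumes rows in the `NAME READY STATUS RESTARTS AGE` layout shown in the diff. The function name `has_running_collector` and the sample pod suffix are hypothetical and are not taken from the PR.

```shell
# Sketch: succeed if a pod listing contains a running collector-source pod.
# Reads rows in the form NAME READY STATUS RESTARTS AGE on stdin.
# has_running_collector is a hypothetical helper name.
has_running_collector() {
  awk '$1 ~ /^collector-source/ && $3 == "Running" { found = 1 } END { exit !found }'
}

# Hypothetical listing for illustration only (the collector-source suffix is made up).
sample='rdi-reloader-77df5f7854-lwmvz 1/1 Running 0 71m
collector-source-0000000000-aaaaa 1/1 Running 0 71m'

if printf '%s\n' "$sample" | has_running_collector; then
  echo "leader"
else
  echo "not leader"
fi
```

On a real node you would pipe that node's own pod listing into the function instead of the sample text.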


To perform HA, you can simulate a connection failure between the leader and the RDI database by blocking the network traffic. You can do this by running the following command on the leader node:

1. Identify the database IP (if you are using it with hostname):

Suggested change
- 1. Identify the database IP (if you are using it with hostname):
+ 1. Identify the database IP (replace `<hostname>` with your own hostname):


## Performing the HA Failover Testing

To perform HA, you can simulate a connection failure between the leader and the RDI database by blocking the network traffic. You can do this by running the following command on the leader node:

Suggested change
- To perform HA, you can simulate a connection failure between the leader and the RDI database by blocking the network traffic. You can do this by running the following command on the leader node:
+ To perform HA, you can simulate a connection failure between the leader and the RDI database by blocking the network traffic. You can do this by running the following commands on the leader node:

54.78.220.161
```

2. For each of the IPs returned by the above command, run the following command to block the traffic:

Does this mean that you expect dig to return more than one IP address for a hostname? (Presumably you only need to run the command once on the leader node.) If so, maybe say that explicitly in step 1, because it currently says "Identify the database IP", which makes it sound like there is only one address, but "IP" might potentially be plural here.
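
If the hostname can resolve to several addresses, the "for each of the IPs" step could be sketched as a loop like the one below. This is only an illustration: the exact `iptables` rule used in the PR is not visible in this thread, so the `-A OUTPUT -d <ip> -j DROP` form is an assumption, and `print_block_rules` is a hypothetical helper that prints the commands rather than running them.

```shell
# Sketch: print (do not run) one DROP rule per IP address given as an argument.
# The rule form below is an assumption, not copied from the PR.
print_block_rules() {
  for ip in "$@"; do
    printf 'sudo iptables -A OUTPUT -d %s -j DROP\n' "$ip"
  done
}

# Example using the address shown in the diff. With a real hostname you might
# pass every address that dig returns:
#   print_block_rules $(dig +short <hostname>)
print_block_rules 54.78.220.161
```

Note that `dig +short` can also print CNAME targets, so its output may need filtering before it is used this way.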

Comment on lines +73 to +74
In about 10 second you will start seeing logs from the leader that it could not acquire the leadership.
Once the leader lock expires, the second node will acquire the leadership and you will see logs from the second node indicating that it has become the leader.

Suggested change
- In about 10 second you will start seeing logs from the leader that it could not acquire the leadership.
- Once the leader lock expires, the second node will acquire the leadership and you will see logs from the second node indicating that it has become the leader.
+ In about 10 seconds you will start seeing log entries from the leader saying that it could not acquire the leadership.
+ When the leader lock expires, the second node will acquire the leadership and you will see log entries from the second node indicating that it has become the leader.


## Cleanup

To clean up after the test, you can remove the iptables rule that you added to block the traffic:

Suggested change
- To clean up after the test, you can remove the iptables rule that you added to block the traffic:
+ To clean up after the test, remove the `iptables` rule that you added to block the traffic:
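
For illustration, cleanup could mirror the blocking step by deleting the same rule for each address. The sketch assumes the rule was added as `-A OUTPUT -d <ip> -j DROP`, which is not confirmed by the diff shown here, and `print_unblock_rules` is a hypothetical helper that only prints the commands.

```shell
# Sketch: print (do not run) the delete command matching each assumed DROP rule.
# iptables -D removes a rule whose specification matches the one given.
print_unblock_rules() {
  for ip in "$@"; do
    printf 'sudo iptables -D OUTPUT -d %s -j DROP\n' "$ip"
  done
}

# Example using the address shown in the diff.
print_unblock_rules 54.78.220.161
```

Running `sudo iptables -L OUTPUT -n` afterwards is one way to confirm that no DROP rule for the database address remains.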

@@ -0,0 +1,82 @@
---
Title: How to perform HA failover testing

Your original title was fine, but this fits our usual style a bit more closely.

Suggested change
- Title: How to perform HA failover testing
+ Title: Test HA failover

description: Learn how to perform HA failover testing for Redis Data Integration (RDI) to ensure high availability and reliability of your data integration setup.
group: di
hideListLinks: false
linkTitle: Testing HA failover

Suggested change
- linkTitle: Testing HA failover
+ linkTitle: Test HA failover

will take place. After the failover, the secondary instance will become the primary one,
and the RDI pipeline will be active on that VM.

You can see how to test HA failover in the [HA failover testing page]({{< relref "/integrate/redis-data-integration/installation/ha-test" >}}).

Suggested change
- You can see how to test HA failover in the [HA failover testing page]({{< relref "/integrate/redis-data-integration/installation/ha-test" >}}).
+ You may find it useful to trigger a failover deliberately to check that RDI is correctly configured to handle it. See [Test HA failover]({{< relref "/integrate/redis-data-integration/installation/ha-test" >}}) to learn how to do this.
