New file: content/integrate/redis-data-integration/installation/ha-test.md
---
Title: Test HA failover
alwaysopen: false
categories:
- docs
- integrate
- rs
- rdi
description: Learn how to perform HA failover testing for Redis Data Integration (RDI) to ensure high availability and reliability of your data integration setup.
group: di
hideListLinks: false
linkTitle: Test HA failover
summary: How to perform HA failover testing
type: integration
weight: 100
---

## Setup
1. Ensure that RDI is up and running on both primary and secondary nodes.
   Run the following command and verify that each instance shows healthy, running `rdi-api` and `rdi-operator` pods.
```
kubectl -n rdi get pods

# Example output:
NAME READY STATUS RESTARTS AGE
collector-api-577d95bfd8-5wbg6 1/1 Running 0 12m
collector-source-95f45bcf7-vwn5l 1/1 Running 0 12m
fluentd-zq2lc 1/1 Running 0 72m
logrotate-29530445-j729x 0/1 Completed 0 14m
logrotate-29530450-dprr2 0/1 Completed 0 9m40s
logrotate-29530455-mfmzw 0/1 Completed 0 4m40s
processor-f66655469-h7nw2 1/1 Running 0 12m
rdi-api-f75df6796-qwqjw 1/1 Running 0 72m
rdi-metrics-exporter-d57cdf8c8-wjzb5 1/1 Running 0 72m
rdi-operator-7f7f6c7dfd-5qmjd 1/1 Running 0 71m
rdi-reloader-77df5f7854-lwmvz 1/1 Running 0 71m
```
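If you prefer a scripted check, the pod listing above can be filtered so that the command fails when either critical pod is not running. This is a minimal sketch; the pod name prefixes are taken from the example output above:

```shell
# Sketch: exit non-zero if any rdi-api or rdi-operator pod is not Running.
# Assumes the pod name prefixes shown in the example output above.
kubectl -n rdi get pods --no-headers \
  | awk '($1 ~ /^rdi-api/ || $1 ~ /^rdi-operator/) && $3 != "Running" {bad=1} END {exit bad}'
```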

2. Identify the leader node: this is the one running the `collector-source` pod.
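For example, you can list the pods with their node assignments and filter for the `collector-source` pod (a sketch; the `NODE` column in the output shows the leader node):

```shell
# Sketch: with -o wide, kubectl adds a NODE column showing where each pod runs;
# the node hosting the collector-source pod is the current leader.
kubectl -n rdi get pods -o wide | grep collector-source
```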

## Perform the HA failover test

To test HA failover, you can simulate a connection failure between the leader and the RDI database by blocking network traffic. To do this, run the following commands on the leader node:

1. Identify the IP address of the RDI database (replace `<hostname>` with your own hostname). Note that a hostname may resolve to more than one IP address, for example when DNS round-robin or other DNS-level load balancing is in use; in that case you will need to block all of them:

```
dig +short <hostname>

# Example:
# dig +short my.redis.hostname.com

# Example output:
54.78.220.161
```

2. For each of the IPs returned by the above command, run the following command to block the traffic:
```
sudo iptables -I FORWARD -d <database_ip> -j DROP

# With the IP from the example above, the command would be:
sudo iptables -I FORWARD -d 54.78.220.161 -j DROP
```
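If the hostname resolves to more than one address, the two steps above can be combined into a small loop. This is a sketch; `my.redis.hostname.com` is a placeholder for your own RDI database hostname:

```shell
# Sketch: block traffic to every IPv4 address the RDI database hostname
# resolves to. The grep filter drops any CNAME lines that dig may print.
RDI_DB_HOST=my.redis.hostname.com   # placeholder: use your own hostname
for ip in $(dig +short "$RDI_DB_HOST" | grep -E '^[0-9]+(\.[0-9]+){3}$'); do
  sudo iptables -I FORWARD -d "$ip" -j DROP
done
```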


The default lease duration for the leader lock is 60 seconds, so it may take up to two minutes for the failover to occur.
Meanwhile, you can follow the operator logs to watch the failover process:

```
kubectl -n rdi logs rdi-operator-7f7f6c7dfd-5qmjd -f
```

After about 10 seconds, you will start to see log entries from the current leader saying that it could not acquire the leadership.
When the leader lock expires, the second node acquires the leadership and you will see log entries from that node indicating that it has become the leader.

## Cleanup

To clean up after the test, remove the `iptables` rule that you added to block the traffic:

```
sudo iptables -D FORWARD -d <database_ip> -j DROP
```

Use `sudo iptables -S | grep <database_ip>` to verify that the rule has been removed.
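If you blocked several addresses, a loop can remove all of the rules at once. This is a sketch; `my.redis.hostname.com` is a placeholder for your own RDI database hostname:

```shell
# Sketch: delete the DROP rule for every IPv4 address the RDI database
# hostname resolves to, mirroring the rules added during the test.
RDI_DB_HOST=my.redis.hostname.com   # placeholder: use your own hostname
for ip in $(dig +short "$RDI_DB_HOST" | grep -E '^[0-9]+(\.[0-9]+){3}$'); do
  sudo iptables -D FORWARD -d "$ip" -j DROP
done
```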
to renew the lease in the RDI database, it will lose the leadership and a failover
will take place. After the failover, the secondary instance will become the primary one,
and the RDI pipeline will be active on that VM.

You may find it useful to trigger a failover deliberately to check that RDI is correctly configured to handle it. See [Test HA failover]({{< relref "/integrate/redis-data-integration/installation/ha-test" >}}) to learn how to do this.

## Prepare your source database

Before deploying a pipeline, you must configure your source database to enable CDC. See the