-
Notifications
You must be signed in to change notification settings - Fork 439
RATIS-2382. Support skip leadership check during ReadIndex #1334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| | **Default** | false | | ||
|
|
||
|
|
||
| | **Property** | `raft.server.read.read-index.heartbeat.skip.enabled` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the idea and code.
Just want to discuss whether heartbeat.skip.enabled is a good name, I think maybe raft.server.read.read-index.leadership-check.skip.enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OneSizeFitsQuorum Thanks for the review. I was also grappling with whether "leadership check" is a better name. However, my reasoning of choosing "heartbeat" is because we already have a LeaderStateImpl#checkLeadership to decide whether to leader need to step down due to lost majority heartbeats. LeaderStateImpl#checkLeadership does not send heartbeat, it simply checks the last RPC response time of the majority. Therefore, I picked "heartbeat" to explicitly mention the RPC that is going to be skipped.
That said, I'm ok if the community decide "leadership-check" is a more fitting name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about raft.server.read.read-index.leadership-heartbeat-check.skip.enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OneSizeFitsQuorum Sounds good, updated. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about raft.server.read.read-index.leadership-heartbeat-check.skip.enabled?
Sorry for coming late:
- Since this conf is closely related to
raft.server.read.leader.lease.enabled, they should be next to each other. - The default "skip.enabled == false" is a double negative.
So, how about raft.server.read.leader.heartbeat-check.enabled (default is true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point to make it closer to lease since disabling heartbeat check essentially removed some of the leader lease guarantee.
szetszwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivandika3 , thanks for working on this!
Please see #1334 (comment) and the comment inlined.
| | **Property** | `raft.server.read.read-index.leadership-heartbeat-check.skip.enabled` | | ||
| |:----------------|:----------------------------------------------------------------------| | ||
| | **Description** | whether to skip the leadership check heartbeat for read index. | | ||
| | **Type** | boolean | | ||
| | **Default** | false | | ||
|
|
||
| Note that although enabling `leadership-heartbeat-check.skip.enabled` reduce the RTT due to the leadership check heartbeat, | ||
| it causes reads to not be linearizable in some cases. | ||
| This is because without the leadership check heartbeats, the leader might not be the latest leader | ||
| and might serve stale reads. | ||
| There might be a small period of time when there is a split brain and two Raft peers that believe that | ||
| they are leaders (old leader and new leader where old leader's term < new leader's term). | ||
| The old leader might not have detected that it has lost majority heartbeats and stepped down or the old leader | ||
| has not received any RPC with higher term which forces the old leader to steps down. | ||
| Without the leadership check heartbeat which should detect that the old leader is no longer the latest leader and | ||
| force the old leader to step down, the old leader ReadIndex might return an index that is lower than | ||
| the new leader's applied index. | ||
| This means that the old leader might return stale data which is not linearizable by definition. | ||
| However, this situation should happen only abnormal situations when the Raft group encounters a leader election | ||
| due to network partition between the old leader and the other majority quorum (which will elect a new leader). | ||
| Therefore, this might be an acceptable tradeoff for applications that seek to improve the linearizable read | ||
| performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the doc. Below is my suggestion:
| **Property** | `raft.server.read.leader.heartbeat-check.enabled` |
|:----------------|:--------------------------------------------------|
| **Description** | whether to check heartbeat for read index. |
| **Type** | boolean |
| **Default** | true |
Note that the original read index algorithm requires heartbeat check
in order to guarantee linearizable read.
By setting this property to false,
it reduces the RTT by eliminating the heartbeat check.
However, it might cause the reads not to be linearizable in a split-brain case.
Without the heartbeat check, a leader might not be the latest leader
and, as a result, it might serve stale reads.
When there is a split brain, there might be a small period of time
that the (old) leader has lost majority heartbeats but have not yet detected it.
As the same time, a new leader is elected by a majority of peers.
Then, the old leader might serve stale data
since it does not have the transactions committed by the new leaders.
Since such split-brain case is supposed to be rare,
it might be an acceptable tradeoff for applications that
seek to improve the linearizable read performance.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated.
szetszwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 the change looks good.
|
@ivandika3 , thanks for working on this! @OneSizeFitsQuorum , thanks for reviewing this! |
|
Thanks @OneSizeFitsQuorum @szetszwo for the reviews. |
What changes were proposed in this pull request?
Please refer to https://issues.apache.org/jira/browse/RATIS-2382
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-2382
How was this patch tested?
CI (https://github.com/ivandika3/ratis/actions/runs/20984539273)