RATIS-2433. Cancel transaction in case of failure to append#1382
RATIS-2433. Cancel transaction in case of failure to append#1382szetszwo merged 6 commits intoapache:masterfrom
Conversation
szetszwo
left a comment
There was a problem hiding this comment.
@spacemonkd , thanks for working on this!
Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13081527/1382_review.patch
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
Outdated
Show resolved
Hide resolved
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
Show resolved
Hide resolved
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
Outdated
Show resolved
Hide resolved
ratis-server/src/test/java/org/apache/ratis/server/impl/RaftStateMachineExceptionTests.java
Outdated
Show resolved
Hide resolved
|
Thanks for the comments @szetszwo. Also the CI is passing in my fork https://github.com/spacemonkd/ratis/actions/runs/24034963428 |
szetszwo
left a comment
There was a problem hiding this comment.
@spacemonkd , thanks for the update! Please see the comments inlined.
| leaderState.submitStepDownEvent(StepDownReason.STATE_MACHINE_EXCEPTION); | ||
| } | ||
| return CompletableFuture.completedFuture(exceptionReply); | ||
| return failWithReply(exceptionReply, cacheEntry, context); |
There was a problem hiding this comment.
StateMachineException is from StateMachine. We should not cancel the transaction.
There was a problem hiding this comment.
But in this stage startTransaction() has already created a transaction context, so we would need to cancel the transaction in case of failures even if it is a StateMachineException right?
Or should it be the previous behaviour where we only give exceptionReply?
There was a problem hiding this comment.
If a StateMachine throws an exception, it should also clean up its resources. Cancel transaction is to tell the StateMachine that there is an exception outside the StateMachine.
There was a problem hiding this comment.
Addressed this in the latest commit
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
Outdated
Show resolved
Hide resolved
|
@szetszwo could you take another look at this change? |
szetszwo
left a comment
There was a problem hiding this comment.
+1 the change looks good.
What changes were proposed in this pull request?
Currently in RaftServerImpl#writeAsyncImpl() the client request is added to the pending requests asynchronously.
In between if there is any failure/exception in appendTransactions() then we are not cancelling the transaction. The failure is returned to the client/retry-cache, but the statemachine is not notified.
This can cause partial state in the statemachine.
We should handle this such that in case of exceptions the statemachine is notified via cancelTransaction().
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-2433
How was this patch tested?
Patch was tested via unit tests