Skip to content

RATIS-2433. Cancel transaction in case of failure to append#1382

Merged
szetszwo merged 6 commits intoapache:masterfrom
spacemonkd:RATIS-2433
Apr 9, 2026
Merged

RATIS-2433. Cancel transaction in case of failure to append#1382
szetszwo merged 6 commits intoapache:masterfrom
spacemonkd:RATIS-2433

Conversation

@spacemonkd
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Currently in RaftServerImpl#writeAsyncImpl() the client request is added to the pending requests asynchronously.

In between if there is any failure/exception in appendTransactions() then we are not cancelling the transaction. The failure is returned to the client/retry-cache, but the statemachine is not notified.
This can cause partial state in the statemachine.

We should handle this such that in case of exceptions the statemachine is notified via cancelTransaction().

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2433

How was this patch tested?

Patch was tested via unit tests

@spacemonkd spacemonkd marked this pull request as ready for review March 21, 2026 07:20
Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spacemonkd , thanks for working on this!

Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13081527/1382_review.patch

@spacemonkd
Copy link
Copy Markdown
Contributor Author

Thanks for the comments @szetszwo.
Addressed them in the latest commit, could you take another look?

Also the CI is passing in my fork https://github.com/spacemonkd/ratis/actions/runs/24034963428

@spacemonkd spacemonkd requested a review from szetszwo April 6, 2026 14:42
Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spacemonkd , thanks for the update! Please see the comments inlined.

leaderState.submitStepDownEvent(StepDownReason.STATE_MACHINE_EXCEPTION);
}
return CompletableFuture.completedFuture(exceptionReply);
return failWithReply(exceptionReply, cacheEntry, context);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StateMachineException is from StateMachine. We should not cancel the transaction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in this stage startTransaction() has already created a transaction context, so we would need to cancel the transaction in case of failures even if it is a StateMachineException right?

Or should it be the previous behaviour where we only give exceptionReply?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a StateMachine throws an exception, it should also clean up its resources. Cancel transaction is to tell the StateMachine that there is an exception outside the StateMachine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this in the latest commit

@spacemonkd
Copy link
Copy Markdown
Contributor Author

@szetszwo could you take another look at this change?

Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@szetszwo szetszwo merged commit 2c4db47 into apache:master Apr 9, 2026
16 checks passed
@spacemonkd spacemonkd deleted the RATIS-2433 branch April 10, 2026 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants