fix(lockservice): fence stale binds by allocator epoch#24905
fix(lockservice): fence stale binds by allocator epoch#24905iamlinjunhong wants to merge 6 commits into
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
I did not confirm a blocking logic bug in the implementation itself, but there is still one important unhappy-path regression gap that should be covered before this is approved.
The new fix combines two recovery actions off the same keepalive failure response: local-bind cleanup when OK=false, and stale-bind purge when the allocator epoch increases. Current tests cover allocator-epoch purge and OK=false handling separately, but I do not see a test that drives a single keepalive response with both OK=false and a newer AllocatorVersion, then verifies that:
- stale local binds are removed
- stale remote/proxy binds from the old epoch are removed
lastAllocatorVersionadvances- current-epoch binds are preserved
That combined failure path is the core unhappy-path this PR is hardening after allocator restart/failover. Without one regression test for the combined case, a future refactor could short-circuit one branch while the other still fires and leave stale bindings behind. Please add a focused test around doKeepLockTableBind / keepalive handling for this case.
fixed |
What type of PR is this?
Which issue(s) this PR fixes:
issue #24896
What this PR does / why we need it:
fix(lockservice): fence stale binds by allocator epoch