CASSANDRA-21322: Paxos support for live migration to mutation tracking by aweisberg · Pull Request #4806 · apache/cassandra

aweisberg · 2026-05-13T16:14:13Z

During mutation tracking migration (tracked <-> untracked), the per-range migration state must be consulted for every routing decision. Before this change, all Paxos V1 and V2 paths used the static keyspace-level replicationType().isTracked() check, which does not reflect per-range migration state and produces incorrect routing decisions during migration.

Coordinator-side routing: Replace all static isTracked() checks with MigrationRouter calls across StorageProxy (commitPaxos, sendCommit, isTrackedKeyspaceRequiringPaxosCommitForwarding, checkAndForwardCasIfNeeded, checkAndForwardConsensusReadIfNeeded), PaxosCommit (constructor, isTrackedKeyspaceRequiringForwarding), PaxosCommitAndPrepare, PaxosPrepare (start + isTracked field), PaxosPrepareRefresh, and PaxosState truncation acknowledgment.

Handler-side validation: Add migration state validation to four Paxos replica handlers that receive messages carrying mutations or tracked reads: PaxosCommit.RequestHandler (direct V1/V2 commits), PaxosPrepare.RequestHandler (V2 prepare with tracked read), PaxosCommitAndPrepare.RequestHandler (combined commit+prepare), and PaxosPrepareRefresh.RequestHandler (refresh commits). Each uses the conditional-fetch pattern from AbstractMutationVerbHandler.checkReplicationMigration: compare the coordinator routing decision against the handler MigrationRouter result, fetch only on mismatch when the coordinator epoch is ahead (handler is behind and needs to catch up), throw CoordinatorBehindException when the coordinator epoch is behind.

Coordinator-side commit retry: Add commit-level COORDINATOR_BEHIND retry in Paxos.cas() (V2) and commitPaxos() (V1). When replicas reject a commit due to migration state mismatch, ResponseVerbHandler.maybeFetchLogs() catches up the coordinator synchronously before delivering the failure. The retry re-creates the commit with fresh MigrationRouter routing. This retries only the commit phase, not the entire prepare+propose protocol.

Stale mutation ID reconciliation: Commits saved in system.paxos may have a mutation ID from when the keyspace was tracked. When replayed after migration to untracked (via PaxosPrepareRefresh, PaxosCommitAndPrepare, sendCommit, or commitPaxos), the stale ID must be stripped to avoid Keyspace.apply() rejecting the mutation. Uses Commit.withMutationId() to reconcile in all four replay paths.

Forward handlers: Forwarding is harmless -- the receiving replica re-executes the full CAS/read with its own fresh routing decisions, so no migration validation is needed at the forward boundary itself. Removed the "reject if keyspace not tracked" guards from CasForwardHandler and ConsensusReadForwardHandler (forwarding is now valid in either direction). Replaced unconditional fetchLogFromPeerOrCMS with the conditional-fetch pattern in Paxos2CommitForwardHandler, PaxosCommitForwardHandler, PrepareRefreshForwardHandler, and PaxosCommitAndPrepare.RequestHandler (unconditional fetch added unnecessary latency on the no-mismatch case).

PaxosCommit failure tracking: Added super.onFailure() call to PaxosCommit.onFailure() so FailureRecordingCallback.failureResponses is populated, enabling failureReasonsAsMap() to return actual failure reasons. This was required for the V2 commit retry to detect COORDINATOR_BEHIND in MaybeFailure.failures.

PaxosCommit hint suppression: Tracked mutations must not be written as hints because hint replay routes through Keyspace.applyInternalTracked() based on the mutation ID presence, which fails after migration to untracked. Guard submitHint with !isTracked() -- tracked mutations use MutationTrackingService for retries, not the hint system.

MigrationRouter null safety: Replace getKeyspaceMetadata() (throws NoSuchElementException on missing keyspace) with
maybeGetKeyspaceMetadata().orElse(null) at four call sites so the existing null guards actually protect against concurrent keyspace drops.

During mutation tracking migration (tracked <-> untracked), the per-range migration state must be consulted for every routing decision. Before this change, all Paxos V1 and V2 paths used the static keyspace-level replicationType().isTracked() check, which does not reflect per-range migration state and produces incorrect routing decisions during migration. Coordinator-side routing: Replace all static isTracked() checks with MigrationRouter calls across StorageProxy (commitPaxos, sendCommit, isTrackedKeyspaceRequiringPaxosCommitForwarding, checkAndForwardCasIfNeeded, checkAndForwardConsensusReadIfNeeded), PaxosCommit (constructor, isTrackedKeyspaceRequiringForwarding), PaxosCommitAndPrepare, PaxosPrepare (start + isTracked field), PaxosPrepareRefresh, and PaxosState truncation acknowledgment. Handler-side validation: Add migration state validation to four Paxos replica handlers that receive messages carrying mutations or tracked reads: PaxosCommit.RequestHandler (direct V1/V2 commits), PaxosPrepare.RequestHandler (V2 prepare with tracked read), PaxosCommitAndPrepare.RequestHandler (combined commit+prepare), and PaxosPrepareRefresh.RequestHandler (refresh commits). Each uses the conditional-fetch pattern from AbstractMutationVerbHandler.checkReplicationMigration: compare the coordinator routing decision against the handler MigrationRouter result, fetch only on mismatch when the coordinator epoch is ahead (handler is behind and needs to catch up), throw CoordinatorBehindException when the coordinator epoch is behind. Coordinator-side commit retry: Add commit-level COORDINATOR_BEHIND retry in Paxos.cas() (V2) and commitPaxos() (V1). When replicas reject a commit due to migration state mismatch, ResponseVerbHandler.maybeFetchLogs() catches up the coordinator synchronously before delivering the failure. The retry re-creates the commit with fresh MigrationRouter routing. This retries only the commit phase, not the entire prepare+propose protocol. Stale mutation ID reconciliation: Commits saved in system.paxos may have a mutation ID from when the keyspace was tracked. When replayed after migration to untracked (via PaxosPrepareRefresh, PaxosCommitAndPrepare, sendCommit, or commitPaxos), the stale ID must be stripped to avoid Keyspace.apply() rejecting the mutation. Uses Commit.withMutationId() to reconcile in all four replay paths. Forward handlers: Forwarding is harmless -- the receiving replica re-executes the full CAS/read with its own fresh routing decisions, so no migration validation is needed at the forward boundary itself. Removed the "reject if keyspace not tracked" guards from CasForwardHandler and ConsensusReadForwardHandler (forwarding is now valid in either direction). Replaced unconditional fetchLogFromPeerOrCMS with the conditional-fetch pattern in Paxos2CommitForwardHandler, PaxosCommitForwardHandler, PrepareRefreshForwardHandler, and PaxosCommitAndPrepare.RequestHandler (unconditional fetch added unnecessary latency on the no-mismatch case). PaxosCommit failure tracking: Added super.onFailure() call to PaxosCommit.onFailure() so FailureRecordingCallback.failureResponses is populated, enabling failureReasonsAsMap() to return actual failure reasons. This was required for the V2 commit retry to detect COORDINATOR_BEHIND in MaybeFailure.failures. PaxosCommit hint suppression: Tracked mutations must not be written as hints because hint replay routes through Keyspace.applyInternalTracked() based on the mutation ID presence, which fails after migration to untracked. Guard submitHint with !isTracked() -- tracked mutations use MutationTrackingService for retries, not the hint system. MigrationRouter null safety: Replace getKeyspaceMetadata() (throws NoSuchElementException on missing keyspace) with maybeGetKeyspaceMetadata().orElse(null) at four call sites so the existing null guards actually protect against concurrent keyspace drops.

bdeggleston

There’s a lot of case specific tests, but there are a few gaps (I’m not sure commit forwarding in the untracked->tracked migration case exists) and a lot of code (1200 line paxosv2 test, for instance). These tests seem like they’d benefit from parameterization of keyspace migration permutations and epoch ahead / behind state so you’re testing a matrix of operations against all possible migration states. That would make it very clear that all of these paths in all of these messages are being properly exercised / tested

bdeggleston · 2026-05-14T23:04:03Z

+        }
+
+        // CAS works with untracked read (correct routing during migration)
+        Object[][] result = cluster.coordinator(1).execute("SELECT * FROM " + ks + ".tbl WHERE k = " + KEY,


We should be intercepting the prepare messages here and confirming they're doing tracked reads. This is just confirming that the migration router says the right thing and the read completes

bdeggleston · 2026-05-14T23:08:14Z

+                              .expect(2)
+                              .start())
+        {
+            Object[][] result = cluster.coordinator(1).execute("INSERT INTO " + ks + ".tbl (k, v) VALUES (" + KEY + ", 42) IF NOT EXISTS",


same thing here, let's take a look at prepare and confirm the right messages are being sent

bdeggleston · 2026-05-18T19:04:38Z

+                                                                                              message.epoch());
+
+            if (message.payload.read != null)
+                MigrationRouter.checkPaxosPrepareReadMigration(metadata, message, message.from(), message.payload.read);


this doesn't seem to be handled on the prepare callback from what I can tell. If the coordinator gets a coordinator behind exception I think we'll just timeout? I think this could be handled by inspecting the failure reasons in the MAYBE_FAILURE case and breaking out of the switch (and retrying) if we're behind?

aweisberg requested a review from bdeggleston May 13, 2026 16:14

aweisberg force-pushed the 21322 branch 3 times, most recently from f81387a to 636c477 Compare May 14, 2026 16:55

aweisberg force-pushed the 21322 branch from 636c477 to 22df0fc Compare May 14, 2026 17:13

bdeggleston requested changes May 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CASSANDRA-21322: Paxos support for live migration to mutation tracking#4806

CASSANDRA-21322: Paxos support for live migration to mutation tracking#4806
aweisberg wants to merge 1 commit into
apache:cep-45-mutation-trackingfrom
aweisberg:21322

aweisberg commented May 13, 2026

Uh oh!

bdeggleston left a comment

Uh oh!

bdeggleston May 14, 2026

Uh oh!

bdeggleston May 14, 2026

Uh oh!

bdeggleston May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aweisberg commented May 13, 2026

Uh oh!

bdeggleston left a comment

Choose a reason for hiding this comment

Uh oh!

bdeggleston May 14, 2026

Choose a reason for hiding this comment

Uh oh!

bdeggleston May 14, 2026

Choose a reason for hiding this comment

Uh oh!

bdeggleston May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants