Skip to content

fix(controller): close witness-reap vs redundancy-backfill race (BUG-041)#157

Merged
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
fix/bug393-witness-reap-backfill-race
Jun 13, 2026
Merged

fix(controller): close witness-reap vs redundancy-backfill race (BUG-041)#157
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
fix/bug393-witness-reap-backfill-race

Conversation

@kvaps

@kvaps Andrei Kvapil (kvaps) commented Jun 13, 2026

Copy link
Copy Markdown
Member

What

Durability/availability race surfaced by the release-gate sweep (scenario replay/inactive-return-backfills-redundancy): the auto-tiebreaker witness reaper removeWitnesses did a non-atomic Get→Delete with no resourceVersion precondition, racing the placer's redundancy backfill. When an inactive replica returns and the placer promotes a Resource into a diskful backfill, the reaper — having read the row a moment earlier as a witness — could Delete it by name, clobbering the freshly-promoted replica. Net effect: redundancy is silently NOT restored (fewer diskful replicas than place_count). Reproduces back-to-back deactivate + r c --auto-place.

Fix

  • feat(store): add a conditional DeleteIfTieBreaker to ResourceStore (both InMemory and k8s backends) — deletes only if the object still carries the TIE_BREAKER flag and is still diskless, under a resourceVersion precondition, so a concurrent promotion to diskful aborts the delete.
  • fix(controller): route removeWitnesses through the conditional delete; a row that was promoted out of witness state between the reconcile read and the delete is left intact, and the backfill survives. Legitimate witness-reap paths (orphan witness, diskful≥3, BUG-040 AutoplaceTarget, ghost-on-lost-node fix(controller): ghost tiebreaker witness re-created on a just-deleted node #140) are unchanged.

Tests

  • L1: pin conditional witness delete against backfill race — injects the concurrent promotion between the reaper's read and delete (decorator store) and asserts the promoted replica survives + redundancy holds.
  • Stand validation (tiebreaker+autoplace surface — house protocol): on stand big, cli-matrix/r-full-lifecycle green and L7 replay/inactive-return-backfills-redundancy 3/3 PASS. The full ensure_tiebreaker_* / ghost / BUG-040 witness suites stay green.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed witness replica handling during concurrent backfill operations to ensure proper redundancy restoration
    • Improved replica stability to reduce unnecessary replica set changes when inactive diskful replicas are present
  • Tests

    • Added comprehensive test coverage for witness reaping and concurrent backfill scenarios

Andrei Kvapil (kvaps) and others added 4 commits June 13, 2026 11:40
Add DeleteIfTieBreaker, which deletes a replica only if it still carries
the TIE_BREAKER flag at the observed object version. The k8s store guards
the delete with a ResourceVersion + UID precondition; the in-memory store
performs the read-check-delete under its write lock. A concurrent
promotion of the same (rd, node) row to a diskful backfill replica makes
the delete a benign no-op instead of clobbering the promoted replica.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
removeWitnesses did a non-atomic Get-then-Delete: it re-read the witness
flags, but the Delete carried no version precondition, so a concurrent
backfill promotion landing in the gap between the read and the Delete was
silently deleted by name. After an inactive-replica return or node event,
the placer would promote the witness to a diskful backfill replica and the
reaper would then clobber it — redundancy was never restored (fewer
diskful replicas than place_count) and the loss was silent.

Route the reap through the new ResourceStore.DeleteIfTieBreaker so the
flag re-check and the delete are one optimistic-concurrency operation: a
racing promotion aborts the delete instead of clobbering it. Legitimate
orphan/ghost/relocate witness-reap paths are unaffected.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add an L1 controller regression that injects the concurrent backfill
promotion at the exact race point — between the reaper's observation of
the witness and the delete commit — and asserts the promoted replica
survives and redundancy holds. A companion test pins that a genuine
orphan witness (no racing promotion) is still reaped, guarding against
over-correction.

Add a DeleteIfTieBreaker contract test to the shared store suite so both
the in-memory and CRD-backed (envtest) backends are pinned in lock-step:
witness reaped, promoted diskful preserved, missing row a no-op. Update
the pkg/rest decorator fakes to satisfy the widened interface.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The inactive-return-backfills-redundancy hang has a decision-level root
cause in addition to the reaper race: with 1 active diskful + 1 INACTIVE
diskful + 1 leftover auto-tiebreaker witness, ensureTiebreaker filters
the INACTIVE replica out of the voting set, sees "1 diskful + 1 witness",
treats the witness as a genuine orphan, and reaps it. But that witness
sits on the very spare the placer's `r c --auto-place` is promoting to
restore active redundancy. Reaper and placer fight over the node and the
reconciler oscillates 1<->2 active diskful forever, so the backfill never
stabilises (active_diskful_count never reaches 2 — the 240s timeout).

Hold an existing witness whenever an INACTIVE diskful replica is present:
the cluster is degraded and a backfill is expected, so the witness is the
backfill target, not an orphan. This only preserves an already-present
witness (the create branch is unchanged, so no spurious witness grows),
and it is the forward-correct shape — when the INACTIVE replica returns
(`r activate`) the cluster is immediately at the 2-diskful + 1-witness
three-voter quorum with no recreate churn.

Add a decision-level regression that reproduces the oscillation shape and
asserts the witness survives repeated reconciles; the Bug 387 no-spurious-
witness and the ghost/orphan reap suites stay green.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b6568680-5310-4907-bbd3-6b206fff8049

📥 Commits

Reviewing files that changed from the base of the PR and between 7389580 and 15fa0c3.

📒 Files selected for processing (9)
  • internal/controller/ensure_tiebreaker_backfill_bug393_test.go
  • internal/controller/resourcedefinition_controller.go
  • pkg/rest/bug_359_mid_delete_promote_test.go
  • pkg/rest/bug_359_witness_collapse_race_test.go
  • pkg/rest/cache_invalidation_bug_124_test.go
  • pkg/store/inmemory_resource.go
  • pkg/store/k8s/resources.go
  • pkg/store/store.go
  • pkg/store/storetest/storetest.go

📝 Walkthrough

Walkthrough

PR adds a version-guarded DeleteIfTieBreaker method across the store interface and backends to prevent clobbering tie-breaker witnesses during concurrent promotion. Controller logic detects INACTIVE diskful replicas and preserves witnesses through the backfill window, with three regression tests validating the Bug 393 fix.

Changes

Witness Deletion & Backfill Race Fix

Layer / File(s) Summary
Store interface and test contract
pkg/store/store.go, pkg/store/storetest/storetest.go
DeleteIfTieBreaker contract specifies version-preconditioned deletion semantics: returns (true, nil) on deletion, (false, nil) when skipped due to concurrent promotion/state change/absence, and (false, err) for genuine errors. Store test contract validates witness reaped, promoted diskful preserved, and missing row benign.
In-memory and Kubernetes store backends
pkg/store/inmemory_resource.go, pkg/store/k8s/resources.go
In-memory backend acquires write lock and conditionally deletes when TIE_BREAKER flag exists. Kubernetes backend fetches CRD, guards delete with both ResourceVersion and UID preconditions, normalizes Conflict and NotFound as benign, and propagates other errors.
Test helper mocks and wrappers
pkg/rest/bug_359_mid_delete_promote_test.go, pkg/rest/bug_359_witness_collapse_race_test.go, pkg/rest/cache_invalidation_bug_124_test.go
Bug 359 test mocks and cache-invalidation helper implement DeleteIfTieBreaker by delegating to wrapped store, ensuring test doubles satisfy the updated interface.
Controller witness preservation and conditional removal
internal/controller/resourcedefinition_controller.go
ensureTiebreaker detects INACTIVE diskful replicas and passes this state to shouldTieBreakerExist, which preserves existing witnesses when an INACTIVE diskful is present. removeWitnesses switches to version-guarded DeleteIfTieBreaker to prevent clobbering promotion races, with detailed concurrency window documentation.
Bug 393 regression tests
internal/controller/ensure_tiebreaker_backfill_bug393_test.go
Three tests validate the Bug 393 fix: promoted backfill survives witness reaper with redundancy restored, genuine orphan still reaped, and INACTIVE topology keeps witness stable across repeated passes without churn. Custom promoteRacingStore wrapper injects promotion concurrency into delete path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cozystack/blockstor#36: Both PRs modify ensureTiebreaker and witness removal behavior in the controller—one removes orphan-witness grace, the other adds DeleteIfTieBreaker to prevent promotion clobbering.
  • cozystack/blockstor#129: Both PRs update the TIE_BREAKER witness keep/reap decision in EnsureTiebreaker—main PR adds Bug-393-specific backfill/inactive race handling, while #129 changes the baseline reaping rule below two diskful.
  • cozystack/blockstor#61: Both PRs address Bug 359 witness collapse—main PR adds DeleteIfTieBreaker store plumbing and updates Bug 359 REST test mocks, while #61 implements the Bug 359 retry behavior in pkg/rest/autoplace.go.

Poem

🐰 A tie-breaker awaits, caught mid-flight,
Preconditions lock in, versions align just right,
No more clobbered witnesses from racing promotions—
The backfill survives with concordant devotions,
While inactive replicas keep chaos at bay. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and concisely identifies the main fix: addressing a race condition between witness reaping and redundancy backfill operations, with the specific bug reference (BUG-041).
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/bug393-witness-reap-backfill-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses Bug 393, where the auto-tiebreaker witness reaper raced with the placer's redundancy backfill, causing promoted replicas to be silently deleted and preventing redundancy from being restored. To resolve this, a new atomic/conditional delete method DeleteIfTieBreaker is introduced to ResourceStore and implemented for both in-memory and Kubernetes-backed stores. This ensures that a witness is only deleted if it still carries the TIE_BREAKER flag at the observed version, preventing racing promotions from being clobbered. Additionally, the reconciler is updated to preserve an existing witness when an inactive diskful replica is present. The review feedback suggests refining the containsInactiveReplica helper function to ensure it only matches inactive replicas that are also diskful, preventing inactive diskless replicas (witnesses) from incorrectly triggering the degraded redundancy logic.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +432 to +440
func containsInactiveReplica(replicas []apiv1.Resource) bool {
for i := range replicas {
if slices.Contains(replicas[i].Flags, apiv1.ResourceFlagInactive) {
return true
}
}

return false
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The helper function containsInactiveReplica is designed to detect if active redundancy is degraded by checking for deactivated replicas. However, checking only for ResourceFlagInactive can also match inactive diskless replicas (witnesses), which do not represent a degradation of active diskful redundancy.

To align precisely with the design intent (as documented in the comments), we should ensure that the inactive replica is also diskful (i.e., does not carry the ResourceFlagDiskless flag).

Suggested change
func containsInactiveReplica(replicas []apiv1.Resource) bool {
for i := range replicas {
if slices.Contains(replicas[i].Flags, apiv1.ResourceFlagInactive) {
return true
}
}
return false
}
func containsInactiveReplica(replicas []apiv1.Resource) bool {
for i := range replicas {
if slices.Contains(replicas[i].Flags, apiv1.ResourceFlagInactive) &&
!slices.Contains(replicas[i].Flags, apiv1.ResourceFlagDiskless) {
return true
}
}
return false
}

@kvaps Andrei Kvapil (kvaps) merged commit 4e1ae83 into main Jun 13, 2026
15 checks passed
@kvaps Andrei Kvapil (kvaps) deleted the fix/bug393-witness-reap-backfill-race branch June 13, 2026 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant