fix(controller): auto-tiebreaker witness must honour AutoplaceTarget=false (BUG-040)#154
Conversation
…false (BUG-040) The tiebreaker picker only consulted the EVICTED/LOST flag set, so the auto-witness — an automatic placement — landed on a node the operator had drained with AutoplaceTarget=false, violating the maintenance-drain contract the placer has honoured since corner F4. Upstream LINSTOR routes tiebreaker selection through the autoplacer, which honours the prop, so upstream never pins a witness to a drained node. Exclude AutoplaceTarget=false nodes from the witness candidate set and from the pre-create authoritative-reader probe. When no eligible spare remains the witness is not created and the quorum computation now sees the real replica set instead of a phantom witness, resolving to quorum=off for a 2-voter RD (pre-fix it stamped majority, arming a both-halves freeze on partition). Existing witnesses survive a later AutoplaceTarget flip — the prop gates new placements only, mirroring the placer's no-migration semantic. Also pin the rebalance-refill wiring: the refill never lands on AutoplaceTarget=false or EVICTED nodes and never exceeds the RG place_count. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…40 sweep
The release-gate sweep surfaced a family of cells whose assertions
never matched the wire contract:
- n-d-evicted-rejected / n-evacuate-prunes-source counted total
Resource rows after autoplace=2, racing the legal auto-tiebreaker
witness ("never reached count=2 (last=3)"). Add wait_diskful_count
and gate on the diskful count, the convention the rest of the suite
already follows. The evacuate cell also picked its spare via
"no Resource CRD" — impossible with the witness parked there; the
replacement legitimately promotes that witness in place.
- n-evict-tiebreaker-no-shuffle invoked 'linstor node evict', a verb
linstor-client 1.27.1 does not have — the CLI died in argparse with
no REST call and the cell could never pass. Drain via node evacuate,
which stamps the same EVICTED flag the cell asserts on.
- n-rst-recreates-tiebreaker one-shot-checked the 'r l' wire view the
instant the witness CRD appeared; the apiserver list view reads its
informer cache and can lag the write. Poll briefly.
- auto-diskful-evicted-node demanded 3 healthy diskful after evacuating
one of 3 workers (topologically impossible) and parked the RD in
DfltRscGrp whose empty place_count makes the auto-diskful controller
skip it. Rebuild the cell on an achievable shape: place_count-2 RG,
user-diskless promotion candidate, evacuate one diskful, assert the
deficit repairs onto the healthy spare and never onto the evicted
node.
- bug-278-skipdisk-autoclear stamped SkipDisk via kubectl merge-patch,
so the key belonged to the kubectl field manager — which the SSA
release path correctly refuses to clear (operator-set SkipDisk must
survive). Stamp via server-side apply under the observer's field
manager so the cell exercises the actual defensive-stamp contract.
- multi-volume-late-vd-create: widen the initial-sync budget; the
failure dump showed all replicas UpToDate moments after the old
deadline under sweep load.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
📝 WalkthroughWalkthroughThis PR implements BUG-040, a fix for witness (tie-breaker) reconciliation to respect ChangesBUG-040 Core Logic: Witness Placement Respect AutoplaceTarget
Unit Tests: Witness Placement and Refill Behavior
E2E Tests: Diskful Replica Semantics and Witness Behavior Verification
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request addresses BUG-040 by ensuring that auto-tiebreaker witnesses are not placed on nodes marked with AutoplaceTarget=false (drained nodes). When no eligible spare nodes are available, the controller now correctly avoids creating a witness and falls back to quorum=off instead of counting a phantom voter. The changes also include comprehensive unit and end-to-end test updates to validate these constraints, fix existing test bugs (such as incorrect CLI commands and invalid topologies), and reduce test flakes through improved polling helpers and increased timeouts. No review comments were provided, so there is no feedback to evaluate.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/e2e/cli-matrix/multi-volume-late-vd-create.sh (1)
72-93:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep timeout diagnostics consistent with the new 240s budget.
Line 92 still reports
within 120seven though the wait deadline was extended to 240s, which can mislead timeout triage.Suggested fix
-if [[ "$all_up" != "true" ]]; then - echo "FAIL: vol-0 did not reach UpToDate on all 3 replicas within 120s" >&2 +if [[ "$all_up" != "true" ]]; then + echo "FAIL: vol-0 did not reach UpToDate on all 3 replicas within 240s" >&2🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/cli-matrix/multi-volume-late-vd-create.sh` around lines 72 - 93, Update the timeout diagnostic string to match the new 240s deadline: change the failing message that currently reads "within 120s" to "within 240s" (the code around variable names deadline and all_up and the echo failing block should be updated) so the error output reflects the extended wait budget.internal/controller/resourcedefinition_controller.go (1)
839-855:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
createWitnesscan reportcreated=trueafter rollback removed the witness.Line 854 returns
trueunconditionally even afterrollbackWitnessIfRDGone/rollbackWitnessIfNodeGone. That breaks the new “real post-write witness set” contract and can reintroduce phantom quorum majority for a 2-voter RD.🛠️ Suggested fix
err = r.Store.Resources().Create(ctx, &newWitness) if err != nil && !stderrors.Is(err, store.ErrAlreadyExists) && !alreadyExists(err) { return false, err } r.rollbackWitnessIfRDGone(ctx, rd.Name, tiebreakerNode) r.rollbackWitnessIfNodeGone(ctx, rd.Name, tiebreakerNode) - return true, nil + live, getErr := r.Store.Resources().Get(ctx, rd.Name, tiebreakerNode) + if getErr != nil { + if stderrors.Is(getErr, store.ErrNotFound) { + return false, nil + } + return false, getErr + } + + if !slices.Contains(live.Flags, apiv1.ResourceFlagTieBreaker) { + return false, nil + } + + return true, nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/resourcedefinition_controller.go` around lines 839 - 855, createWitness currently returns created=true unconditionally even if rollbackWitnessIfRDGone or rollbackWitnessIfNodeGone removed the witness; fix by re-probing the store after those rollback calls and return created=false when the witness is no longer present. Specifically, inside createWitness, after calling r.rollbackWitnessIfRDGone(ctx, rd.Name, tiebreakerNode) and r.rollbackWitnessIfNodeGone(ctx, rd.Name, tiebreakerNode), perform a get/list via r.Store.Resources().Get or List for the witness (using rd.Name and tiebreakerNode as identifiers) and if the probe shows the witness is absent, return false, nil; otherwise return true, nil as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/controller/resourcedefinition_controller.go`:
- Around line 839-855: createWitness currently returns created=true
unconditionally even if rollbackWitnessIfRDGone or rollbackWitnessIfNodeGone
removed the witness; fix by re-probing the store after those rollback calls and
return created=false when the witness is no longer present. Specifically, inside
createWitness, after calling r.rollbackWitnessIfRDGone(ctx, rd.Name,
tiebreakerNode) and r.rollbackWitnessIfNodeGone(ctx, rd.Name, tiebreakerNode),
perform a get/list via r.Store.Resources().Get or List for the witness (using
rd.Name and tiebreakerNode as identifiers) and if the probe shows the witness is
absent, return false, nil; otherwise return true, nil as before.
In `@tests/e2e/cli-matrix/multi-volume-late-vd-create.sh`:
- Around line 72-93: Update the timeout diagnostic string to match the new 240s
deadline: change the failing message that currently reads "within 120s" to
"within 240s" (the code around variable names deadline and all_up and the echo
failing block should be updated) so the error output reflects the extended wait
budget.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b6f7c5d-9d9d-469b-ab4e-226ca6e441b7
📒 Files selected for processing (12)
internal/controller/ensure_tiebreaker_autoplace_target_bug040_test.gointernal/controller/resourcedefinition_controller.gointernal/controller/rg_rebalance_bug040_constraints_test.gotests/e2e/cli-matrix/auto-diskful-evicted-node.shtests/e2e/cli-matrix/bug-278-skipdisk-autoclear-after-reattach.shtests/e2e/cli-matrix/lib.shtests/e2e/cli-matrix/multi-volume-late-vd-create.shtests/e2e/cli-matrix/n-d-evicted-rejected.shtests/e2e/cli-matrix/n-evacuate-prunes-source.shtests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.shtests/e2e/cli-matrix/n-rst-recreates-tiebreaker.shtests/operator-harness/replay/n-autoplace-target-excludes.yaml
What
Two issues surfaced by the release-gate stand sweep, in the auto-tiebreaker placement path.
Product fix — witness placement + phantom-quorum (correctness + availability)
The tiebreaker witness picker only consulted the EVICTED/LOST flag set, so the auto-witness — an automatic placement — could land on a node the operator had drained with
AutoplaceTarget=false, violating the maintenance-drain contract the placer has honoured since corner F4. Upstream LINSTOR routes tiebreaker selection through the autoplacer, which honours the prop, so upstream never pins a witness to a drained node.Fix: exclude
AutoplaceTarget=falsenodes from the witness candidate set and from the pre-create authoritative-reader probe (isAutoplaceExcludedNode, mirroringplacer.autoplaceExcludedNodes; only a parseablefalseexcludes — true/unset/typo stay eligible, the same fail-open parse the placer uses). When no eligible spare remains, the witness is not created andcreateWitnessnow returns acreated boolso the caller's quorum computation reflects the real replica set instead of a phantom witness — resolving toquorum=offfor a 2-voter RD. Pre-fix this branch stampedquorum=majorityon a 2-voter RD, arming a both-halves-freeze on partition.Harness fix — node-lifecycle cli-matrix cells
A family of cells asserted against the wrong wire contract and could never pass (these produced the apparent "replica overshoot" in the sweep — they counted total Resource rows including the legal auto-tiebreaker witness, not diskful count):
n-d-evicted-rejected/n-evacuate-prunes-source— gate on diskful count (wait_diskful_count), the convention the rest of the suite uses; evacuate spare-pick promotes the parked witness in place.n-evict-tiebreaker-no-shuffle—linstor node evictis not a verb in linstor-client 1.27.1 (died in argparse, no REST call); drain vianode evacuatewhich stamps the same EVICTED flag.n-rst-recreates-tiebreaker— poll ther lview instead of one-shot reading the informer-lagged cache.auto-diskful-evicted-node— rebuilt on an achievable shape (place_count-2 RG, user-diskless promotion candidate); pre-fix demanded 3 healthy diskful after evacuating 1 of 3 workers (impossible) and parked the RD in DfltRscGrp whose empty place_count skips the auto-diskful controller.bug-278-skipdisk-autoclear— stamp SkipDisk via server-side apply under the observer field manager (kubectl merge-patch put the key under the wrong field manager, which the SSA release path correctly refuses to clear).multi-volume-late-vd-create— widen initial-sync budget (dump showed all replicas UpToDate just past the old deadline under sweep load).Tests
ensure_tiebreaker_autoplace_target_bug040_test.go(witness excludes AutoplaceTarget=false; no phantom witness in quorum when no spare) +rg_rebalance_bug040_constraints_test.go.Summary by CodeRabbit
New Features
AutoplaceTargetproperty to exclude nodes from witness/tiebreaker placement.Bug Fixes
Tests