Skip to content

fix(controller): auto-tiebreaker witness must honour AutoplaceTarget=false (BUG-040)#154

Merged
Andrei Kvapil (kvaps) merged 2 commits into
mainfrom
fix/bug-040-autoplace-target-witness
Jun 13, 2026
Merged

fix(controller): auto-tiebreaker witness must honour AutoplaceTarget=false (BUG-040)#154
Andrei Kvapil (kvaps) merged 2 commits into
mainfrom
fix/bug-040-autoplace-target-witness

Conversation

@kvaps

@kvaps Andrei Kvapil (kvaps) commented Jun 13, 2026

Copy link
Copy Markdown
Member

What

Two issues surfaced by the release-gate stand sweep, in the auto-tiebreaker placement path.

Product fix — witness placement + phantom-quorum (correctness + availability)

The tiebreaker witness picker only consulted the EVICTED/LOST flag set, so the auto-witness — an automatic placement — could land on a node the operator had drained with AutoplaceTarget=false, violating the maintenance-drain contract the placer has honoured since corner F4. Upstream LINSTOR routes tiebreaker selection through the autoplacer, which honours the prop, so upstream never pins a witness to a drained node.

Fix: exclude AutoplaceTarget=false nodes from the witness candidate set and from the pre-create authoritative-reader probe (isAutoplaceExcludedNode, mirroring placer.autoplaceExcludedNodes; only a parseable false excludes — true/unset/typo stay eligible, the same fail-open parse the placer uses). When no eligible spare remains, the witness is not created and createWitness now returns a created bool so the caller's quorum computation reflects the real replica set instead of a phantom witness — resolving to quorum=off for a 2-voter RD. Pre-fix this branch stamped quorum=majority on a 2-voter RD, arming a both-halves-freeze on partition.

Harness fix — node-lifecycle cli-matrix cells

A family of cells asserted against the wrong wire contract and could never pass (these produced the apparent "replica overshoot" in the sweep — they counted total Resource rows including the legal auto-tiebreaker witness, not diskful count):

  • n-d-evicted-rejected / n-evacuate-prunes-source — gate on diskful count (wait_diskful_count), the convention the rest of the suite uses; evacuate spare-pick promotes the parked witness in place.
  • n-evict-tiebreaker-no-shufflelinstor node evict is not a verb in linstor-client 1.27.1 (died in argparse, no REST call); drain via node evacuate which stamps the same EVICTED flag.
  • n-rst-recreates-tiebreaker — poll the r l view instead of one-shot reading the informer-lagged cache.
  • auto-diskful-evicted-node — rebuilt on an achievable shape (place_count-2 RG, user-diskless promotion candidate); pre-fix demanded 3 healthy diskful after evacuating 1 of 3 workers (impossible) and parked the RD in DfltRscGrp whose empty place_count skips the auto-diskful controller.
  • bug-278-skipdisk-autoclear — stamp SkipDisk via server-side apply under the observer field manager (kubectl merge-patch put the key under the wrong field manager, which the SSA release path correctly refuses to clear).
  • multi-volume-late-vd-create — widen initial-sync budget (dump showed all replicas UpToDate just past the old deadline under sweep load).

Tests

  • L1: ensure_tiebreaker_autoplace_target_bug040_test.go (witness excludes AutoplaceTarget=false; no phantom witness in quorum when no spare) + rg_rebalance_bug040_constraints_test.go.
  • Stand validation pending in this PR's CI lanes + the release-gate re-sweep.

Summary by CodeRabbit

  • New Features

    • Added support for AutoplaceTarget property to exclude nodes from witness/tiebreaker placement.
  • Bug Fixes

    • Fixed phantom witness creation when no eligible placement locations exist.
    • Improved witness placement behavior on drained and evicted nodes.
    • Enhanced replica rebalancing to respect autoplace exclusions.
  • Tests

    • Added regression test coverage for witness placement with autoplace exclusions.
    • Added resource group rebalance tests for constraint validation.
    • Updated e2e tests for improved diskful replica tracking.

Andrei Kvapil (kvaps) and others added 2 commits June 13, 2026 04:50
…false (BUG-040)

The tiebreaker picker only consulted the EVICTED/LOST flag set, so the
auto-witness — an automatic placement — landed on a node the operator
had drained with AutoplaceTarget=false, violating the maintenance-drain
contract the placer has honoured since corner F4. Upstream LINSTOR
routes tiebreaker selection through the autoplacer, which honours the
prop, so upstream never pins a witness to a drained node.

Exclude AutoplaceTarget=false nodes from the witness candidate set and
from the pre-create authoritative-reader probe. When no eligible spare
remains the witness is not created and the quorum computation now sees
the real replica set instead of a phantom witness, resolving to
quorum=off for a 2-voter RD (pre-fix it stamped majority, arming a
both-halves freeze on partition).

Existing witnesses survive a later AutoplaceTarget flip — the prop
gates new placements only, mirroring the placer's no-migration
semantic.

Also pin the rebalance-refill wiring: the refill never lands on
AutoplaceTarget=false or EVICTED nodes and never exceeds the RG
place_count.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…40 sweep

The release-gate sweep surfaced a family of cells whose assertions
never matched the wire contract:

- n-d-evicted-rejected / n-evacuate-prunes-source counted total
  Resource rows after autoplace=2, racing the legal auto-tiebreaker
  witness ("never reached count=2 (last=3)"). Add wait_diskful_count
  and gate on the diskful count, the convention the rest of the suite
  already follows. The evacuate cell also picked its spare via
  "no Resource CRD" — impossible with the witness parked there; the
  replacement legitimately promotes that witness in place.
- n-evict-tiebreaker-no-shuffle invoked 'linstor node evict', a verb
  linstor-client 1.27.1 does not have — the CLI died in argparse with
  no REST call and the cell could never pass. Drain via node evacuate,
  which stamps the same EVICTED flag the cell asserts on.
- n-rst-recreates-tiebreaker one-shot-checked the 'r l' wire view the
  instant the witness CRD appeared; the apiserver list view reads its
  informer cache and can lag the write. Poll briefly.
- auto-diskful-evicted-node demanded 3 healthy diskful after evacuating
  one of 3 workers (topologically impossible) and parked the RD in
  DfltRscGrp whose empty place_count makes the auto-diskful controller
  skip it. Rebuild the cell on an achievable shape: place_count-2 RG,
  user-diskless promotion candidate, evacuate one diskful, assert the
  deficit repairs onto the healthy spare and never onto the evicted
  node.
- bug-278-skipdisk-autoclear stamped SkipDisk via kubectl merge-patch,
  so the key belonged to the kubectl field manager — which the SSA
  release path correctly refuses to clear (operator-set SkipDisk must
  survive). Stamp via server-side apply under the observer's field
  manager so the cell exercises the actual defensive-stamp contract.
- multi-volume-late-vd-create: widen the initial-sync budget; the
  failure dump showed all replicas UpToDate moments after the old
  deadline under sweep load.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements BUG-040, a fix for witness (tie-breaker) reconciliation to respect AutoplaceTarget=false on nodes that opt out of autoplace targeting. The core logic refactors createWitness to report whether a witness was actually created and filters witness candidates and refill scheduling to exclude autoplace-excluded nodes. Three cohorts of unit tests validate the behavior, and E2E tests are updated to use diskful-replica semantics and verify evacuation/drain scenarios.

Changes

BUG-040 Core Logic: Witness Placement Respect AutoplaceTarget

Layer / File(s) Summary
Witness decision and creation signature
internal/controller/resourcedefinition_controller.go
createWitness changes from returning error to (bool, error) to report whether a witness was created. applyWitnessDecision checks this return value and avoids phantom quorum voters when creation is deferred. Return paths explicitly handle no-candidate, node-drained/missing, and error cases.
Node filtering and drain evaluation for autoplace exclusion
internal/controller/resourcedefinition_controller.go
Witness candidate filtering skips AutoplaceTarget=false nodes via new isAutoplaceExcludedNode helper. Node probing in both store and APIReader paths treats autoplace-excluded nodes as drained for witness candidacy. New nodeCRDAutoplaceExcluded helper reads the AutoplaceTarget property from Node CRD spec.Props.

Unit Tests: Witness Placement and Refill Behavior

Layer / File(s) Summary
EnsureTiebreaker regression tests for witness placement on autoplace-excluded nodes
internal/controller/ensure_tiebreaker_autoplace_target_bug040_test.go
Three test cases validate witnesses respect AutoplaceTarget=false: witness never lands on excluded nodes (quorum=off), witness lands when autoplace-target is not false (parameterized over true/garbage/unset), and existing witnesses survive addition of exclusion (quorum=majority).
RG rebalance refill tests for autoplace exclusion and eviction
internal/controller/rg_rebalance_bug040_constraints_test.go
Three test cases validate refill behavior: never schedules on excluded nodes and leaves deficit unfilled, respects PlaceCount without removing pre-existing replicas, and skips EVICTED nodes while filling deficit on healthy spares. Includes reusable fixture seeders and reconciler runner.

E2E Tests: Diskful Replica Semantics and Witness Behavior Verification

Layer / File(s) Summary
E2E helper for diskful replica count polling
tests/e2e/cli-matrix/lib.sh
New wait_diskful_count helper polls until diskful replica count (excluding diskless and tie-breaker rows) matches expected value, supporting timeout-bounded verification with diagnostic output on failure.
Auto-diskful evicted-node test: refill and witness behavior under evacuation
tests/e2e/cli-matrix/auto-diskful-evicted-node.sh
Major update implementing BUG-040 contract: creates RG with place-count 2, provisions diskful on two healthy nodes and diskless candidate on spare, evacuates one healthy node into EVICTED, validates spare becomes diskful and evicted node ends with zero diskful. Adds defensive flag check to ensure healthy diskful is not demoted during repair.
E2E test updates to use diskful replica semantics and witness polling
tests/e2e/cli-matrix/n-evacuate-prunes-source.sh, n-d-evicted-rejected.sh, n-evict-tiebreaker-no-shuffle.sh, n-rst-recreates-tiebreaker.sh, bug-278-skipdisk-autoclear-after-reattach.sh, multi-volume-late-vd-create.sh; tests/operator-harness/replay/n-autoplace-target-excludes.yaml
Multiple e2e tests replace generic replica-count polling with diskful-count polling and adjust spare node selection to account for auto-witness placement. Evacuation replacement readiness checks poll diskful nodes instead of checking resource CRD existence. Witness row-count assertions use polling windows to avoid flakiness. Operator-harness replay scenario assertion for autoplace-excluded node now expects zero resources instead of diskless replica. Timeouts and SSA semantics are aligned with infrastructure expectations.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • cozystack/blockstor#129: Both PRs modify the controller's TIE_BREAKER (auto-witness) reconciliation/EnsureTiebreaker logic in resourcedefinition_controller.go—one changes when the witness should exist (reaped below 2 diskful), and the other changes where/under what conditions it's created (skipping autoplace-excluded candidates / avoiding phantom placement).
  • cozystack/blockstor#36: Both PRs modify ResourceDefinitionReconciler.ensureTiebreaker/witness reconciliation in internal/controller/resourcedefinition_controller.go, but for different Bug-040 vs Bug-338 witness placement/removal behaviors.
  • cozystack/blockstor#109: Both PRs modify the same F4 replay scenario (tests/operator-harness/replay/n-autoplace-target-excludes.yaml) around AutoplaceTarget=false behavior for the drained node's witness/resource expectation (one switches to resource_absent, the other to replica_diskless).

Poem

🐰 A witness now respects the nodes that say "no thanks,"
Excluded from placement, they're struck from the ranks—
When quorum's uncertain and spare nodes resist,
We phantom-proof placement with a boolean twist! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: fixing the auto-tiebreaker witness to respect AutoplaceTarget=false, addressing BUG-040.
Docstring Coverage ✅ Passed Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/bug-040-autoplace-target-witness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses BUG-040 by ensuring that auto-tiebreaker witnesses are not placed on nodes marked with AutoplaceTarget=false (drained nodes). When no eligible spare nodes are available, the controller now correctly avoids creating a witness and falls back to quorum=off instead of counting a phantom voter. The changes also include comprehensive unit and end-to-end test updates to validate these constraints, fix existing test bugs (such as incorrect CLI commands and invalid topologies), and reduce test flakes through improved polling helpers and increased timeouts. No review comments were provided, so there is no feedback to evaluate.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/e2e/cli-matrix/multi-volume-late-vd-create.sh (1)

72-93: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep timeout diagnostics consistent with the new 240s budget.

Line 92 still reports within 120s even though the wait deadline was extended to 240s, which can mislead timeout triage.

Suggested fix
-if [[ "$all_up" != "true" ]]; then
-    echo "FAIL: vol-0 did not reach UpToDate on all 3 replicas within 120s" >&2
+if [[ "$all_up" != "true" ]]; then
+    echo "FAIL: vol-0 did not reach UpToDate on all 3 replicas within 240s" >&2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/multi-volume-late-vd-create.sh` around lines 72 - 93,
Update the timeout diagnostic string to match the new 240s deadline: change the
failing message that currently reads "within 120s" to "within 240s" (the code
around variable names deadline and all_up and the echo failing block should be
updated) so the error output reflects the extended wait budget.
internal/controller/resourcedefinition_controller.go (1)

839-855: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

createWitness can report created=true after rollback removed the witness.

Line 854 returns true unconditionally even after rollbackWitnessIfRDGone / rollbackWitnessIfNodeGone. That breaks the new “real post-write witness set” contract and can reintroduce phantom quorum majority for a 2-voter RD.

🛠️ Suggested fix
  err = r.Store.Resources().Create(ctx, &newWitness)
  if err != nil && !stderrors.Is(err, store.ErrAlreadyExists) && !alreadyExists(err) {
    return false, err
  }

  r.rollbackWitnessIfRDGone(ctx, rd.Name, tiebreakerNode)
  r.rollbackWitnessIfNodeGone(ctx, rd.Name, tiebreakerNode)

- return true, nil
+ live, getErr := r.Store.Resources().Get(ctx, rd.Name, tiebreakerNode)
+ if getErr != nil {
+   if stderrors.Is(getErr, store.ErrNotFound) {
+     return false, nil
+   }
+   return false, getErr
+ }
+
+ if !slices.Contains(live.Flags, apiv1.ResourceFlagTieBreaker) {
+   return false, nil
+ }
+
+ return true, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/resourcedefinition_controller.go` around lines 839 - 855,
createWitness currently returns created=true unconditionally even if
rollbackWitnessIfRDGone or rollbackWitnessIfNodeGone removed the witness; fix by
re-probing the store after those rollback calls and return created=false when
the witness is no longer present. Specifically, inside createWitness, after
calling r.rollbackWitnessIfRDGone(ctx, rd.Name, tiebreakerNode) and
r.rollbackWitnessIfNodeGone(ctx, rd.Name, tiebreakerNode), perform a get/list
via r.Store.Resources().Get or List for the witness (using rd.Name and
tiebreakerNode as identifiers) and if the probe shows the witness is absent,
return false, nil; otherwise return true, nil as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@internal/controller/resourcedefinition_controller.go`:
- Around line 839-855: createWitness currently returns created=true
unconditionally even if rollbackWitnessIfRDGone or rollbackWitnessIfNodeGone
removed the witness; fix by re-probing the store after those rollback calls and
return created=false when the witness is no longer present. Specifically, inside
createWitness, after calling r.rollbackWitnessIfRDGone(ctx, rd.Name,
tiebreakerNode) and r.rollbackWitnessIfNodeGone(ctx, rd.Name, tiebreakerNode),
perform a get/list via r.Store.Resources().Get or List for the witness (using
rd.Name and tiebreakerNode as identifiers) and if the probe shows the witness is
absent, return false, nil; otherwise return true, nil as before.

In `@tests/e2e/cli-matrix/multi-volume-late-vd-create.sh`:
- Around line 72-93: Update the timeout diagnostic string to match the new 240s
deadline: change the failing message that currently reads "within 120s" to
"within 240s" (the code around variable names deadline and all_up and the echo
failing block should be updated) so the error output reflects the extended wait
budget.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b6f7c5d-9d9d-469b-ab4e-226ca6e441b7

📥 Commits

Reviewing files that changed from the base of the PR and between 7389580 and 19978df.

📒 Files selected for processing (12)
  • internal/controller/ensure_tiebreaker_autoplace_target_bug040_test.go
  • internal/controller/resourcedefinition_controller.go
  • internal/controller/rg_rebalance_bug040_constraints_test.go
  • tests/e2e/cli-matrix/auto-diskful-evicted-node.sh
  • tests/e2e/cli-matrix/bug-278-skipdisk-autoclear-after-reattach.sh
  • tests/e2e/cli-matrix/lib.sh
  • tests/e2e/cli-matrix/multi-volume-late-vd-create.sh
  • tests/e2e/cli-matrix/n-d-evicted-rejected.sh
  • tests/e2e/cli-matrix/n-evacuate-prunes-source.sh
  • tests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.sh
  • tests/e2e/cli-matrix/n-rst-recreates-tiebreaker.sh
  • tests/operator-harness/replay/n-autoplace-target-excludes.yaml

@kvaps Andrei Kvapil (kvaps) merged commit b42805d into main Jun 13, 2026
15 checks passed
@kvaps Andrei Kvapil (kvaps) deleted the fix/bug-040-autoplace-target-witness branch June 13, 2026 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant