Skip to content

fix(satellite): never lose the day0 first-activation mkfs (BUG-028)#147

Merged
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
fix/day0-mkfs-race
Jun 12, 2026
Merged

fix(satellite): never lose the day0 first-activation mkfs (BUG-028)#147
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
fix/day0-mkfs-race

Conversation

@kvaps

Copy link
Copy Markdown
Member

What

A fresh volume with FileSystem/Type set could intermittently (~2 in 5) end up with NO filesystem forever: the day0 skip-initial-sync race let an external promoter (drbd-reactor on the piraeus RWX path) reach the device before the first-activation mkfs, after which the volume was permanently wedged in a promote -> fsck-fail -> demote loop. This surfaced in CI as the recurring "rwx-ganesha flake".

Root cause chain

  1. Day0 skip-initial-sync brings both diskful replicas Connected+UpToDate within ~2s; the mkfs-election winner reaches its first activation after that.
  2. The force-promote kernel veto saw peer-disk:UpToDate and could not distinguish an empty day0 sibling from real data -> first-activation mkfs silently skipped.
  3. Each external promote/demote of the empty volume bumped the DRBD current-UUID past the computed day0 GI without writing data.
  4. The controller treated CurrentGI != day0 as proof of data -> false Initialized latch -> dispatcher granted no auto-primary -> the mkfs retry path was permanently dead.

Fix

  • Day0-aware veto bypass: force-promote for first-activation mkfs proceeds when every connected peer is provably a day0-empty sibling of the same never-initialized generation (belt) and the local volume carries no filesystem signature (braces). A peer with real data still vetoes mkfs unconditionally.
  • Latch-free mkfs retry: the retry no longer depends on the auto-primary election / Initialized latch — it fires when a volume provably lacks its filesystem while FileSystem/Type demands one, and is safe against a foreign Primary briefly holding the device (waits for Secondary).
  • Probe hardening: mkfs-safety probes unblocked when wedged by a stale drbd meta signature or CRD lag.

Validation (live stand, raw logs on the dev host)

  • Statistical: 12/12 consecutive rwx-ganesha PASS on the final code (pre-fix failure rate was 1-in-4); two earlier validation loops caught residual races and the fix was iterated until clean. Logs: /var/lib/blockstor/work/dev/bug028-fix-validation-20260612-060631/.
  • Deterministic: forced-race run (winner satellite killed mid-create) PASS — both fix legs observed firing in product logs (force-promote veto bypassed — every connected peer is a day0-empty sibling, latch-free mkfs retry). Logs: /var/lib/blockstor/work/dev/bug028-force-race-062248/.
  • No regressions: cli-matrix/r-full-lifecycle rc=0 and replay/file-thin-day0-skip-no-initial-sync PASS on the same build (day0 skip-initial-sync contract intact); go test ./..., targeted -race, and lint green.

Safety argument

mkfs is only ever performed when (a) all connected peers are day0-empty siblings by blockstor's own generation bookkeeping AND (b) no filesystem signature exists locally. Any peer with a non-day0 generation or any local fs signature refuses mkfs exactly as before. L1 regression tests cover the race chain, the false-latch retry, and the real-data refusal counter-case.

Andrei Kvapil (kvaps) and others added 4 commits June 12, 2026 07:49
Authorises a promote+mkfs+demote retry without the dispatcher's
auto-primary blessing, but only when the kernel proves it safe:
every replica Secondary, local volumes UpToDate, and every connected
peer-device either a lock-step UpToDate sibling or an intentional
diskless witness. A disconnected or non-UpToDate diskful peer (a
potential offline data holder) refuses, as does a foreign Primary
(external promoters such as drbd-reactor simply defer the retry).

Groundwork for the BUG-028 day0/mkfs race fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
A fresh RWX RD (FileSystem/Type on the RD) intermittently came up
with no filesystem, forever. Two coupled failures:

1. Day0 race: skip-initial-sync brings both diskful replicas
   Connected+UpToDate at the shared day0 GI before the elected mkfs
   winner reaches finishDRBDApply. The Bug 342 force-promote kernel
   veto cannot tell an empty day0 sibling from a real data peer, so
   the one-and-only first-activation mkfs was silently skipped.
2. False-latch terminal state: an external promoter (drbd-reactor)
   then bumps the current-UUID past day0 without writing data, the
   controller latches RD.Spec.Initialized, the dispatcher drops the
   auto-primary election, and the Bug-311 mkfs retry (gated solely
   on autoPrimaryReplica) goes permanently dead.

Fix, both sides, evidence-gated:

- day0EmptyMkfsBypass: the veto may be bypassed only when the
  controller-persisted Spec.SkipInitialSync proves a never-
  initialized generation, no proven data peer exists, the local
  metadata current-UUID still equals the deterministic day0 GI
  (DRBD only lets replicas sit Connected+UpToDate in the same data
  generation, so this proves every connected peer is a day0-empty
  sibling), and no volume carries a filesystem signature.
- shouldRetryAutoMkfs: the Bug-311 retry now also fires without the
  auto-primary election, via the deterministic lowest-diskful-
  node-id winner + SafeForMkfsRetryPromote kernel proof + blkid
  fs-absence probe, throttled, deferring while a foreign Primary
  holds the device.

Any missing evidence refuses both paths, preserving the Bug 342/356
relocate and respawn-StandAlone protections and the v0.1.11 day0
skip-initial-sync contract.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…and CRD lag

Stand forensics on the loop-backed ganesha scenario showed both new
BUG-028 paths permanently refusing on real hardware:

1. blkid on the BACKING device of an internal-metadata DRBD volume
   always reports TYPE=drbd (libblkid recognises the meta superblock
   at the device tail), so the naive any-TYPE fs-absence probe read
   every never-formatted volume as populated. New
   backingHasUserFilesystem treats only a non-drbd signature as user
   data; the post-promote probe on the DRBD device (which hides the
   metadata) remains the authoritative double-mkfs guard.
2. The dispatcher's PeerHasData conservatively counts an UpToDate
   day0 sibling whose CurrentGI backfill has not landed yet as
   data-bearing; refusing the bypass on it permanently cost the
   one-shot first-activation mkfs. The bypass now uses kernel
   coverage (Day0SiblingSetConnected: no Primary anywhere, local
   UpToDate, every connected peer-device UpToDate/Diskless, an
   un-handshaken peer tolerated only when it is a configured
   diskless witness) which, combined with the local-GI==day0 proof,
   strictly supersedes the CRD signal: every state PeerHasData
   correctly protects is still refused.

Also the latch-free retry now requires only ONE volume to be missing
its filesystem (volumes that already carry one are adopted untouched
by the per-volume blkid probe in the mkfs runner — the Bug 311
partial-mkfs shape).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ambiguity

The bypass cannot distinguish a respawned replica joining a
data-bearing-but-still-day0 survivor (a volume whose entire write
history happened fully connected never mints a new current-UUID)
from a fresh day0 sibling. This is the same ambiguity the day0 seed
path documents and the same shape the pre-existing Bug-311 retry
already had; production auto-mkfs topologies (RWX with a diskless
witness) are immune because the first consumer promote mints a UUID
and latches the RD. Documented as accepted residual risk.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@kvaps, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 25 minutes and 51 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 981feeb5-cc11-400b-8ff6-5edebfbbe8b9

📥 Commits

Reviewing files that changed from the base of the PR and between c54be3b and a5ed96e.

📒 Files selected for processing (4)
  • pkg/drbd/drbdadm.go
  • pkg/drbd/safe_for_mkfs_retry_promote_test.go
  • pkg/satellite/reconciler.go
  • pkg/satellite/reconciler_bug028_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/day0-mkfs-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses BUG-028 by introducing a day0-empty mkfs bypass and a latch-free mkfs retry mechanism. It adds SafeForMkfsRetryPromote and Day0SiblingSetConnected to pkg/drbd/drbdadm.go to safely probe the DRBD status, and integrates these checks into the satellite reconciler. Additionally, a backing-device filesystem probe is added to ignore the DRBD metadata signature. A review comment suggests using a less-than-or-equal comparison for peer node IDs in isLowestDiskfulNodeID to defensively prevent multiple nodes from simultaneously attempting to promote/mkfs in the event of duplicate node IDs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +2980 to +2983
peerID, peerErr := strconv.Atoi(opts["peer."+peer+".node-id"])
if peerErr != nil || peerID < selfID {
return false
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is an inconsistency in how node ID uniqueness/equality is handled between isLowestDiskfulNodeID and isLateAddWinner.

In isLateAddWinner, the comparison is *peer.NodeID <= localID (line 3379), which correctly disqualifies the local node if a peer shares the same node ID (defensive check against duplicate node IDs). However, isLowestDiskfulNodeID uses peerID < selfID (line 2981). If a duplicate node ID situation ever occurs, both nodes could evaluate isLowestDiskfulNodeID as true and attempt to promote/mkfs simultaneously, potentially leading to a split-brain or conflict.

Using <= here makes the check more robust and consistent with isLateAddWinner.

Suggested change
peerID, peerErr := strconv.Atoi(opts["peer."+peer+".node-id"])
if peerErr != nil || peerID < selfID {
return false
}
peerID, peerErr := strconv.Atoi(opts["peer."+peer+".node-id"])
if peerErr != nil || peerID <= selfID {
return false
}

@kvaps Andrei Kvapil (kvaps) merged commit 960f414 into main Jun 12, 2026
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant