fix(satellite): never lose the day0 first-activation mkfs (BUG-028)#147
Conversation
Authorises a promote+mkfs+demote retry without the dispatcher's auto-primary blessing, but only when the kernel proves it safe: every replica Secondary, local volumes UpToDate, and every connected peer-device either a lock-step UpToDate sibling or an intentional diskless witness. A disconnected or non-UpToDate diskful peer (a potential offline data holder) refuses, as does a foreign Primary (external promoters such as drbd-reactor simply defer the retry). Groundwork for the BUG-028 day0/mkfs race fix. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
A fresh RWX RD (FileSystem/Type on the RD) intermittently came up with no filesystem, forever. Two coupled failures: 1. Day0 race: skip-initial-sync brings both diskful replicas Connected+UpToDate at the shared day0 GI before the elected mkfs winner reaches finishDRBDApply. The Bug 342 force-promote kernel veto cannot tell an empty day0 sibling from a real data peer, so the one-and-only first-activation mkfs was silently skipped. 2. False-latch terminal state: an external promoter (drbd-reactor) then bumps the current-UUID past day0 without writing data, the controller latches RD.Spec.Initialized, the dispatcher drops the auto-primary election, and the Bug-311 mkfs retry (gated solely on autoPrimaryReplica) goes permanently dead. Fix, both sides, evidence-gated: - day0EmptyMkfsBypass: the veto may be bypassed only when the controller-persisted Spec.SkipInitialSync proves a never- initialized generation, no proven data peer exists, the local metadata current-UUID still equals the deterministic day0 GI (DRBD only lets replicas sit Connected+UpToDate in the same data generation, so this proves every connected peer is a day0-empty sibling), and no volume carries a filesystem signature. - shouldRetryAutoMkfs: the Bug-311 retry now also fires without the auto-primary election, via the deterministic lowest-diskful- node-id winner + SafeForMkfsRetryPromote kernel proof + blkid fs-absence probe, throttled, deferring while a foreign Primary holds the device. Any missing evidence refuses both paths, preserving the Bug 342/356 relocate and respawn-StandAlone protections and the v0.1.11 day0 skip-initial-sync contract. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…and CRD lag Stand forensics on the loop-backed ganesha scenario showed both new BUG-028 paths permanently refusing on real hardware: 1. blkid on the BACKING device of an internal-metadata DRBD volume always reports TYPE=drbd (libblkid recognises the meta superblock at the device tail), so the naive any-TYPE fs-absence probe read every never-formatted volume as populated. New backingHasUserFilesystem treats only a non-drbd signature as user data; the post-promote probe on the DRBD device (which hides the metadata) remains the authoritative double-mkfs guard. 2. The dispatcher's PeerHasData conservatively counts an UpToDate day0 sibling whose CurrentGI backfill has not landed yet as data-bearing; refusing the bypass on it permanently cost the one-shot first-activation mkfs. The bypass now uses kernel coverage (Day0SiblingSetConnected: no Primary anywhere, local UpToDate, every connected peer-device UpToDate/Diskless, an un-handshaken peer tolerated only when it is a configured diskless witness) which, combined with the local-GI==day0 proof, strictly supersedes the CRD signal: every state PeerHasData correctly protects is still refused. Also the latch-free retry now requires only ONE volume to be missing its filesystem (volumes that already carry one are adopted untouched by the per-volume blkid probe in the mkfs runner — the Bug 311 partial-mkfs shape). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ambiguity The bypass cannot distinguish a respawned replica joining a data-bearing-but-still-day0 survivor (a volume whose entire write history happened fully connected never mints a new current-UUID) from a fresh day0 sibling. This is the same ambiguity the day0 seed path documents and the same shape the pre-existing Bug-311 retry already had; production auto-mkfs topologies (RWX with a diskless witness) are immune because the first consumer promote mints a UUID and latches the RD. Documented as accepted residual risk. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Warning Review limit reached
More reviews will be available in 25 minutes and 51 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request addresses BUG-028 by introducing a day0-empty mkfs bypass and a latch-free mkfs retry mechanism. It adds SafeForMkfsRetryPromote and Day0SiblingSetConnected to pkg/drbd/drbdadm.go to safely probe the DRBD status, and integrates these checks into the satellite reconciler. Additionally, a backing-device filesystem probe is added to ignore the DRBD metadata signature. A review comment suggests using a less-than-or-equal comparison for peer node IDs in isLowestDiskfulNodeID to defensively prevent multiple nodes from simultaneously attempting to promote/mkfs in the event of duplicate node IDs.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| peerID, peerErr := strconv.Atoi(opts["peer."+peer+".node-id"]) | ||
| if peerErr != nil || peerID < selfID { | ||
| return false | ||
| } |
There was a problem hiding this comment.
There is an inconsistency in how node ID uniqueness/equality is handled between isLowestDiskfulNodeID and isLateAddWinner.
In isLateAddWinner, the comparison is *peer.NodeID <= localID (line 3379), which correctly disqualifies the local node if a peer shares the same node ID (defensive check against duplicate node IDs). However, isLowestDiskfulNodeID uses peerID < selfID (line 2981). If a duplicate node ID situation ever occurs, both nodes could evaluate isLowestDiskfulNodeID as true and attempt to promote/mkfs simultaneously, potentially leading to a split-brain or conflict.
Using <= here makes the check more robust and consistent with isLateAddWinner.
| peerID, peerErr := strconv.Atoi(opts["peer."+peer+".node-id"]) | |
| if peerErr != nil || peerID < selfID { | |
| return false | |
| } | |
| peerID, peerErr := strconv.Atoi(opts["peer."+peer+".node-id"]) | |
| if peerErr != nil || peerID <= selfID { | |
| return false | |
| } |
What
A fresh volume with
FileSystem/Typeset could intermittently (~2 in 5) end up with NO filesystem forever: the day0 skip-initial-sync race let an external promoter (drbd-reactor on the piraeus RWX path) reach the device before the first-activation mkfs, after which the volume was permanently wedged in a promote -> fsck-fail -> demote loop. This surfaced in CI as the recurring "rwx-ganesha flake".Root cause chain
Initializedlatch -> dispatcher granted no auto-primary -> the mkfs retry path was permanently dead.Fix
Initializedlatch — it fires when a volume provably lacks its filesystem whileFileSystem/Typedemands one, and is safe against a foreign Primary briefly holding the device (waits for Secondary).Validation (live stand, raw logs on the dev host)
/var/lib/blockstor/work/dev/bug028-fix-validation-20260612-060631/.force-promote veto bypassed — every connected peer is a day0-empty sibling,latch-free mkfs retry). Logs:/var/lib/blockstor/work/dev/bug028-force-race-062248/.cli-matrix/r-full-lifecyclerc=0 andreplay/file-thin-day0-skip-no-initial-syncPASS on the same build (day0 skip-initial-sync contract intact);go test ./..., targeted-race, and lint green.Safety argument
mkfs is only ever performed when (a) all connected peers are day0-empty siblings by blockstor's own generation bookkeeping AND (b) no filesystem signature exists locally. Any peer with a non-day0 generation or any local fs signature refuses mkfs exactly as before. L1 regression tests cover the race chain, the false-latch retry, and the real-data refusal counter-case.