fix(e2e): resolve DRBD devices via drbdadm sh-dev in remaining cli-matrix cells#141
Conversation
…trix cells Migrate the 9 remaining cli-matrix cells that still resolved DRBD devices through the /dev/drbd/by-res symlink (readlink or direct dd) to the shared resolve_drbd_device helper from lib.sh, the same way the snap-restore cell was fixed. The by-res symlink is not reliably present in the satellite mount namespace, so those resolution sites fail or silently no-op on the stand. Assertions are unchanged; only the resolver mechanics moved. The last-resort /dev/drbd* minor-enumeration fallbacks are kept where they already existed. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Warning Review limit reached
More reviews will be available in 59 minutes and 57 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request replaces unreliable /dev/drbd/by-res symlink lookups with a more portable resolver (resolve_drbd_device using drbdadm sh-dev) across multiple end-to-end test scripts. The review feedback highlights several instances where a failure to resolve the DRBD device path could lead to syntax errors, silent background failures, or hangs during dd operations. It is recommended to add explicit validation checks to fail early with clear error messages if the device path remains empty.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| on_node "$N1" bash -c " | ||
| dev=\$(readlink -f /dev/drbd/by-res/$RD/0 2>/dev/null || ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| dev='$dev' | ||
| [ -n \"\$dev\" ] || dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) |
There was a problem hiding this comment.
If the DRBD device cannot be resolved, the background writer dd command will fail immediately, causing the background loop to exit silently. This means the test will run without any active background writes, defeating the purpose of the concurrency test. Adding an explicit check before backgrounding ensures the writer launch fails visibly.
| on_node "$N1" bash -c " | |
| dev=\$(readlink -f /dev/drbd/by-res/$RD/0 2>/dev/null || ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| dev='$dev' | |
| [ -n \"\$dev\" ] || dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| on_node "$N1" bash -c " | |
| dev='$dev' | |
| [ -n \"\$dev\" ] || dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| [ -z \"\$dev\" ] && { echo 'no drbd device node' >&2; exit 1; } |
| on_node "$N1" bash -c " | ||
| dev=\$(readlink -f /dev/drbd/by-res/$RD/0 2>/dev/null || ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| dev='$dev' | ||
| [ -n \"\$dev\" ] || dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) |
There was a problem hiding this comment.
If the DRBD device cannot be resolved, the background writer dd command will fail immediately, causing the background loop to exit silently. This means the test will run without any active background writes, defeating the purpose of the concurrency test. Adding an explicit check before backgrounding ensures the writer launch fails visibly.
| on_node "$N1" bash -c " | |
| dev=\$(readlink -f /dev/drbd/by-res/$RD/0 2>/dev/null || ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| dev='$dev' | |
| [ -n \"\$dev\" ] || dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| on_node "$N1" bash -c " | |
| dev='$dev' | |
| [ -n \"\$dev\" ] || dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| [ -z \"\$dev\" ] && { echo 'no drbd device node' >&2; exit 1; } |
| on_node "$N1" bash -c " | ||
| set -e | ||
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | ||
| dev='$dev' | ||
| if [ -z \"\$dev\" ]; then | ||
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| fi |
There was a problem hiding this comment.
If the DRBD device cannot be resolved (both resolve_drbd_device and the fallback ls return empty), the script will proceed to run dd with an empty of= parameter, causing a syntax error. Adding an explicit check ensures the script fails early with a clear error message.
| on_node "$N1" bash -c " | |
| set -e | |
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| on_node "$N1" bash -c " | |
| set -e | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| if [ -z \"\$dev\" ]; then | |
| echo 'no drbd device node' >&2 | |
| exit 1 | |
| fi |
| io_out=$(on_node "$N1" bash -c " | ||
| set -e | ||
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | ||
| dev='$dev' | ||
| if [ -z \"\$dev\" ]; then | ||
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| fi |
There was a problem hiding this comment.
If the DRBD device cannot be resolved, the dd command will fail with a syntax error, resulting in a confusing failure. Explicitly checking for an empty device path ensures a clear error message.
| io_out=$(on_node "$N1" bash -c " | |
| set -e | |
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| io_out=\$(on_node "$N1" bash -c " | |
| set -e | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| if [ -z \"\$dev\" ]; then | |
| echo 'no drbd device node' >&2 | |
| exit 1 | |
| fi |
| seed_md5_now=$(on_node "$N1" bash -c " | ||
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | ||
| dev='$dev' | ||
| if [ -z \"\$dev\" ]; then | ||
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| fi |
There was a problem hiding this comment.
If the DRBD device cannot be resolved, dd if=\$dev will run without an if parameter, causing it to read from standard input. This can hang or return an empty stream, leading to a confusing MD5 mismatch failure instead of a clear device resolution error.
| seed_md5_now=$(on_node "$N1" bash -c " | |
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| seed_md5_now=\$(on_node "$N1" bash -c " | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| if [ -z \"\$dev\" ]; then | |
| echo 'no drbd device node' >&2 | |
| exit 1 | |
| fi |
| seed_md5_after=$(on_node "$N1" bash -c " | ||
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | ||
| dev='$dev' | ||
| if [ -z \"\$dev\" ]; then | ||
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| fi |
There was a problem hiding this comment.
If the DRBD device cannot be resolved, dd if=\$dev will read from standard input, leading to a confusing MD5 mismatch failure. Adding an explicit check ensures the script fails early with a clear error message.
| seed_md5_after=$(on_node "$N1" bash -c " | |
| dev=\$(readlink -f /dev/drbd/by-res/${RD}/0 2>/dev/null || true) | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| seed_md5_after=\$(on_node "$N1" bash -c " | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| if [ -z \"\$dev\" ]; then | |
| echo 'no drbd device node' >&2 | |
| exit 1 | |
| fi |
| on_node "$N1" bash -c " | ||
| set -e | ||
| dev=\$(readlink -f /dev/drbd/by-res/$RD/0 2>/dev/null || true) | ||
| dev='$dev' | ||
| if [ -z \"\$dev\" ]; then | ||
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | ||
| fi |
There was a problem hiding this comment.
If the DRBD device cannot be resolved, the script will proceed to run dd with an empty of= parameter, causing a syntax error. Adding an explicit check ensures the script fails early with a clear error message.
| on_node "$N1" bash -c " | |
| set -e | |
| dev=\$(readlink -f /dev/drbd/by-res/$RD/0 2>/dev/null || true) | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| on_node "$N1" bash -c " | |
| set -e | |
| dev='$dev' | |
| if [ -z \"\$dev\" ]; then | |
| dev=\$(ls -1 /dev/drbd* 2>/dev/null | grep -vE 'by-(res|disk)' | head -1) | |
| fi | |
| if [ -z \"\$dev\" ]; then | |
| echo 'no drbd device node' >&2 | |
| exit 1 | |
| fi |
What
Migrates the 9 remaining cli-matrix cells that still resolved DRBD devices through the
/dev/drbd/by-ressymlink (eitherreadlink -for a directddagainst the symlink path) to the sharedresolve_drbd_devicehelper intests/e2e/cli-matrix/lib.sh, following the same pattern as the already-fixedsnap-restore-snapshotless-node-rejected.shcell.Cells migrated:
snap-full-lifecycle.sh— seed write + continuous writersnap-create-multiple-lifecycle.sh— 3-RD correlated writersnap-cross-node-consistency.sh— seed write + continuous writersnap-create-multiple-group-consistency.sh— 2-RD correlated writersnap-r-rst-stamps-resources.sh— marker seed + restored-replica marker readsnap-suspend-resume-isolation-u138-u52.sh—write_survivespost-snapshot write prober-activate-deactivate-lifecycle.sh— seed write + per-cycle INACTIVE write and two md5 anchor readsr-d-last-uptodate-midsync-rejected.sh— GI-bump write before the mid-sync windowr-c-over-tiebreaker-skip-sync.sh— post-tiebreaker mutation writeWhy
The
by-ressymlink is not reliably present in the satellite mount namespace on the stand, so these resolution sites either abort the cell or (for the direct-ddsites with suppressed stderr) silently skip the write that the scenario depends on.drbdadm sh-devresolves the/dev/drbdNpath without depending on the symlink — the same root cause and fix as the snap-restore cell.Scope notes
ls /dev/drbd*minor-enumeration fallback (snap-full-lifecycle,snap-cross-node-consistency,r-activate-deactivate-lifecycle) keep it as a secondary path after the helper, since it covers stands wheresh-devitself fails; the helper is now the primary resolver everywhere.ddcells the write is still best-effort (|| true), but an unresolved device is now guarded explicitly instead of dd failing against a dangling symlink path.Validation
bash -nclean on all 9 cells.shellcheck: only pre-existing findings (SC1091 lib.sh sourcing info on all cells, SC2016/SC2034 already present on main).readlink -f /dev/drbd/by-resor bare by-res reliance remains in any churn path; remainingby-resmentions are explanatory comments only.