fix(e2e): unbreak LUKS cli-matrix cells — stdin to cryptsetup, diskful-only replica count (BUG-039)#152

Merged
Andrei Kvapil (kvaps) merged 2 commits into main from fix/bug-039-luks-cli-matrix-stdin on Jun 13, 2026

@kvaps Andrei Kvapil (kvaps) commented Jun 13, 2026

Summary

BUG-039 reported the LUKS data plane as broken at the release-candidate SHA: 6 of 8 luks-* cli-matrix cells failed on a live stand, including the new Secret-only flow cell from #143, with the error "passphrase does NOT open <node>:<backing-device>".

Live debugging on a clean stand shows the product chain is correct end-to-end at the candidate SHA: linstor encryption create-passphrase -> Secret -> satellite-side injection -> LuksPassphrase wire prop -> cryptsetup luksFormat. The backing LVs are formatted with the operator's master passphrase and the passphrase opens them — once the harness actually delivers it. Both root causes were in the L6 harness, not in the product:

  1. stdin never reached cryptsetup. assert_luks_passphrase_opens piped the passphrase into on_node, which runs kubectl exec without -i, so the pipe was dropped and cryptsetup read an empty key-file ("Nothing to read on input."). The kernel-level assertion could never pass on any stand, regardless of what key the satellite formatted with; the 2>/dev/null swallowed the tell. Fixed with a stdin-forwarding on_node_stdin helper; cryptsetup stderr is now printed on the failure path.

  2. Tiebreaker counted as a replica. The clone/resize/snapshot-restore cells waited for exactly 2 Resource CRDs after --auto-place=2, but on a 3-worker stand the controller adds (and flaps) a DISKLESS TIE_BREAKER witness, so the count oscillates 2-3-2 and the equality check times out spuriously ("did not autoplace 2 replicas"). Fixed by counting diskful replicas via linstor_diskful_nodes, the convention the sibling cells already use.
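
The stdin fix in item 1 can be sketched as below. This is a minimal illustration, not the actual code in tests/e2e/lib.sh: the `KUBECTL`, `NAMESPACE`, and `SATELLITE_SELECTOR` variables and the `satellite_pod_on` helper are assumptions introduced here, and the real helpers may select the pod differently. The only difference between the two functions is the `-i` flag, without which `kubectl exec` does not forward the caller's stdin.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; variable names and the label selector are assumptions.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-default}"
SATELLITE_SELECTOR="${SATELLITE_SELECTOR:-app=satellite}"

# Print the name of one Running satellite pod scheduled on the given node.
satellite_pod_on() {
  local node="$1"
  "$KUBECTL" -n "$NAMESPACE" get pods -l "$SATELLITE_SELECTOR" \
    --field-selector "spec.nodeName=${node},status.phase=Running" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run a command on the node WITHOUT stdin: anything piped in is dropped.
on_node() {
  local node="$1" pod
  shift
  # Detach stdin during pod lookup so it cannot consume piped input.
  pod="$(satellite_pod_on "$node" </dev/null)" || return 1
  "$KUBECTL" -n "$NAMESPACE" exec "$pod" -- "$@"
}

# Same pod selection, but -i forwards the caller's stdin to the command.
on_node_stdin() {
  local node="$1" pod
  shift
  pod="$(satellite_pod_on "$node" </dev/null)" || return 1
  "$KUBECTL" -n "$NAMESPACE" exec -i "$pod" -- "$@"
}
```

Under these assumptions, a passphrase check might be invoked as `printf '%s' "$PASSPHRASE" | on_node_stdin "$node" cryptsetup open --test-passphrase --key-file - "$device"`. The exact cryptsetup command line used by assert_luks_passphrase_opens is not shown in this PR page, so that invocation is illustrative only.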

No product code changes.

Validation (live stand, candidate SHA)

  • encryption-passphrase-luks-rd (Secret-only flow): green 3/3 consecutive runs
  • luks-rd-create-encrypted, luks-autoplace-encrypted, luks-resize-encrypted: green
  • Kernel-level proof captured manually: with stdin forwarded, the Secret-stored master passphrase opens the LUKS header on every replica's backing LV; without -i the same command fails with "Nothing to read on input."
  • luks-clone-encrypted / luks-snapshot-restore-encrypted now get past placement and surface the real blocker: cross-node snapshot ship fails in the clone/restore engine (zfs recv: invalid stream (bad magic number)), which is layer-independent (plain rd-clone-vd-data-plane fails identically) and tracked separately as BUG-038.
  • go test ./... clean, golangci-lint run 0 issues, shellcheck introduces no new findings

Summary by CodeRabbit

Release Notes

Tests

  • Improved encrypted storage test reliability by fixing stdin handling during containerized operations.
  • Enhanced replica-counting logic in encryption tests for more accurate multi-worker cluster scenarios.

Andrei Kvapil (kvaps) and others added 2 commits June 13, 2026 04:22
…039)

assert_luks_passphrase_opens piped the master passphrase into
on_node, but on_node runs kubectl exec without -i, so the pipe was
never forwarded and cryptsetup read an empty key-file ("Nothing to
read on input."). Every kernel-level passphrase assertion therefore
failed on every stand — reported as BUG-039 'LUKS data-plane broken'
— while the satellite had in fact formatted the backing device with
the correct master passphrase (verified live: the operator passphrase
opens the LUKS header once stdin is forwarded).

Add an on_node_stdin helper (kubectl exec -i, same Running-pod
selection) and route the assert through it. Keep cryptsetup stderr
and print it on the failure path — the old 2>/dev/null swallowed the
'Nothing to read on input' tell and masked the root cause.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…re cells (BUG-039)

The three data-bearing LUKS cells waited for exactly 2 Resource CRDs
after --auto-place=2, but on a 3-worker stand the controller adds
(and flaps) a DISKLESS TIE_BREAKER witness, so the all-CRD count
oscillates 2-3-2 and the equality check times out spuriously with
'did not autoplace 2 replicas'. Count diskful replicas via
linstor_diskful_nodes instead — the convention the sibling
encryption-passphrase-luks-rd and luks-rd-create cells already use.

With the counting fixed, luks-resize-encrypted goes green on a live
stand; luks-clone-encrypted and luks-snapshot-restore-encrypted now
surface the real blocker (cross-node snapshot ship fails in the
clone/restore engine), which is tracked separately as BUG-038.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai Bot commented Jun 13, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 7389580 and d8e9d2b.

📒 Files selected for processing (5)
  • tests/e2e/cli-matrix/lib.sh
  • tests/e2e/cli-matrix/luks-clone-encrypted.sh
  • tests/e2e/cli-matrix/luks-resize-encrypted.sh
  • tests/e2e/cli-matrix/luks-snapshot-restore-encrypted.sh
  • tests/e2e/lib.sh

Walkthrough

This PR improves test reliability by adding stdin forwarding infrastructure and refactoring replica-counting logic across LUKS encryption tests. It introduces on_node_stdin to correctly handle stdin through kubectl exec, updates LUKS passphrase assertions to use it and expose cryptsetup errors, and consolidates diskful-only replica detection across multiple encrypted storage test scenarios.
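
The failure-path change described above (capture cryptsetup output instead of discarding it, then print it when the check fails) can be sketched as follows. The function name `assert_opens` and its argument order are hypothetical and chosen for illustration; this is not the body of assert_luks_passphrase_opens.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the capture-and-print pattern.
# Usage: assert_opens <label> <passphrase> <command...>
# The command receives the passphrase on stdin.
assert_opens() {
  local label="$1" pass="$2" out rc
  shift 2
  # Keep stdout and stderr together rather than sending stderr to /dev/null.
  if out="$(printf '%s' "$pass" | "$@" 2>&1)"; then
    echo "OK: passphrase opens ${label}"
  else
    rc=$?
    echo "FAIL: passphrase does NOT open ${label} (rc=${rc})" >&2
    echo "--- command output ---" >&2
    printf '%s\n' "$out" >&2
    return 1
  fi
}
```

With the output kept, a message such as "Nothing to read on input." reaches the test log and points at the harness rather than at the product.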

Changes

Test Infrastructure and LUKS/Diskful Replica Fixes

| Layer / File(s) | Summary |
|---|---|
| Infrastructure: stdin-forwarding helper for remote execution (tests/e2e/lib.sh) | New on_node_stdin() function locates running satellite pods on a specified node and executes commands with stdin forwarded via kubectl exec -i, enabling passphrase input and other stdin-dependent remote operations. |
| LUKS passphrase assertion: use stdin helper and capture output (tests/e2e/cli-matrix/lib.sh) | assert_luks_passphrase_opens() now uses on_node_stdin to preserve stdin during cryptsetup invocation, captures cryptsetup stderr/stdout instead of suppressing it, and prints the output on failure for triage. |
| Diskful replica counting: replace kubectl/awk with linstor_diskful_nodes across encrypted test scripts (tests/e2e/cli-matrix/luks-clone-encrypted.sh, tests/e2e/cli-matrix/luks-resize-encrypted.sh, tests/e2e/cli-matrix/luks-snapshot-restore-encrypted.sh) | Replaces three instances of kubectl/awk-based replica node enumeration with linstor_diskful_nodes calls to count only diskful replicas, fixing flakiness when diskless tie-breaker witnesses cause placement counts to oscillate in 3-worker setups. |
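
The diskful-only counting idea can be sketched as below. The input format (one replica per line, `<node> <comma-separated flags or ->`) and the helper names `diskful_nodes`, `diskful_count`, and `wait_diskful_count` are assumptions made for this illustration; the real linstor_diskful_nodes helper may obtain and parse its data differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; input format is assumed: "<node> <flags or ->".

# Print the nodes whose replica carries no DISKLESS flag.
diskful_nodes() {
  awk '$2 !~ /DISKLESS/ { print $1 }'
}

# Print how many replicas are diskful (always prints a number, even 0).
diskful_count() {
  awk '$2 !~ /DISKLESS/ { n++ } END { print n + 0 }'
}

# Poll a listing command until exactly <want> diskful replicas are seen.
# Usage: wait_diskful_count <want> <attempts> <listing command...>
wait_diskful_count() {
  local want="$1" attempts="$2" got=0 i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    got="$("$@" | diskful_count)"
    [ "$got" -eq "$want" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  echo "did not autoplace ${want} diskful replicas (last count: ${got})" >&2
  return 1
}
```

Because the DISKLESS TIE_BREAKER witness is filtered out before counting, its appearing and disappearing no longer changes the number being compared, so the equality check stays stable on a 3-worker stand.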

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • cozystack/blockstor#145: Adds new LUKS encryption e2e tests that depend on the on_node_stdin helper and assert_luks_passphrase_opens function being updated in this PR to handle stdin and surface cryptsetup output.

Poem

🐰 A rabbit hops through test-land green,
With stdin streams kept pristine,
No diskless ghosts will fool the way,
Just diskful friends to guide the play!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main fixes: adding stdin forwarding to cryptsetup via on_node_stdin and fixing replica counting to use diskful-only logic, with reference to BUG-039. It directly correlates with the substantial changes across multiple LUKS test scripts. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request addresses BUG-039 in the end-to-end tests. It introduces a new helper function on_node_stdin to properly forward stdin via kubectl exec -i, resolving an issue where empty passphrases were being read by cryptsetup. Additionally, it updates several test scripts to use linstor_diskful_nodes for counting diskful replicas, preventing spurious timeouts caused by diskless tie-breaker witnesses. There are no review comments, and I have no feedback to provide.

@kvaps Andrei Kvapil (kvaps) merged commit 92030f3 into main Jun 13, 2026
15 checks passed