Skip to content

test(e2e): L6/L7 release-gate coverage for rd clone data plane and Secret-only LUKS provisioning#145

Merged
Andrei Kvapil (kvaps) merged 3 commits into
mainfrom
test/clone-luks-l6-l7
Jun 12, 2026
Merged

test(e2e): L6/L7 release-gate coverage for rd clone data plane and Secret-only LUKS provisioning#145
Andrei Kvapil (kvaps) merged 3 commits into
mainfrom
test/clone-luks-l6-l7

Conversation

@kvaps

@kvaps Andrei Kvapil (kvaps) commented Jun 12, 2026

Copy link
Copy Markdown
Member

What

Release-gate L6 cli-matrix cells + L7 replay workflows for two recently-fixed CLI-reachable flows, per the CLI-bug-fix protocol (CLAUDE.md):

1. rd clone data plane (Bug 020, fix merged in #142)

  • L6 tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh — seeds a deterministic marker on a 2-replica source (via resolve_drbd_device), then drives both wire variants: the plain python-CLI resource-definition clone (linstor-client 1.27.1 declares --use-zfs-clone as store_true, default=None, so the bare verb POSTs a body without use_zfs_clone) and the raw-REST POST /v1/resource-definitions/{src}/clone with use_zfs_clone: true — the exact body linstor-csi sends. Asserts per variant: clone accepted (no 400/501), GET .../clone/{dst} reaches COMPLETE, 2 replicas converge UpToDate, the marker bytes are present on every clone replica (promoting each in turn — catches the Bug-114 silently-empty-shell mode), and the internal clone-<dst> snapshot is visible on the source (delta row 82).
  • L7 tests/operator-harness/replay/rd-clone-vd-data-plane.yaml — create → autoplace → all_uptodate on the source → rd clonereplica_count + all_uptodate on the clone RD (await rd: fields go through the same substitution as cmd argv, so asserting on the second RD via {{rd}}-cl works). Teardown deletes the clone, then the internal clone-{{rd}}-cl snapshot, then the source (RD delete is blocked while snapshots exist).

2. Secret-only LUKS provisioning (Bug 023, fix in #143not yet merged)

  • L6 tests/e2e/cli-matrix/encryption-passphrase-luks-rd.shlinstor encryption create-passphrase alone, with the legacy DrbdOptions/EncryptPassphrase controller property asserted absent before and after provisioning, then rd c -l drbd,luks,storage + r c --auto-place=2 to UpToDate, and kernel-level proof on each replica (cryptsetup luksDump + --test-passphrase against the backing device via the existing luks_backing_device / wait_luks_header_present / assert_luks_passphrase_opens helpers). Cleanup via cleanup_encryption_state.
  • L7 tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml — clears the legacy prop (empty-value = delete) with a prop_value absence await, create-passphrase (expect_exit: [0, 10] for the shared stand), LUKS rd create, autoplace, all_uptodate, and a final prop_value absence re-check.

The sibling luks-* cells and replay/luks-encrypted-rd.yaml keep covering the deprecated controller-prop path; these new tests pin that the prop is no longer required.

Merge ordering

This PR must merge after #143. The Bug-023 cell and replay are written against the post-#143 contract: on current main the rd c -l drbd,luks,storage step is rejected with "LUKS layer requires DrbdOptions/EncryptPassphrase to be set first", so both stay red until #143 lands. The branch is based on main and already includes the clone fix (#142), so the Bug-020 pair is green-eligible immediately.

Validation status

Authoring-only at this point — neither stand has run these yet (both stands are currently busy). Stand validation is deferred to the sweep phase, where the cells must be run via make e2e NAME=<cluster> SCENARIO=cli-matrix/... and the replays via tests/operator-harness/replay-runner.sh before the covered bugs count as closed per the protocol.

Local checks done: bash -n + shellcheck on both cells (only the SC1091 info shared with every sibling cell), yaml.safe_load on both replays, and every await.kind/field used (replica_count, all_uptodate, prop_value with obj: controller and omitted-expected absence semantics) verified against the current replay-runner.sh/lib.sh contract.

Known limitation

The replay runner executes linstor argv only — no dd/curl/cryptsetup steps — so the byte-level clone assert and the LUKS-header probes live exclusively in the L6 cells; the replays pin the verb + convergence shape the runner can express.

Summary by CodeRabbit

  • Tests
    • Added comprehensive end-to-end tests for LUKS encryption functionality to validate provisioning workflows
    • Added end-to-end tests for resource definition cloning scenarios validating data-plane replication behavior
    • Added operator integration test workflows for encryption and cloning feature validation
    • Updated test documentation with new test cell descriptions

Andrei Kvapil (kvaps) and others added 2 commits June 12, 2026 11:11
Bug-020 release-gate coverage for the clone fix: the L6 cell drives
both wire variants (plain python-CLI clone body without use_zfs_clone,
and the raw-REST use_zfs_clone=true body linstor-csi sends), then
proves the data plane really copied by reading a deterministic marker
back on every clone replica, alongside clone-status COMPLETE and the
internal clone-<dst> snapshot visibility.

The L7 replay pins the verb + convergence shape the runner can
express (create, autoplace, clone, all_uptodate on the clone RD);
the byte-level assert stays in the L6 cell because replay steps are
linstor-argv only.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Release-gate coverage for the encryption create-passphrase fix: the
L6 cell drives the upstream-parity flow (create-passphrase, then
rd c -l drbd,luks,storage + autoplace) while asserting the legacy
DrbdOptions/EncryptPassphrase controller prop stays absent end to
end, and proves at the kernel level that the Secret-backed passphrase
formatted and opens the LUKS header on each replica.

The L7 replay mirrors the flow with prop_value absence asserts before
and after provisioning; the cryptsetup-level checks stay in the L6
cell because replay steps are linstor-argv only.

Written against the post-fix contract: both tests stay red until the
Bug-023 fix lands and must be gated on it.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces new end-to-end and operator-harness replay tests to cover Bug 023 (verifying Secret-only LUKS provisioning without legacy controller properties) and Bug 020 (verifying data-plane integrity for volume-definition-bearing resource-definition clones). The review feedback correctly identifies that process substitutions in the clone test script mask command failures under set -e and recommends using mktemp instead of a hardcoded /tmp file path to prevent conflicts and security risks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

rm -f "$err_file"

echo ">> [Bug 020 / A] clone replicas materialise + converge UpToDate"
mapfile -t cli_nodes < <(wait_clone_replicas "$DST_CLI" 120)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Process substitution <(...) runs asynchronously in a subshell, and its exit status is not monitored or caught by set -e. If wait_clone_replicas fails and returns 1, the script will silently ignore the failure and proceed to wait_uptodate with empty or unbound variables, leading to hard-to-debug test failures.

To ensure that failures in wait_clone_replicas are caught immediately by set -e, assign the output to a variable first and then parse it with mapfile.

Suggested change
mapfile -t cli_nodes < <(wait_clone_replicas "$DST_CLI" 120)
cli_nodes_str=$(wait_clone_replicas "$DST_CLI" 120)
mapfile -t cli_nodes <<< "$cli_nodes_str"

fi

echo ">> [Bug 020 / B] clone replicas materialise + converge UpToDate"
mapfile -t zfs_nodes < <(wait_clone_replicas "$DST_ZFS" 120)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Process substitution <(...) runs asynchronously in a subshell, and its exit status is not monitored or caught by set -e. If wait_clone_replicas fails and returns 1, the script will silently ignore the failure and proceed to wait_uptodate with empty or unbound variables, leading to hard-to-debug test failures.

To ensure that failures in wait_clone_replicas are caught immediately by set -e, assign the output to a variable first and then parse it with mapfile.

Suggested change
mapfile -t zfs_nodes < <(wait_clone_replicas "$DST_ZFS" 120)
zfs_nodes_str=$(wait_clone_replicas "$DST_ZFS" 120)
mapfile -t zfs_nodes <<< "$zfs_nodes_str"

Comment on lines +199 to +209
http_code=$(curl -sS -m 30 -o /tmp/cli-matrix-020-clone.json -w '%{http_code}' \
-X POST -H 'Content-Type: application/json' \
-d "{\"name\":\"${DST_ZFS}\",\"use_zfs_clone\":true}" \
"http://127.0.0.1:${LCTL_PORT}/v1/resource-definitions/${SRC}/clone" \
2>/dev/null || echo "000")
if [[ "$http_code" != "201" ]]; then
echo "FAIL (Bug 020): use_zfs_clone=true POST answered HTTP $http_code, want 201" >&2
echo " (pre-fix: 400 DisallowUnknownFields on use_zfs_clone)" >&2
cat /tmp/cli-matrix-020-clone.json >&2 2>/dev/null || true
exit 1
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a hardcoded file path in /tmp (like /tmp/cli-matrix-020-clone.json) can lead to conflicts, permission issues, or security vulnerabilities (such as symlink attacks) if multiple tests run concurrently or if leftovers exist from previous runs.

Instead, use a secure temporary file created via mktemp and ensure it is cleaned up properly.

Suggested change
http_code=$(curl -sS -m 30 -o /tmp/cli-matrix-020-clone.json -w '%{http_code}' \
-X POST -H 'Content-Type: application/json' \
-d "{\"name\":\"${DST_ZFS}\",\"use_zfs_clone\":true}" \
"http://127.0.0.1:${LCTL_PORT}/v1/resource-definitions/${SRC}/clone" \
2>/dev/null || echo "000")
if [[ "$http_code" != "201" ]]; then
echo "FAIL (Bug 020): use_zfs_clone=true POST answered HTTP $http_code, want 201" >&2
echo " (pre-fix: 400 DisallowUnknownFields on use_zfs_clone)" >&2
cat /tmp/cli-matrix-020-clone.json >&2 2>/dev/null || true
exit 1
fi
clone_json=$(mktemp)
http_code=$(curl -sS -m 30 -o "$clone_json" -w '%{http_code}' \
-X POST -H 'Content-Type: application/json' \
-d "{\"name\":\"${DST_ZFS}\",\"use_zfs_clone\":true}" \
"http://127.0.0.1:${LCTL_PORT}/v1/resource-definitions/${SRC}/clone" \
2>/dev/null || echo "000")
if [[ "$http_code" != "201" ]]; then
echo "FAIL (Bug 020): use_zfs_clone=true POST answered HTTP $http_code, want 201" >&2
echo " (pre-fix: 400 DisallowUnknownFields on use_zfs_clone)" >&2
cat "$clone_json" >&2 2>/dev/null || true
rm -f "$clone_json"
exit 1
fi
rm -f "$clone_json"

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds comprehensive e2e test coverage for two independent bug fixes: Bug 023 (LUKS Secret-only encryption provisioning) and Bug 020 (RD clone with VD-bearing source data-plane preservation). Each bug receives both CLI-matrix e2e tests and operator-harness replay scenarios with full validation of the expected behavior and failure-mode fixes.

Changes

Bug 023 – LUKS Secret-only Encryption Verification

Layer / File(s) Summary
Test documentation and setup infrastructure
tests/e2e/cli-matrix/README.md, tests/e2e/cli-matrix/encryption-passphrase-luks-rd.sh
Documents the new Bug 023 test cell in the README and establishes shell environment, trap-based cleanup, and test identifiers for the encryption test.
Encryption provisioning and validation
tests/e2e/cli-matrix/encryption-passphrase-luks-rd.sh
Implements pre-flight pool validation, defines the legacy property assertion helper, executes encryption create-passphrase, creates the encrypted RD/volume/resource, and waits for replica convergence with property re-validation.
LUKS device unlock and final validation
tests/e2e/cli-matrix/encryption-passphrase-luks-rd.sh
For each replica node, resolves the LUKS backing device, waits for the LUKS header, and verifies that the Secret passphrase unlocks the device; prints success confirmation.
Operator-harness replay for Bug 023
tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml
Defines an operator replay scenario that pins the upstream-parity sequence: clears legacy property, creates passphrase, provisions LUKS RD/volume/resource, waits for UpToDate convergence, re-validates property absence, and tears down with no-orphans invariant.

Bug 020 – RD Clone with VD Data-plane Preservation

Layer / File(s) Summary
Test environment and infrastructure
tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
Sets up the script header, shell strictness, KUBECONFIG, linstor CLI initialization, cleanup trap for clone/source deletion order, and global naming/pool configuration.
Source RD provisioning and marker seeding
tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
Creates the source RD with volume definition, provisions two diskful replicas on the selected pool, seeds deterministic marker bytes onto the source DRBD device, and validates UpToDate convergence.
Clone validation and assertion helpers
tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
Defines three polling/validation helpers: wait_clone_replicas polls for clone materialization to exactly two diskful nodes, assert_clone_marker validates marker-byte presence on each replica after promotion, and assert_internal_snapshot checks for the clone's internal snapshot CR.
Variant A – CLI-based clone and validation
tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
Runs linstor CLI clone (without use_zfs_clone), waits for diskful materialization, validates UpToDate convergence and marker-byte preservation on all replicas, and asserts internal snapshot existence.
Variant B – REST-based clone and validation
tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
POSTs to the clone REST endpoint with use_zfs_clone: true, expects HTTP 201, polls clone status until COMPLETE, then performs the same materialization, convergence, marker, and snapshot assertions as Variant A.
Operator-harness replay for Bug 020
tests/operator-harness/replay/rd-clone-vd-data-plane.yaml
Defines an operator replay scenario that creates source RD/VD with two diskful replicas, runs the clone operation, awaits clone materialization and UpToDate convergence, and tears down clone-first-then-source with internal snapshot and no-orphans invariant.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cozystack/blockstor#110: Both PRs update test coverage around LUKS "secret-only" provisioning and the DrbdOptions/EncryptPassphrase controller property; this PR adds new e2e CLI-matrix and operator-harness validation tests, while the retrieved PR modifies related upstream property checks.

Poem

🐰 Two bugs hop into the test warren,
One locks secrets in LUKS's garden,
One clones data without a care,
With marker bytes and snapshots fair—
Now replay and e2e both declare:
The warren's tunnels lead everywhere!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main changes: adding L6/L7 test coverage for resource definition clone data plane verification and Secret-only LUKS provisioning.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/clone-luks-l6-l7

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml (1)

85-89: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

await.kind: prop_value is not in the allowed list per coding guidelines.

Same issue as lines 44-48: prop_value is not in the coding guidelines' allowed list for await.kind, though it is implemented and necessary for this Bug-023 validation (verifying the legacy property remains absent post-provisioning).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml` around
lines 85 - 89, The test uses an unsupported await.kind value "prop_value";
replace it with the allowed kind that checks absence (e.g., "prop_missing" or
the project's canonical "prop_absent") so the step still verifies that
DrbdOptions/EncryptPassphrase is not present after provisioning; update both
occurrences (the block referencing key DrbdOptions/EncryptPassphrase and the
earlier identical block around lines 44-48) to use the permitted await.kind name
so the validation remains equivalent to the original intent.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/cli-matrix/encryption-passphrase-luks-rd.sh`:
- Around line 168-173: Update the comment above the wait_uptodate call to
explain that wait_uptodate (from tests/e2e/lib.sh) primarily polls each
replica’s observer-stamped .status.volumes[].diskState via status_disk_state and
returns success when both are "UpToDate", but it will also succeed if
kernel_pair_uptodate reports ok (to tolerate CRD/status lag), so the helper can
pass via kernel fallback rather than only Resource.Status convergence; reference
wait_uptodate, status_disk_state and kernel_pair_uptodate by name in the
comment.

In `@tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh`:
- Line 189: The mapfile invocation using process substitution drops the exit
status of wait_clone_replicas, so if wait_clone_replicas (called with DST_CLI)
times out the script continues with an empty cli_nodes and later fails
confusingly; fix by capturing the helper's exit code and output instead of using
direct process substitution: run wait_clone_replicas into a temporary variable,
record its exit status, then populate cli_nodes from that output (mapfile -t
cli_nodes <<<"$output") and validate the array length (or exit status)
immediately and emit the clear "never materialised X diskful replicas" error and
exit non‑zero if it failed; apply the same pattern for the other occurrence
(variant B around the second mapfile usage) and adjust references to cli_nodes,
wait_clone_replicas, wait_uptodate and DST_CLI accordingly.

In `@tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml`:
- Around line 44-48: Await.kind "prop_value" is used by replay YAMLs (e.g., the
encryption-passphrase-luks-rd case) but is missing from the allowed await.kind
list in the guidelines; update the guidelines to either add "prop_value" to the
allowed await.kind enumeration or add an explicit exception for the harness's
prop_value behavior (which is implemented by the harness's prop_value handler)
so policy checks accept await.kind: prop_value references such as the one
setting DrbdOptions/EncryptPassphrase.

---

Duplicate comments:
In `@tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml`:
- Around line 85-89: The test uses an unsupported await.kind value "prop_value";
replace it with the allowed kind that checks absence (e.g., "prop_missing" or
the project's canonical "prop_absent") so the step still verifies that
DrbdOptions/EncryptPassphrase is not present after provisioning; update both
occurrences (the block referencing key DrbdOptions/EncryptPassphrase and the
earlier identical block around lines 44-48) to use the permitted await.kind name
so the validation remains equivalent to the original intent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5d7c2a6-40c0-427c-9a57-851a88135fad

📥 Commits

Reviewing files that changed from the base of the PR and between c54be3b and 8165bc2.

📒 Files selected for processing (5)
  • tests/e2e/cli-matrix/README.md
  • tests/e2e/cli-matrix/encryption-passphrase-luks-rd.sh
  • tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
  • tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml
  • tests/operator-harness/replay/rd-clone-vd-data-plane.yaml

Comment on lines +168 to +173
echo ">> [Bug 023] wait both replicas UpToDate (Secret-fed luksFormat ran)"
# Pre-fix failure mode for a gate-only patch: rd-create passes but the
# satellite loops on "LUKS in layer stack but Props.LuksPassphrase
# empty" and the replicas never converge. UpToDate within the bound is
# the proof the Secret reached the satellite channel.
wait_uptodate "$RD" "$N1" "$N2"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify wait_uptodate exists and uses observer-stamped Status

# Find wait_uptodate definition
ast-grep --pattern $'wait_uptodate() {
  $$$
}'

# Also check if it calls wait_status_state or similar Status-checking functions
rg -nP -A 10 '^\s*(function\s+)?wait_uptodate\s*\(' tests/e2e/cli-matrix/lib.sh tests/e2e/lib.sh

Repository: cozystack/blockstor

Length of output: 2837


Update the comment: wait_uptodate checks .status.diskState, but it can also succeed via kernel fallback.

wait_uptodate (tests/e2e/lib.sh) polls both replicas’ observer-stamped .status.volumes[].diskState (via status_disk_state) and returns success only when both are "UpToDate". If that isn’t true yet, it may still return success when kernel_pair_uptodate ... reports ok (to tolerate CRD/status lag), so the helper isn’t purely a “Resource.Status convergence only” assertion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/encryption-passphrase-luks-rd.sh` around lines 168 -
173, Update the comment above the wait_uptodate call to explain that
wait_uptodate (from tests/e2e/lib.sh) primarily polls each replica’s
observer-stamped .status.volumes[].diskState via status_disk_state and returns
success when both are "UpToDate", but it will also succeed if
kernel_pair_uptodate reports ok (to tolerate CRD/status lag), so the helper can
pass via kernel fallback rather than only Resource.Status convergence; reference
wait_uptodate, status_disk_state and kernel_pair_uptodate by name in the
comment.

rm -f "$err_file"

echo ">> [Bug 020 / A] clone replicas materialise + converge UpToDate"
mapfile -t cli_nodes < <(wait_clone_replicas "$DST_CLI" 120)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Process substitution exit code is lost; test continues with empty array on timeout.

mapfile -t cli_nodes < <(wait_clone_replicas ...) runs the helper in a subshell whose exit status is not captured by set -e. If wait_clone_replicas times out and returns 1, the script continues with an empty cli_nodes array, then fails at wait_uptodate with empty node arguments—producing a confusing error instead of the clear "never materialised 2 diskful replicas" message.

Proposed fix: validate array length after mapfile
 mapfile -t cli_nodes < <(wait_clone_replicas "$DST_CLI" 120)
+if (( ${`#cli_nodes`[@]} != 2 )); then
+    echo "FAIL (Bug 020 / A): wait_clone_replicas returned ${`#cli_nodes`[@]} nodes, expected 2" >&2
+    exit 1
+fi
 wait_uptodate "$DST_CLI" "${cli_nodes[0]}" "${cli_nodes[1]}"

Apply the same pattern after line 229 for variant B.

Also applies to: 229-229

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/rd-clone-vd-data-plane.sh` at line 189, The mapfile
invocation using process substitution drops the exit status of
wait_clone_replicas, so if wait_clone_replicas (called with DST_CLI) times out
the script continues with an empty cli_nodes and later fails confusingly; fix by
capturing the helper's exit code and output instead of using direct process
substitution: run wait_clone_replicas into a temporary variable, record its exit
status, then populate cli_nodes from that output (mapfile -t cli_nodes
<<<"$output") and validate the array length (or exit status) immediately and
emit the clear "never materialised X diskful replicas" error and exit non‑zero
if it failed; apply the same pattern for the other occurrence (variant B around
the second mapfile usage) and adjust references to cli_nodes,
wait_clone_replicas, wait_uptodate and DST_CLI accordingly.

Comment on lines +44 to +48
await:
kind: prop_value
obj: controller
key: DrbdOptions/EncryptPassphrase
timeout_s: 30

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that prop_value is implemented in lib.sh and check if other replay workflows use it

echo "=== Checking lib.sh for prop_value implementation ==="
rg -A 20 'prop_value\)' tests/operator-harness/lib.sh

echo ""
echo "=== Checking existing replay workflows that use prop_value ==="
rg -l 'kind: prop_value' tests/operator-harness/replay/

Repository: cozystack/blockstor

Length of output: 2022


Update await.kind guidelines to include prop_value (or document an explicit exception)

tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml uses await.kind: prop_value (lines 44-48), but the coding guidelines’ allowed await.kind list doesn’t include prop_value. This isn’t a new/unsupported pattern: prop_value is implemented in tests/operator-harness/lib.sh and already used by other replay YAMLs. Update the guidelines to include prop_value (or add an explicit carve-out) to avoid false policy violations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/encryption-passphrase-luks-rd.yaml` around
lines 44 - 48, Await.kind "prop_value" is used by replay YAMLs (e.g., the
encryption-passphrase-luks-rd case) but is missing from the allowed await.kind
list in the guidelines; update the guidelines to either add "prop_value" to the
allowed await.kind enumeration or add an explicit exception for the harness's
prop_value behavior (which is implemented by the harness's prop_value handler)
so policy checks accept await.kind: prop_value references such as the one
setting DrbdOptions/EncryptPassphrase.

Source: Coding guidelines

@kvaps Andrei Kvapil (kvaps) merged commit 210101e into main Jun 12, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant