fix(e2e): make rwx-ganesha dump_diag actually capture the failure surface by kvaps · Pull Request #144 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-12T04:14:38Z

What

Hardens the dump_diag helper in the rwx-ganesha e2e scenario so a failing run captures the state needed for triage on the first attempt. Diagnostics only — no timeout or assertion changes.

Why

During triage of the intermittent rwx-ganesha CI failures the existing diagnostics had three blind spots:

The blockstor-side dump ran linstor r l via kubectl exec inside the blockstor-controller image, which ships no linstor binary, so that section always failed with executable file not found in $PATH. It now reads the blockstor ResourceDefinition / Resource CRDs instead — same state, no in-pod binary or port-forward needed.
kubectl logs ds/linstor-csi-node prints only one pod of the DaemonSet ("Found 3 pods, using pod/..."), hiding the workers that actually hit the failure. The dump now loops over every linstor-csi-node pod, all containers.
The NFS-Ganesha publish side was invisible. The dump now includes every linstor-csi-nfs-server pod log (all containers), the drbd-reactor promoter ConfigMap, and the EndpointSlices of the linstor-csi-nfs Service — the pieces that show whether the promoter ever promoted the backing DRBD resource and whether the Service had a ready backend.

For the volume under test the dump also prints the full RD and per-node Resource CRDs, so the DRBD-side placement and state are visible without re-running the scenario.

Summary by CodeRabbit

Tests
- Enhanced diagnostic logging in end-to-end tests for improved troubleshooting capabilities.

…face The wait-Ready diagnostics had three blind spots that made BUG-028 triage rely on re-running the scenario by hand: - the blockstor-side dump exec'd `linstor r l` inside the blockstor-controller image, which ships no linstor binary, so that section always failed with 'executable file not found'. Read the RD/Resource CRDs instead — same state, no in-pod binary needed. - `kubectl logs ds/linstor-csi-node` only prints one pod of the DaemonSet; loop over every linstor-csi-node pod, all containers. - the NFS-Ganesha publish side was invisible: dump every linstor-csi-nfs-server pod (all containers), the drbd-reactor promoter ConfigMap, and the EndpointSlices of svc linstor-csi-nfs. Diagnostics only — no timeout or assertion changes. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-12T04:14:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c7c094bb-c050-4753-9588-716927642f66

📥 Commits

Reviewing files that changed from the base of the PR and between 8f70323 and 7482e5a.

📒 Files selected for processing (1)

tests/e2e/rwx-ganesha.sh

📝 Walkthrough

Walkthrough

The test utility dump_diag() expands diagnostic output for RWX Ganesha scenarios: it now collects logs from all linstor-csi-node and linstor-csi-nfs-server pods across all containers, retrieves the reactor ConfigMap and service EndpointSlices, and displays blockstor resourcedefinitions and resources CRDs, with optional filtering by persistent volume.

Changes

RWX test diagnostics

Layer / File(s)	Summary
dump_diag() diagnostic function enrichment `tests/e2e/rwx-ganesha.sh`	`dump_diag()` replaces single-pod log tailing and controller exec with explicit per-pod log collection for `linstor-csi-node` and `linstor-csi-nfs-server` pods (all containers), and adds `kubectl get` diagnostics for the reactor ConfigMap, `linstor-csi-nfs` service EndpointSlices, and blockstor resource CRDs (optionally scoped to `$PV`).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Logs now flourish where once they hid—
All pods and containers, none forbid!
ConfigMaps, slices, resources too,
The diagnostics flow, shiny and new!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely describes the main change: improving the dump_diag function to better capture diagnostic information during test failures, which aligns with the substantial updates to diagnostic logging and cluster-state visibility.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/rwx-ganesha-dump-diag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request enhances the diagnostic logging in tests/e2e/rwx-ganesha.sh by explicitly looping over all pods for linstor-csi-node and linstor-csi-nfs-server to fetch logs, dumping the promoter ConfigMap and EndpointSlices, and replacing the failing linstor r l command with blockstor CRD dumps. The review comments correctly identify that the redirection syntax 2>&1 >&2 used across several commands is incorrect and redirects both stdout and stderr to stdout instead of stderr; they suggest using >&2 2>&1 to fix this.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T04:16:15Z

+        kubectl -n piraeus-datastore logs "$pod" --all-containers \
+            --prefix --tail=80 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl -n piraeus-datastore logs "$pod" --all-containers \

--prefix --tail=80 2>&1 >&2 || true

kubectl -n piraeus-datastore logs "$pod" --all-containers \

--prefix --tail=80 >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:15Z

+        kubectl -n piraeus-datastore logs "$pod" --all-containers \
+            --prefix --tail=120 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl -n piraeus-datastore logs "$pod" --all-containers \

--prefix --tail=120 2>&1 >&2 || true

kubectl -n piraeus-datastore logs "$pod" --all-containers \

--prefix --tail=120 >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:15Z

+    kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
+        -o yaml 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \

-o yaml 2>&1 >&2 || true

kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \

-o yaml >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:15Z

+    kubectl -n piraeus-datastore get endpointslices \
+        -l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl -n piraeus-datastore get endpointslices \

-l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 || true

kubectl -n piraeus-datastore get endpointslices \

-l kubernetes.io/service-name=linstor-csi-nfs -o yaml >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:15Z

+    # found in \$PATH". The blockstor CRDs carry the same RD/Resource
+    # state without needing any in-pod binary or a port-forward.
+    echo "----- diag ($label): blockstor RD + Resource CRDs -----" >&2
+    kubectl get resourcedefinitions.blockstor.cozystack.io 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl get resourcedefinitions.blockstor.cozystack.io 2>&1 >&2 || true

kubectl get resourcedefinitions.blockstor.cozystack.io >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:15Z

+        kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
+            -o yaml 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \

-o yaml 2>&1 >&2 || true

kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \

-o yaml >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:16Z

+            -o yaml 2>&1 >&2 || true
+        for res in $(kubectl get resources.blockstor.cozystack.io \
+                -o name 2>/dev/null | grep -F "$PV" || true); do
+            kubectl get "$res" -o yaml 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl get "$res" -o yaml 2>&1 >&2 || true

kubectl get "$res" -o yaml >&2 2>&1 || true

gemini-code-assist · 2026-06-12T04:16:16Z

+            kubectl get "$res" -o yaml 2>&1 >&2 || true
+        done
+    else
+        kubectl get resources.blockstor.cozystack.io 2>&1 >&2 || true


The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change

kubectl get resources.blockstor.cozystack.io 2>&1 >&2 || true

kubectl get resources.blockstor.cozystack.io >&2 2>&1 || true

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

Andrei Kvapil (kvaps) merged commit c54be3b into main Jun 12, 2026
15 checks passed

		kubectl -n piraeus-datastore logs "$pod" --all-containers \
		--prefix --tail=80 2>&1 >&2 \|\| true

		kubectl -n piraeus-datastore logs "$pod" --all-containers \
		--prefix --tail=120 2>&1 >&2 \|\| true

		kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
		-o yaml 2>&1 >&2 \|\| true

		kubectl -n piraeus-datastore get endpointslices \
		-l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 \|\| true

	kubectl get resourcedefinitions.blockstor.cozystack.io 2>&1 >&2 \|\| true
	kubectl get resourcedefinitions.blockstor.cozystack.io >&2 2>&1 \|\| true

		kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
		-o yaml 2>&1 >&2 \|\| true

	kubectl get "$res" -o yaml 2>&1 >&2 \|\| true
	kubectl get "$res" -o yaml >&2 2>&1 \|\| true

	kubectl get resources.blockstor.cozystack.io 2>&1 >&2 \|\| true
	kubectl get resources.blockstor.cozystack.io >&2 2>&1 \|\| true

Conversation

Andrei Kvapil (kvaps) commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading