fix(e2e): make rwx-ganesha dump_diag actually capture the failure surface #144
…face

The wait-Ready diagnostics had three blind spots that made BUG-028 triage rely on re-running the scenario by hand:

- the blockstor-side dump exec'd `linstor r l` inside the blockstor-controller image, which ships no linstor binary, so that section always failed with 'executable file not found'. Read the RD/Resource CRDs instead: same state, no in-pod binary needed.
- `kubectl logs ds/linstor-csi-node` only prints one pod of the DaemonSet; loop over every linstor-csi-node pod, all containers.
- the NFS-Ganesha publish side was invisible: dump every linstor-csi-nfs-server pod (all containers), the drbd-reactor promoter ConfigMap, and the EndpointSlices of svc linstor-csi-nfs.

Diagnostics only, no timeout or assertion changes.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
CodeRabbit: No actionable comments were generated in the recent review. Estimated code review effort: 2 (Simple), ~8 minutes. Pre-merge checks: 4 passed, 1 failed (1 warning).
Code Review
This pull request enhances the diagnostic logging in `tests/e2e/rwx-ganesha.sh` by explicitly looping over all pods for `linstor-csi-node` and `linstor-csi-nfs-server` to fetch logs, dumping the promoter ConfigMap and EndpointSlices, and replacing the failing `linstor r l` command with blockstor CRD dumps. The review comments correctly identify that the redirection syntax `2>&1 >&2` used across several commands is incorrect and redirects both stdout and stderr to stdout instead of stderr; they suggest using `>&2 2>&1` to fix this.
    kubectl -n piraeus-datastore logs "$pod" --all-containers \
        --prefix --tail=80 2>&1 >&2 || true
The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.
Suggested change:
    - kubectl -n piraeus-datastore logs "$pod" --all-containers \
    -     --prefix --tail=80 2>&1 >&2 || true
    + kubectl -n piraeus-datastore logs "$pod" --all-containers \
    +     --prefix --tail=80 >&2 2>&1 || true
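The ordering point in this comment can be checked without a cluster. The following is a minimal sketch, not part of the PR: `emit` is a hypothetical stand-in for any command that writes to both streams, and the two captures show what is left on stdout under each ordering.

```shell
#!/bin/sh
# Hypothetical stand-in for a command such as kubectl: one line per stream.
emit() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

# Redirections are applied left to right.
# `2>&1 >&2`: fd 2 becomes a copy of fd 1, then fd 1 is pointed at the
# already-redirected fd 2, so both streams stay on the original stdout.
wrong_stdout=$( { emit 2>&1 >&2; } 2>/dev/null )

# `>&2 2>&1`: fd 1 is pointed at stderr first, so both streams end up on
# the original stderr and nothing is left on stdout.
right_stdout=$( { emit >&2 2>&1; } 2>/dev/null )

printf 'stdout with "2>&1 >&2": [%s]\n' "$wrong_stdout"
printf 'stdout with ">&2 2>&1": [%s]\n' "$right_stdout"
```

Since stderr already goes to stderr, a plain `>&2` has the same effect as `>&2 2>&1`; the longer form only makes the intent explicit.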
    kubectl -n piraeus-datastore logs "$pod" --all-containers \
        --prefix --tail=120 2>&1 >&2 || true
The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.
Suggested change:
    - kubectl -n piraeus-datastore logs "$pod" --all-containers \
    -     --prefix --tail=120 2>&1 >&2 || true
    + kubectl -n piraeus-datastore logs "$pod" --all-containers \
    +     --prefix --tail=120 >&2 2>&1 || true
    kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
        -o yaml 2>&1 >&2 || true
The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.
Suggested change:
    - kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
    -     -o yaml 2>&1 >&2 || true
    + kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
    +     -o yaml >&2 2>&1 || true
    kubectl -n piraeus-datastore get endpointslices \
        -l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 || true
The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.
Suggested change:
    - kubectl -n piraeus-datastore get endpointslices \
    -     -l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 || true
    + kubectl -n piraeus-datastore get endpointslices \
    +     -l kubernetes.io/service-name=linstor-csi-nfs -o yaml >&2 2>&1 || true
    # found in \$PATH". The blockstor CRDs carry the same RD/Resource
    # state without needing any in-pod binary or a port-forward.
    echo "----- diag ($label): blockstor RD + Resource CRDs -----" >&2
    kubectl get resourcedefinitions.blockstor.cozystack.io 2>&1 >&2 || true
    kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
        -o yaml 2>&1 >&2 || true
The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.
Suggested change:
    - kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
    -     -o yaml 2>&1 >&2 || true
    + kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
    +     -o yaml >&2 2>&1 || true
        -o yaml 2>&1 >&2 || true
    for res in $(kubectl get resources.blockstor.cozystack.io \
        -o name 2>/dev/null | grep -F "$PV" || true); do
        kubectl get "$res" -o yaml 2>&1 >&2 || true
    done
    else
        kubectl get resources.blockstor.cozystack.io 2>&1 >&2 || true
What
Hardens the `dump_diag` helper in the `rwx-ganesha` e2e scenario so a failing run captures the state needed for triage on the first attempt. Diagnostics only: no timeout or assertion changes.

Why
During triage of the intermittent rwx-ganesha CI failures the existing diagnostics had three blind spots:
- The blockstor-side dump ran `linstor r l` via `kubectl exec` inside the blockstor-controller image, which ships no `linstor` binary, so that section always failed with `executable file not found in $PATH`. It now reads the blockstor ResourceDefinition / Resource CRDs instead: same state, no in-pod binary or port-forward needed.
- `kubectl logs ds/linstor-csi-node` prints only one pod of the DaemonSet ("Found 3 pods, using pod/..."), hiding the workers that actually hit the failure. The dump now loops over every linstor-csi-node pod, all containers.
- The NFS-Ganesha publish side was invisible. The dump now covers every linstor-csi-nfs-server pod (all containers), the drbd-reactor promoter ConfigMap, and the EndpointSlices of the `linstor-csi-nfs` Service: the pieces that show whether the promoter ever promoted the backing DRBD resource and whether the Service had a ready backend.

For the volume under test the dump also prints the full RD and per-node Resource CRDs, so the DRBD-side placement and state are visible without re-running the scenario.
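The per-pod log loop described above might look roughly like the sketch below. It is an illustration, not the PR's actual code: the function name `dump_pod_logs`, the `NS` variable, and the label selector in the usage note are assumptions, while the namespace and the `kubectl logs` flags are the ones visible in the diff.

```shell
#!/bin/sh
# Namespace as it appears in the PR's diff; overridable for other setups.
NS="${NS:-piraeus-datastore}"

# dump_pod_logs SELECTOR TAIL   (hypothetical helper, not from the PR)
# Prints the last TAIL log lines of every container of every pod matching
# SELECTOR. All output goes to stderr so it never mixes with a caller's
# stdout, and failures are swallowed so diagnostics cannot fail the test.
dump_pod_logs() {
    for pod in $(kubectl -n "$NS" get pods -l "$1" -o name 2>/dev/null || true); do
        echo "----- logs: $pod -----" >&2
        kubectl -n "$NS" logs "$pod" --all-containers --prefix --tail="$2" >&2 || true
    done
}
```

A call such as `dump_pod_logs "app.kubernetes.io/component=linstor-csi-node" 80` would then cover every node pod rather than the single pod that `kubectl logs ds/...` picks. The selector shown here is a guess and has to be replaced with the labels the DaemonSet actually sets.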
Summary by CodeRabbit