Skip to content

fix(e2e): make rwx-ganesha dump_diag actually capture the failure surface#144

Merged
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/rwx-ganesha-dump-diag
Jun 12, 2026
Merged

fix(e2e): make rwx-ganesha dump_diag actually capture the failure surface#144
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/rwx-ganesha-dump-diag

Conversation

@kvaps

@kvaps Andrei Kvapil (kvaps) commented Jun 12, 2026

Copy link
Copy Markdown
Member

What

Hardens the dump_diag helper in the rwx-ganesha e2e scenario so a failing run captures the state needed for triage on the first attempt. Diagnostics only — no timeout or assertion changes.

Why

During triage of the intermittent rwx-ganesha CI failures the existing diagnostics had three blind spots:

  • The blockstor-side dump ran linstor r l via kubectl exec inside the blockstor-controller image, which ships no linstor binary, so that section always failed with executable file not found in $PATH. It now reads the blockstor ResourceDefinition / Resource CRDs instead — same state, no in-pod binary or port-forward needed.
  • kubectl logs ds/linstor-csi-node prints only one pod of the DaemonSet ("Found 3 pods, using pod/..."), hiding the workers that actually hit the failure. The dump now loops over every linstor-csi-node pod, all containers.
  • The NFS-Ganesha publish side was invisible. The dump now includes every linstor-csi-nfs-server pod log (all containers), the drbd-reactor promoter ConfigMap, and the EndpointSlices of the linstor-csi-nfs Service — the pieces that show whether the promoter ever promoted the backing DRBD resource and whether the Service had a ready backend.

For the volume under test the dump also prints the full RD and per-node Resource CRDs, so the DRBD-side placement and state are visible without re-running the scenario.

Summary by CodeRabbit

  • Tests
    • Enhanced diagnostic logging in end-to-end tests for improved troubleshooting capabilities.

…face

The wait-Ready diagnostics had three blind spots that made BUG-028
triage rely on re-running the scenario by hand:

- the blockstor-side dump exec'd `linstor r l` inside the
  blockstor-controller image, which ships no linstor binary, so that
  section always failed with 'executable file not found'. Read the
  RD/Resource CRDs instead — same state, no in-pod binary needed.
- `kubectl logs ds/linstor-csi-node` only prints one pod of the
  DaemonSet; loop over every linstor-csi-node pod, all containers.
- the NFS-Ganesha publish side was invisible: dump every
  linstor-csi-nfs-server pod (all containers), the drbd-reactor
  promoter ConfigMap, and the EndpointSlices of svc linstor-csi-nfs.

Diagnostics only — no timeout or assertion changes.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c7c094bb-c050-4753-9588-716927642f66

📥 Commits

Reviewing files that changed from the base of the PR and between 8f70323 and 7482e5a.

📒 Files selected for processing (1)
  • tests/e2e/rwx-ganesha.sh

📝 Walkthrough

Walkthrough

The test utility dump_diag() expands diagnostic output for RWX Ganesha scenarios: it now collects logs from all linstor-csi-node and linstor-csi-nfs-server pods across all containers, retrieves the reactor ConfigMap and service EndpointSlices, and displays blockstor resourcedefinitions and resources CRDs, with optional filtering by persistent volume.

Changes

RWX test diagnostics

Layer / File(s) Summary
dump_diag() diagnostic function enrichment
tests/e2e/rwx-ganesha.sh
dump_diag() replaces single-pod log tailing and controller exec with explicit per-pod log collection for linstor-csi-node and linstor-csi-nfs-server pods (all containers), and adds kubectl get diagnostics for the reactor ConfigMap, linstor-csi-nfs service EndpointSlices, and blockstor resource CRDs (optionally scoped to $PV).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Logs now flourish where once they hid—
All pods and containers, none forbid!
ConfigMaps, slices, resources too,
The diagnostics flow, shiny and new!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main change: improving the dump_diag function to better capture diagnostic information during test failures, which aligns with the substantial updates to diagnostic logging and cluster-state visibility.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/rwx-ganesha-dump-diag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the diagnostic logging in tests/e2e/rwx-ganesha.sh by explicitly looping over all pods for linstor-csi-node and linstor-csi-nfs-server to fetch logs, dumping the promoter ConfigMap and EndpointSlices, and replacing the failing linstor r l command with blockstor CRD dumps. The review comments correctly identify that the redirection syntax 2>&1 >&2 used across several commands is incorrect and redirects both stdout and stderr to stdout instead of stderr; they suggest using >&2 2>&1 to fix this.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tests/e2e/rwx-ganesha.sh
Comment on lines +89 to +90
kubectl -n piraeus-datastore logs "$pod" --all-containers \
--prefix --tail=80 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl -n piraeus-datastore logs "$pod" --all-containers \
--prefix --tail=80 2>&1 >&2 || true
kubectl -n piraeus-datastore logs "$pod" --all-containers \
--prefix --tail=80 >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
Comment on lines +101 to +102
kubectl -n piraeus-datastore logs "$pod" --all-containers \
--prefix --tail=120 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl -n piraeus-datastore logs "$pod" --all-containers \
--prefix --tail=120 2>&1 >&2 || true
kubectl -n piraeus-datastore logs "$pod" --all-containers \
--prefix --tail=120 >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
Comment on lines +105 to +106
kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
-o yaml 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
-o yaml 2>&1 >&2 || true
kubectl -n piraeus-datastore get cm linstor-csi-nfs-server-reactor-config \
-o yaml >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
Comment on lines +108 to +109
kubectl -n piraeus-datastore get endpointslices \
-l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl -n piraeus-datastore get endpointslices \
-l kubernetes.io/service-name=linstor-csi-nfs -o yaml 2>&1 >&2 || true
kubectl -n piraeus-datastore get endpointslices \
-l kubernetes.io/service-name=linstor-csi-nfs -o yaml >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
# found in \$PATH". The blockstor CRDs carry the same RD/Resource
# state without needing any in-pod binary or a port-forward.
echo "----- diag ($label): blockstor RD + Resource CRDs -----" >&2
kubectl get resourcedefinitions.blockstor.cozystack.io 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl get resourcedefinitions.blockstor.cozystack.io 2>&1 >&2 || true
kubectl get resourcedefinitions.blockstor.cozystack.io >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
Comment on lines +118 to +119
kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
-o yaml 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
-o yaml 2>&1 >&2 || true
kubectl get "resourcedefinitions.blockstor.cozystack.io/$PV" \
-o yaml >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
-o yaml 2>&1 >&2 || true
for res in $(kubectl get resources.blockstor.cozystack.io \
-o name 2>/dev/null | grep -F "$PV" || true); do
kubectl get "$res" -o yaml 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl get "$res" -o yaml 2>&1 >&2 || true
kubectl get "$res" -o yaml >&2 2>&1 || true

Comment thread tests/e2e/rwx-ganesha.sh
kubectl get "$res" -o yaml 2>&1 >&2 || true
done
else
kubectl get resources.blockstor.cozystack.io 2>&1 >&2 || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The redirection 2>&1 >&2 redirects both stdout and stderr to stdout instead of stderr. Use >&2 2>&1 to correctly redirect both to stderr.

Suggested change
kubectl get resources.blockstor.cozystack.io 2>&1 >&2 || true
kubectl get resources.blockstor.cozystack.io >&2 2>&1 || true

@kvaps Andrei Kvapil (kvaps) merged commit c54be3b into main Jun 12, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant