test(harness): fix release-gate replay/cli-matrix triage failures (tiebreaker, quorum, mid-sync, over-commit, ps-cdp) by kvaps · Pull Request #156 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-13T08:05:10Z

Harness/docs-only fixes for 7 of the 8 replay/cli-matrix failures surfaced by the release-gate stand sweep at the candidate SHA. No product code changed. (The 8th, an inactive-return-backfills-redundancy failure, is a genuine product race tracked separately — see below.)

Verdicts

Test	Verdict	Cause
n-rst-tiebreaker-machinery-alive	harness	witness `r d` stamps an intentional 5-min suppression window; assert waited only 120s → guaranteed timeout
r-c-with-tiebreaker-peer	harness	a manual `--diskless` node is never flagged `TieBreaker`, and at 1 diskful no auto-witness is managed → `tiebreaker_present` could never match
r-d-sub-quorum-teardown-converges	harness	after `r d node1` (3→2 diskful) the auto-tiebreaker legally relocates onto the freed node1 → `resource_absent node1` unsatisfiable
r-d-last-uptodate-midsync-rejected	legit-skip	FILE_THIN skip-syncs the empty replica → the mid-sync window is unobservable in CRD status; SKIP cleanly
ps-cdp-zfs	legit-skip	`/dev/loop9` exists as a device node but is not a free PhysicalDevice; the free-device gate was too weak
vd-resize-single-replica-no-restart	harness flake	passes 2× in isolation; transient empty-stderr exit-10 under sweep load
rg-c-overcommit-spawn-fails	harness	cell pinned the upstream short-spawn contract; BS intentionally does deferred best-effort autoplace (pinned in spawn_test.go) — renamed to `…-spawn-defers`, known-delta row 83

Tests 1/2/3 share #154's lesson (the auto-witness legally relocates onto a freed/sole spare node; the harness must not wait for a phantom witness or resource_absent there) but live in separate replay YAMLs #154 didn't touch.

Changes

Harness: node-scoped free-device gate via REST /v1/physical-storage; new skip_if_reached modifier for clean SKIP when a scenario is not reproducible on a fast stand.
Replay: assertion fixes in 4 YAMLs — no weakening; each new assert is a true observable condition.
cli-matrix: …-spawn-fails → …-spawn-defers pinning the real BS deferred-autoplace contract; known-delta row 83.

Validated on the stand 3× each (PASS/SKIP as classified).

Product bug found, tracked separately (NOT in this PR)

inactive-return-backfills-redundancy is a genuine TOCTOU race: the witness reaper removeWitnesses does a non-atomic Get→Delete and can clobber a replica the placer just promoted, losing the redundancy backfill. Medium-risk fix on the tiebreaker+autoplace surface; needs its own PR with L6/L7 validation.

Summary by CodeRabbit

New Features
- Resource group spawn-resources now uses deferred best-effort placement when over-committed, placing available replicas and returning actionable feedback instead of failing.
Tests
- Updated end-to-end tests to validate new over-committed spawn behavior.
- Enhanced test framework with conditional skip mechanism for non-applicable scenarios.
Documentation
- Added CLI parity documentation for accepted over-committed resource group behavior.

Two operator-replay harness robustness fixes surfaced by the release-gate sweep on stand dev: - fixture_device_on_any_worker probed only `test -b <dev>` on ANY worker. The kernel pre-creates loop device NODES whether or not anything is attached, so the bare loop node passed the gate on every stand even when nothing was attached -- and ps-cdp-zfs then FAILed (exit 10, "no free PhysicalDevice") instead of SKIPping. The probe is now node-scoped (the workflow pins the create to node1) and consults the controller's authoritative free physical-device view (REST GET /v1/physical-storage), so an absent / in-use device yields a clean SKIP. - Add an optional `skip_if_reached: <disk_state>` await modifier: SKIP the whole workflow cleanly when the target reaches that state before the awaited one. Lets a scenario that needs a transient state a fast stand can race past (e.g. a mid-sync window on a skip-sync pool) SKIP as not-exercisable instead of timing out to a false FAIL. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Four release-gate replay workflows asserted conditions that the candidate's correct, deliberate behaviour can never satisfy on stand dev. All four are harness bugs (no assertion weakening; each new assertion is a true, observable condition): - r-c-with-tiebreaker-peer: after dropping the diskful peer the shape is 1 diskful + 1 user-diskless. The hand-created node3 carries DISKLESS only (not TIE_BREAKER), and at 1 diskful no auto-witness is managed (shouldTieBreakerExist needs >=2 diskful) -- so the `tiebreaker_present` grep for "TieBreaker" could never match. Assert the observable truth (node3 stays Diskless); the Bug-327 contract (bare `r c` comes up DISKFUL) is unchanged. - r-d-sub-quorum-teardown-converges: after `r d node1` on a 3-diskful RD the shape drops to even 2-diskful, re-arming the auto-tiebreaker, which re-occupies the just-freed node1 with a witness -- so `resource_absent node1` could never hold. Assert node1's diskful backing is gone (Diskless), then the second delete reaps the witness too (both nodes absent), the survivor stays UpToDate, and the RD deletes cleanly -- the real U197 sub-quorum-converges contract. - n-rst-tiebreaker-machinery-alive: `r d` of the witness stamps the DELIBERATE 5-minute auto-tiebreaker SUPPRESSION window, so the witness cannot re-add within the old 120s await. Bump to 330s (5min window + margin) -- bounded on the known product constant autoTiebreakerSuppressionWindow, not blind inflation. - r-d-last-uptodate-midsync-rejected: the `stand` FILE_THIN pool skip-syncs a fresh replica (day0 skip-initial-sync), so node2 reaches UpToDate in <10s without ever exposing a CRD-observable mid-sync state -- the U130 rejection window is not exercisable as a pure-CLI replay here (the contract stays pinned at L1 + the L6 cell). Add `skip_if_reached: UpToDate` so it SKIPs cleanly instead of timing out. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

The rg-c-overcommit-spawn-fails cell asserted the UPSTREAM contract (spawn-resources fails short with FAIL_NOT_ENOUGH_NODES / 996 and places nothing) on an over-committed resource group. blockstor deliberately diverges: spawn takes a best-effort DEFERRED autoplace path -- it spawns the RD, places as many diskful replicas as the topology allows (3 of 7 on a 3-node cluster), and surfaces the shortfall as an INFO in the SUCCESS envelope (exit 0). This is pinned at L1 (TestSpawnImpossiblePlacementReturnsActionableError) and backed by the RGRebalanceReconciler additive backfill, so the cell was asserting a contract the product intentionally does not implement. Rename the cell to rg-c-overcommit-spawn-defers and rewrite it to pin the actual contract: create accepts pc=7, spawn succeeds with the deferred-autoplace envelope, RD survives, best-effort places 1..nodes diskful (never over- or under-places). Document the divergence as known-delta row 83. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-13T08:05:26Z

📝 Walkthrough

Walkthrough

This PR extends the operator test harness with conditional workflow skipping to handle variable hardware configurations, updates multiple replay test scenarios with corrected resource state expectations, and adds a new e2e CLI test validating deferred best-effort spawn behavior on over-committed resource groups.

Changes

Over-committed RG spawn deferred behavior

Layer / File(s)	Summary
Over-committed spawn deferred behavior test `tests/e2e/cli-matrix/rg-c-overcommit-spawn-defers.sh`	New e2e test that creates an over-committed RG (place_count=7 on 3-node cluster), verifies spawn-resources exits successfully with "placed N of M" envelope, confirms RD persists for CSI retry, and validates diskful replica counts are within bounds.
CLI parity whitelist entry for deferred spawn `docs/cli-parity-known-deltas.md`	Adds delta `#83` documenting that blockstor spawn-resources uses best-effort autoplace on over-committed RGs, with reconciler top-up behavior when nodes become available.

Test harness skip-if-reached conditional gating

Layer / File(s)	Summary
Fixture and await_assertion skip_if_reached support `tests/operator-harness/lib.sh`	Adds `fixture_free_physical_device_on_node` for node-targeted free-device REST checks, updates `fixture_device_on_any_worker` to use it, extends `await_assertion` setup and polling loop to handle optional `skip_if_reached` disk states, returning sentinel code 2 when matched.
run_step skip sentinel propagation `tests/operator-harness/lib.sh`	Updates `run_step` to capture `await_assertion` return code, immediately propagating sentinel 2 for scenario skipping while treating other non-zero codes as failures.
Replay runner skip_if_reached documentation and control-flow `tests/operator-harness/replay-runner.sh`	Documents `skip_if_reached` await modifier, implements early step-loop exit when `run_step` returns 2, and adds post-teardown skip handling (prints SKIP, exits 0, bypasses invariants).

Operator replay test scenario updates

Layer / File(s)	Summary
Tiebreaker auto-readd timeout and suppression-window docs `tests/operator-harness/replay/n-rst-tiebreaker-machinery-alive.yaml`	Documents auto-tiebreaker suppression-window timing (5 minutes), increases witness-auto-readded timeout from 120s to 330s to allow suppression window to lapse.
Tiebreaker peer diskless state verification `tests/operator-harness/replay/r-c-with-tiebreaker-peer.yaml`	Changes post-delete assertion from `tiebreaker_present` to `replica_diskless` for node3, clarifying diskless peer classification and auto-tiebreaker management behavior.
Mid-sync assertion skip guard for UpToDate states `tests/operator-harness/replay/r-d-last-uptodate-midsync-rejected.yaml`	Adds `skip_if_reached: UpToDate` guard to node2 await step so U130 rejection probe skips when UpToDate is reached before mid-sync SyncTarget is observable.
Sub-quorum teardown witness lifecycle and replica verification `tests/operator-harness/replay/r-d-sub-quorum-teardown-converges.yaml`	Asserts node1 diskless-transition on first delete, adds explicit witness-reap step for node1 resource_absent, changes surviving-replica check from count to node3 disk_state=UpToDate validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

cozystack/blockstor#110: Modifies operator test harness control-flow in tests/operator-harness/lib.sh and tests/operator-harness/replay-runner.sh, with changes to run_step and step execution patterns.

Poem

🐰 Over-commit flows now defer with grace,
No fail-fast here in this test-harness place,
Skip-if-reached gates let scenarios breathe,
When hardware won't play, they cleanly retreat,
Replicas placed where they can be—a feat! 🌾

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the primary changes: fixing release-gate test harness failures across replay and cli-matrix tests, with specific categories listed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/release-gate-replay-harness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request updates the test harness and replay tests to accommodate blockstor's deliberate behavioral differences from upstream LINSTOR, such as deferred autoplacement on over-committed resource groups and auto-tiebreaker suppression windows. It introduces a skip_if_reached mechanism to cleanly skip test scenarios that are not exercisable on fast-syncing stands. The reviewer's feedback identifies two robustness improvements in tests/operator-harness/lib.sh: adding type-safety checks to an embedded Python script to prevent potential AttributeError crashes, and ensuring skip_rd and skip_node are non-empty before executing kubectl commands to avoid invalid resource name errors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-13T08:06:34Z

+for g in groups if isinstance(groups, list) else []:
+    for d in (g.get('nodes',{}) or {}).get(node,[]) or []:
+        if d.get('device')==dev:
+            sys.exit(0)


The embedded Python script parses the REST API response for physical storage. However, if any element in the list is not a dictionary (e.g., due to an unexpected API response format), calling .get() or accessing .get('device') will raise an AttributeError and crash the script. Adding type checks ensures the script is robust against unexpected JSON structures.

Suggested change

for g in groups if isinstance(groups, list) else []:

for d in (g.get('nodes',{}) or {}).get(node,[]) or []:

if d.get('device')==dev:

sys.exit(0)

for g in groups if isinstance(groups, list) else []:

if not isinstance(g, dict): continue

nodes = g.get('nodes', {}) or {}

if not isinstance(nodes, dict): continue

for d in nodes.get(node, []) or []:

if not isinstance(d, dict): continue

if d.get('device')==dev:

sys.exit(0)

gemini-code-assist · 2026-06-13T08:06:34Z

+            if [[ -n "$skip_state" ]]; then
+                local cur_state
+                cur_state=$(kubectl get resource "${skip_rd}.${skip_node}" -o jsonpath='{.status.volumes[0].diskState}' 2>/dev/null || echo "")
+                if [[ "$cur_state" == "$skip_state" ]]; then
+                    echo "    ASSERTION SKIP: kind=$kind reached skip_if_reached=$skip_state on $skip_node before the awaited state — scenario not exercisable on this stand (skip-sync pool)" >&2
+                    return 2
+                fi
+            fi


If skip_state is configured but node or rd is missing or empty in the await spec, running kubectl get resource "." will result in an invalid resource name error. Adding a check to ensure both skip_rd and skip_node are non-empty before executing the kubectl command prevents potential test harness failures.

Suggested change

if [[ -n "$skip_state" ]]; then

local cur_state

cur_state=$(kubectl get resource "${skip_rd}.${skip_node}" -o jsonpath='{.status.volumes[0].diskState}' 2>/dev/null || echo "")

if [[ "$cur_state" == "$skip_state" ]]; then

echo " ASSERTION SKIP: kind=$kind reached skip_if_reached=$skip_state on $skip_node before the awaited state — scenario not exercisable on this stand (skip-sync pool)" >&2

return 2

fi

fi

if [[ -n "$skip_state" && -n "$skip_rd" && -n "$skip_node" ]]; then

local cur_state

cur_state=$(kubectl get resource "${skip_rd}.${skip_node}" -o jsonpath='{.status.volumes[0].diskState}' 2>/dev/null || echo "")

if [[ "$cur_state" == "$skip_state" ]]; then

echo " ASSERTION SKIP: kind=$kind reached skip_if_reached=$skip_state on $skip_node before the awaited state — scenario not exercisable on this stand (skip-sync pool)" >&2

return 2

fi

fi

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/cli-matrix/rg-c-overcommit-spawn-defers.sh`:
- Around line 132-156: The script hardcodes nodes=3 which breaks on clusters
with >3 workers; replace the static assignment with a dynamic node count and use
that for the upper-bound check. Specifically, change the variable nodes
(currently set as nodes=3) to compute the actual number of cluster worker nodes
(for example nodes="$(kubectl get nodes --no-headers | wc -l)" or another
cluster-appropriate kubectl query), keeping the rest of the logic (diskful
counting and the comparisons against nodes) unchanged so the test no longer
falsely fails on 4+ node clusters.

In `@tests/operator-harness/replay-runner.sh`:
- Around line 97-109: The documentation for the await modifier `skip_if_reached`
is inconsistent: it states that `skip_if_reached` performs a clean SKIP (exit 0)
but elsewhere the contract text still claims exit 0 means a full PASS; update
the contract text so exit code semantics are unified—describe that exit 0 may
indicate either PASS or a clean SKIP caused by `skip_if_reached`, and clearly
state how automation should distinguish SKIP vs PASS (e.g., a separate SKIP
marker/logging or an explicit exit code convention if you prefer a distinct
code); make the change in the same doc block referencing `skip_if_reached` and
the await-modifier contract so both places use the same wording about exit 0
meaning "PASS or clean SKIP" or switch to a distinct non-zero/zero convention
consistently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e3ad797-b5b4-4416-b94a-653580728676

📥 Commits

Reviewing files that changed from the base of the PR and between 7389580 and 6efb1e1.

📒 Files selected for processing (9)

docs/cli-parity-known-deltas.md
tests/e2e/cli-matrix/rg-c-overcommit-spawn-defers.sh
tests/e2e/cli-matrix/rg-c-overcommit-spawn-fails.sh
tests/operator-harness/lib.sh
tests/operator-harness/replay-runner.sh
tests/operator-harness/replay/n-rst-tiebreaker-machinery-alive.yaml
tests/operator-harness/replay/r-c-with-tiebreaker-peer.yaml
tests/operator-harness/replay/r-d-last-uptodate-midsync-rejected.yaml
tests/operator-harness/replay/r-d-sub-quorum-teardown-converges.yaml

💤 Files with no reviewable changes (1)

tests/e2e/cli-matrix/rg-c-overcommit-spawn-fails.sh

coderabbitai · 2026-06-13T08:11:04Z

+# Best-effort placement: at least one diskful, never more than the
+# topology allows (3 nodes). Count diskful (non-DISKLESS / non-TIE_BREAKER)
+# Resource CRDs for the RD, polling briefly so the placer has landed.
+nodes=3
+diskful=0
+for _ in $(seq 1 15); do
+    diskful=$(kubectl get resources.blockstor.cozystack.io -o json 2>/dev/null \
+        | RD="$RD" python3 -c "import json,sys,os
+d=json.load(sys.stdin)
+rd=os.environ['RD']
+n=0
+for it in d.get('items',[]):
+    if it.get('spec',{}).get('resourceDefinitionName')!=rd: continue
+    flags=it.get('spec',{}).get('flags',[]) or []
+    if 'DISKLESS' in flags or 'TIE_BREAKER' in flags: continue
+    n+=1
+print(n)" 2>/dev/null || echo 0)
+    [[ "$diskful" -ge 1 ]] && break
+    sleep 2
+done
+
+if [[ "$diskful" -lt 1 ]]; then
+    echo "FAIL (D1): deferred spawn placed ZERO diskful replicas on a cluster that can host $nodes" >&2
+    exit 1
+fi
+if [[ "$diskful" -gt "$nodes" ]]; then
+    echo "FAIL (D1): deferred spawn OVER-placed $diskful diskful on a $nodes-node cluster" >&2
+    exit 1
+fi
+echo ">> deferred spawn placed $diskful diskful (best-effort, <= $nodes nodes) — OK"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add an explicit Status convergence assertion for placed replicas.

This block validates count bounds, but it does not assert replica Status convergence (e.g., placed diskful replicas reaching an expected steady state), so the scenario can pass while state is still transient.

As per coding guidelines, tests/e2e/cli-matrix/**: “L6 operator-CLI e2e tests must test real linstor CLI → REST → satellite → DRBD kernel interactions and assert Status convergence under tests/e2e/cli-matrix/.”

Source: Coding guidelines

coderabbitai · 2026-06-13T08:11:04Z

+nodes=3
+diskful=0
+for _ in $(seq 1 15); do
+    diskful=$(kubectl get resources.blockstor.cozystack.io -o json 2>/dev/null \
+        | RD="$RD" python3 -c "import json,sys,os
+d=json.load(sys.stdin)
+rd=os.environ['RD']
+n=0
+for it in d.get('items',[]):
+    if it.get('spec',{}).get('resourceDefinitionName')!=rd: continue
+    flags=it.get('spec',{}).get('flags',[]) or []
+    if 'DISKLESS' in flags or 'TIE_BREAKER' in flags: continue
+    n+=1
+print(n)" 2>/dev/null || echo 0)
+    [[ "$diskful" -ge 1 ]] && break
+    sleep 2
+done
+
+if [[ "$diskful" -lt 1 ]]; then
+    echo "FAIL (D1): deferred spawn placed ZERO diskful replicas on a cluster that can host $nodes" >&2
+    exit 1
+fi
+if [[ "$diskful" -gt "$nodes" ]]; then
+    echo "FAIL (D1): deferred spawn OVER-placed $diskful diskful on a $nodes-node cluster" >&2
+    exit 1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hardcoded nodes=3 makes this test environment-fragile.

Line [132] assumes exactly 3 workers, but Line [69] only requires a minimum of 3. On stands with 4+ workers, a valid placement can exceed 3 and trigger a false failure at Line [154].

Suggested fix

-nodes=3 +nodes=$(kubectl get nodes -l '!node-role.kubernetes.io/control-plane' --no-headers 2>/dev/null \ + | awk '$2 == "Ready"' | wc -l | tr -d ' ') +max_placeable=$(( nodes < 7 ? nodes : 7 )) @@ -if [[ "$diskful" -gt "$nodes" ]]; then - echo "FAIL (D1): deferred spawn OVER-placed $diskful diskful on a $nodes-node cluster" >&2 +if [[ "$diskful" -gt "$max_placeable" ]]; then + echo "FAIL (D1): deferred spawn OVER-placed $diskful diskful (max placeable=$max_placeable, ready nodes=$nodes)" >&2 exit 1 fi -echo ">> deferred spawn placed $diskful diskful (best-effort, <= $nodes nodes) — OK" +echo ">> deferred spawn placed $diskful diskful (best-effort, <= $max_placeable) — OK"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/cli-matrix/rg-c-overcommit-spawn-defers.sh` around lines 132 - 156, The script hardcodes nodes=3 which breaks on clusters with >3 workers; replace the static assignment with a dynamic node count and use that for the upper-bound check. Specifically, change the variable nodes (currently set as nodes=3) to compute the actual number of cluster worker nodes (for example nodes="$(kubectl get nodes --no-headers | wc -l)" or another cluster-appropriate kubectl query), keeping the rest of the logic (diskful counting and the comparisons against nodes) unchanged so the test no longer falsely fails on 4+ node clusters.

coderabbitai · 2026-06-13T08:11:04Z

+# Optional await modifier (any disk_state-shaped await with a `node`):
+#
+#   skip_if_reached: <disk_state>
+#       SKIP the WHOLE workflow cleanly (exit 0, neither PASS nor FAIL)
+#       if rd@node reaches <disk_state> before the awaited `expected`
+#       state. For scenarios that need a transient state which a fast
+#       stand can race past — e.g. the U130 mid-sync rejection needs the
+#       new replica observed SyncTarget, but a FILE_THIN skip-sync stand
+#       seeds it straight to UpToDate (no CRD-observable mid-sync). On
+#       such a stand the scenario is "not exercisable here", not a fault:
+#       `expected: SyncTarget` + `skip_if_reached: UpToDate` turns the
+#       would-be timeout FAIL into a clean SKIP. Teardown still runs.
+#


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align exit-code documentation with SKIP semantics.

skip_if_reached now documents/implements clean SKIP with exit 0, but Line 127 still says exit 0 means full PASS only. Please unify the contract text to avoid ambiguous automation behavior.

Suggested doc patch

# Exit codes: # -# 0 — every step + assertion passed, all invariants hold +# 0 — PASS, or clean SKIP when scenario is not exercisable on this stand # 1 — at least one step failed; details on stderr # 2 — usage / config error

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/operator-harness/replay-runner.sh` around lines 97 - 109, The documentation for the await modifier `skip_if_reached` is inconsistent: it states that `skip_if_reached` performs a clean SKIP (exit 0) but elsewhere the contract text still claims exit 0 means a full PASS; update the contract text so exit code semantics are unified—describe that exit 0 may indicate either PASS or a clean SKIP caused by `skip_if_reached`, and clearly state how automation should distinguish SKIP vs PASS (e.g., a separate SKIP marker/logging or an explicit exit code convention if you prefer a distinct code); make the change in the same doc block referencing `skip_if_reached` and the await-modifier contract so both places use the same wording about exit 0 meaning "PASS or clean SKIP" or switch to a distinct non-zero/zero convention consistently.

Andrei Kvapil (kvaps) and others added 3 commits June 13, 2026 10:57

gemini-code-assist Bot reviewed Jun 13, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

Andrei Kvapil (kvaps) merged commit 170751b into main Jun 13, 2026
27 of 28 checks passed

Andrei Kvapil (kvaps) deleted the fix/release-gate-replay-harness branch June 13, 2026 12:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(harness): fix release-gate replay/cli-matrix triage failures (tiebreaker, quorum, mid-sync, over-commit, ps-cdp)#156

test(harness): fix release-gate replay/cli-matrix triage failures (tiebreaker, quorum, mid-sync, over-commit, ps-cdp)#156
Andrei Kvapil (kvaps) merged 3 commits into
mainfrom
fix/release-gate-replay-harness

Andrei Kvapil (kvaps) commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 13, 2026

Uh oh!

gemini-code-assist Bot Jun 13, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Uh oh!

coderabbitai Bot Jun 13, 2026

Uh oh!

coderabbitai Bot Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Andrei Kvapil (kvaps) commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Verdicts

Changes

Product bug found, tracked separately (NOT in this PR)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading