test(integration): make the suite honest under pipefail; pin post-#129/Bug-397 contracts#137
Conversation
The mock stamped every Node ONLINE unconditionally, making it impossible to exercise `linstor n lost` honestly: the Bug 111 gate correctly refuses node-lost against an ONLINE satellite, and the machine-readable CLI exits 0 on refusal envelopes, so the lost-cascade tests silently no-op'd and timed out. The knob models the documented n-lost precondition (a satellite that stopped heartbeating). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The pinned linstor-client 1.27.1 (asserted by Group H) has no
`node evacuate --force` flag and no `node evict` verb at all, so
both tests crashed on CLI argument parsing before reaching the wire.
Exercise the force override via PUT /v1/nodes/{node}/evacuate?force=true
and the evict alias via PUT /v1/nodes/{node}/evict directly, asserting
the []ApiCallRc envelope shape both routes must answer (Bug 78 wire).
Node-lost cascade now takes the satellite offline first — the
documented n-lost precondition — instead of silently hitting the
Bug 111 online-refusal and timing out.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…tract TestGroupFRToggleDiskful2DisklessPreservesTieBreaker pinned the pre-#129 invariant (witness preserved at 1 diskful + 1 user-diskless). PR #129 collapsed the auto-witness contract to upstream parity — "witness lives iff exactly 2 diskful replicas" — so the toggle down to 1 diskful now reaps the witness. Rename to ...ReapsTieBreaker and assert the exact 2-replica composition (1 diskful + 1 user-diskless, 0 witnesses), mirroring the L1 spec inversion in internal/controller/ensure_tiebreaker_test.go. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… on conflict SnapShipCrossNode pinned the pre-Bug-397 behaviour where an explicit node_names restore onto a snapshot-less node succeeded and acted as a cross-node ship trigger. Bug 397 (P0 data integrity) now refuses that restore at the API edge — an empty diskful replica could latch UpToDate and be promoted on failover. Rename the subtest to SnapRestoreSnapshotHolderOnly and pin both halves of the new contract: 400 + no target RD for the snapshot-less node, 201 + the BlockstorRestoreFromSnapshot routing prop for a holder node. AutoSnapshotPeriodicTick stamped AutoSnapshot/RunEvery with a single-shot Update that races the live RD reconciler; wrap it in retry.RetryOnConflict. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…'props' key The GET envelope assertion looked for a 'properties' map, but the upstream wire key is 'props' — golinstor's NodeConnection struct (client/connection.go) and blockstor's handler both spell it that way. The test (not the product) diverged from the wire contract. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
witness - HappyPath: pin autoplace to the thin lvm-thin pool the way linstor-csi always does; unpinned placement could land on the thick 'file' pool where the snapshot step is correctly refused by the G5 thick-provider gate (the CLI exits 0 on refusal envelopes, so the failure surfaced as a misleading wait timeout). - NodeLostCascade: take the satellite offline before 'n lost' (Bug 111 gate refuses lost against an ONLINE satellite). - PoolDestroyedDropsFromPlacer: 'rd create' in linstor-client 1.27.1 has no --storage-pool flag; the pool constraint rides on autoplace. - LUKSStackEndToEnd: 1.27.1 spells the flag --passphrase, not --new-passphrase. - LateVD / ReplicasOnSame / ReplicasOnDifferent: exclude the auto-tiebreaker witness (expected on 2-diskful RDs post-#129) from diskful placement asserts it was never part of. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…age pools The k8s store's StoragePools().ListByNode runs a label selector on blockstor.io/node-name — the label pkg/store/k8s stamps on every store-created pool. Fixture pools seeded directly via the CRD client lacked it, so per-node store reads saw zero pools and the `n lost` storage-pool cascade silently deleted nothing, failing TestGroupKWFNodeLostCascade's SP assert. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ug 83 assert
linstor-client 1.27.1's argparse rejects the trailing positional
after two valued options ('too few arguments'); use the
positional-first form Group F already uses. The Bug 83 dead-pool
assert now excludes the auto-tiebreaker witness — a network-only
DISKLESS attachment that consumes no pool and legitimately lands on
the dead-pool node for a 2-diskful RD post-#129.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Fold the PUT + envelope assertion into one helper so bodyclose sees the close in the same scope, wrap client errors inside the RetryOnConflict closure, and reword two comments that tripped godox. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ess race seedRDWithVolume created the RD first, so the RD reconciler's auto-tiebreaker pass (witness iff exactly 2 diskful, post-#129) raced the fixture loop to worker-3 and the seeded third Resource flaked with AlreadyExists. Seeding all three diskful rows before the RD lands means the first reconcile already sees 3 diskful and wants no witness. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThis PR updates integration tests across six test groups to enforce new behaviors: offline satellite simulation for lifecycle testing, witness reaping on replica toggles, snapshot restore contract enforcement, REST API node operations, and placement assertion filtering that excludes auto-tiebreaker witnesses. ChangesIntegration test infrastructure and behavior updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request updates the integration tests to align with upstream LINSTOR client 1.27.1 behavior and address several bug fixes (including Bugs 78, 79, 83, 111, 129, and 397). Key updates include simulating offline nodes for node lost tests, asserting that tie-breakers are reaped when dropping below two diskful replicas, ensuring snapshot restores are refused on non-holder nodes, and updating CLI flags and wire keys (such as using props instead of properties). Additionally, retry-on-conflict logic was added for resource definition updates, and storage pools were labeled to ensure visibility. As there are no review comments, I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
What
Makes the Tier-2 integration suite honest after the pipefail fix (#135 exposes the real exit code): triages all 16 tests that are red on
mainat 10fdc2b, fixes every test/harness-side failure, and pins the new post-#129 / post-Bug-397 contracts the old tests contradicted.With this PR the suite is green except for 5 tests that are red because of genuine product bugs (details below). Those are deliberately left untouched — fixing the product or weakening the tests is out of scope for a test-honesty PR; they need their own fixes and will keep the integration job red until then.
Why the suite was red (per-test triage)
Stale pins of changed product contracts (inverted, same way the L1 specs were inverted):
TestGroupFRToggleDiskful2DisklessPreservesTieBreaker→ renamed...ReapsTieBreaker: fix(controller): reap auto-tiebreaker below 2 diskful (1 diskful + 1 diskless) #129 collapsed the auto-witness invariant to "witness lives iff exactly 2 diskful replicas"; the toggle down to 1 diskful now reaps the witness. Asserts the exact 2-replica composition.TestGroupG/SnapShipCrossNode→ renamedSnapRestoreSnapshotHolderOnly: the Bug-397 data-integrity guard now refusesnode_namesrestores onto snapshot-less nodes (an empty replica could latch UpToDate and be promoted). The test now pins both the 400 refusal and the successful holder-node restore with the dispatcher routing prop.TestGroupKWFLateVD,TestGroupKWFReplicasOnSame(+OnDifferent),TestGroupKWFPoolDestroyedDropsFromPlacer: post-fix(controller): reap auto-tiebreaker below 2 diskful (1 diskful + 1 diskless) #129 a 2-diskful RD always carries a diskless TIE_BREAKER witness; the diskful-placement asserts now exclude it (it consumes no pool and sits outside placement constraints).Tests speaking a CLI dialect the pinned linstor-client 1.27.1 (asserted by Group H) does not have:
node evacuate --force/node evictdon't exist in 1.27.1 → the force override and the evict alias are now driven directly over the PUT routes (?force=true,/evict), asserting the[]ApiCallRcenvelope shape that pins the Bug-78 wire contract.rd create --storage-pool,encryption create-passphrase --new-passphrase, and trailing-positionalr cargument order → rewritten in 1.27.1 syntax.Test/harness bugs:
linstor n lostagainst an ONLINE satellite is correctly refused (Bug 111 gate), and the machine-readable CLI exits 0 on refusal envelopes, so both node-lost tests silently no-op'd into timeouts. The satellite mock gained aSimulateNodeOfflineknob and the tests now model the documented precondition (dead satellite) first. (With that fixed,TestGroupKWFNodeLostCascadereaches a real product bug — see below.)blockstor.io/node-namelabel the k8s store'sListByNodeselector requires, making them invisible to then lostSP cascade.TestGroupKWFHappyPathautoplaced without a pool pin and could land on the thickfilepool, where the snapshot step is correctly refused by the thick-provider gate; it now pinslvm-thinthe way linstor-csi always pins a pool.TestGroupG/AutoSnapshotPeriodicTickraced the live RD reconciler with a single-shot Update; now retries on conflict.seedRDWithVolumecreated the RD before its per-node Resources, so the auto-tiebreaker pass raced the fixture loop to worker-3 and the seed flaked with AlreadyExists; the helper now seeds all three diskful rows first.TestGroupI/NodeConnectionSetPropertyasserted apropertiesenvelope key; the upstream wire key (golinstorNodeConnection) isprops— the product was right.Note on scope:
docs/agent-playbook.mdreserves harness edits for coordinated work — this PR makes the two minimal harness changes above (satellite knob, fixture label) because the suite cannot be made honest without them; flagged here for explicit review.Why the local gate run looked green
go test -tags=integration ./tests/integration/...piped into anything (| tee,| tail) reports the pipe's exit status, not go test's — the same masking class #135 fixes in CI. An unpiped run exits 1 locally as well (and withoutKUBEBUILDER_ASSETS/the linstor CLI the suite fails fast rather than skipping).Known product bugs (kept red on purpose)
TestGroupJ/CSICreateVolumeFromClone— the clone endpoint rejects golinstor's legitimateuse_zfs_clonefield (rdCloneRequest+DisallowUnknownFields), breaking CSI clone-from-source. Same wire-compat class as the earlieroverride_props/src_snap_namefix.TestGroupKWFSpawnAndDependentReAutoplace—stripRebalanceAnnotationnils the empty annotations map, and the k8s store'smergeUserAnnotationsIntotreats nil as "don't touch", so therebalance-pendingannotation can never be removed in production and the rebalance pass churns forever.TestGroupKWFBalanceResourcesTick—linstor controller set-propertypersists toControllerConfig.Spec.ExtraProps, but reconcilers and the placer readStore.ControllerProps(), which in the k8s store is a process-local map never synced from the CRD: every controller-scope tuning prop (BalanceResources*, autoplacer weights, AllowMixingStoragePoolDriver) is silently ignored.TestGroupKWFLUKSStackEndToEnd—encryption create-passphrasesucceeds (master passphrase stored in a Secret) but the LUKS RD-create gate only consults theDrbdOptions/EncryptPassphrasecontroller prop, so the upstream CLI unlock flow can never enable LUKS provisioning (and the remediation hint suggests storing a plaintext passphrase in a prop).TestGroupKWFNodeLostCascade— controller-side TOCTOU: aftern lostcascades a 3-diskful RD down to 2 diskful,ensureTiebreakerpicks the witness node from the manager's (lagging) cache, re-creates aDISKLESS TIE_BREAKERResource on the just-deleted node, and nothing ever reaps it — the(rd, node)slot the cascade freed is re-occupied by a ghost row pinned to a nonexistent node. The REST resource-create path has the Bug-174 node-deleted-race guard; the controller-side witness create does not. Deterministic whenever the controller process is warm (any prior test in the same binary); passes only when the test runs first.TestGroupI/ResourceConnectionPathCreatefailed once in the run-27381332316 CI job but passes locally in full-suite runs; treated as a CI-timing flake to observe on this PR's CI.Validation
go test -tags=integration -count=1 -timeout=15m ./tests/integration/...(CI invocation,set -o pipefail+tee): all tests pass except the 5 product-bug pins above.go test ./...: green.golangci-lint run: 0 issues (also no new issues under--build-tags integrationvs themainbaseline).Summary by CodeRabbit
Tests