fix(clone): land clone/restore replicas on snapshot nodes in the source pool#153
fix(clone): land clone/restore replicas on snapshot nodes in the source pool#153Andrei Kvapil (kvaps) wants to merge 9 commits into
Conversation
📝 WalkthroughWalkthroughThis PR implements Bug 038: a snapshot-based clone/restore mechanism that pins replica placement to the source pool backend, gates operations on snapshot materialization readiness, introduces satellite requeue defense for missing local snapshots, and fixes store resource listing to correctly handle unlabeled resources. The changes span placer constraints, REST handlers, satellite reconciliation, and data store operations. ChangesBug 038 - Snapshot clone/restore backend pinning and safety gates
Documentation and test tooling updates
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request addresses Bug 038 by enforcing restore-source constraints directly within the placer's Place method and refactoring the REST snapshot-restore handlers to place replicas on all snapshot-holding nodes in the source's storage pool by default. This prevents cross-backend placement issues, such as FILE_THIN snapshots being restored onto ZFS pools. The changes are accompanied by comprehensive unit and integration tests, as well as a minor fix to bash scripting in the E2E tests. Feedback on the implementation points out a potential nil pointer dereference in constrainFilterToRestoreSource if the filter parameter is nil.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // filter passes through unchanged and the regular placement gates | ||
| // apply (satellite-side Bug 397 blank-fallback seeding still protects | ||
| // data integrity in those edge cases). | ||
| func (p *Placer) constrainFilterToRestoreSource(ctx context.Context, rdName string, filter *apiv1.AutoSelectFilter) *apiv1.AutoSelectFilter { |
There was a problem hiding this comment.
The filter parameter is dereferenced as out := *filter on line 1799. If filter is nil, this will cause a nil pointer dereference panic. To prevent this, add a defensive nil check at the beginning of the function.
func (p *Placer) constrainFilterToRestoreSource(ctx context.Context, rdName string, filter *apiv1.AutoSelectFilter) *apiv1.AutoSelectFilter {
if filter == nil {
return nil
}…e pool Bug 038: the snapshot-restore (and rd-clone) placement branch for the empty-node-list case deferred to the parent ResourceGroup's SelectFilter.PlaceCount. Under the stand-default empty-spec DfltRscGrp the restore placed zero replicas, leaving the real placement to the controller-side RG reconcilers, whose unconstrained placer pass could land a clone replica on a pool of a different backend. The satellite then piped the source's FILE_THIN snapshot stream into zfs recv, which looped forever on 'cannot receive: invalid stream (bad magic number)' and the clone never reached UpToDate. Mirror upstream LINSTOR (verified against the live linstor-oracle, 1.33.2): a restore without an explicit node list lands on exactly the snapshot-holding nodes, in the source replica's storage pool, never through the autoplacer. The clone data plane is therefore node-local and same-backend by construction. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…e backend Bug 038 defense in depth: the REST autoplace handler already pinned candidate nodes/providers for snapshot-restored RDs, but the controller-side Place callers (ResourceGroup apply, RG rebalance, node replacement) fed the placer the raw RG SelectFilter and could place a clone replica on a different backend. Enforce the restore-source constraints inside Place() so every caller honours them; the REST handler's provider lookup now delegates to the shared placer helper. The constraint operates on a copy of the caller's filter, no-ops for RDs without the BlockstorRestoreFromSnapshot marker, and falls through unchanged when the marker/snapshot/source state is unresolvable. .golangci.yml: pkg/placer/ joins the per-path godox exclusion list (Bug-NNN cross-reference comments, same as the sibling packages). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…abort mapfile fed from a process substitution swallows the producer's exit status, so a wait_clone_replicas timeout left the nodes array empty and the cell died on 'cli_nodes[0]: unbound variable' under set -u, hiding the actual diagnostic. Capture the output through a plain command substitution and fail loudly with the function's own message. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Bug 038 regression: the clone/restore data plane is a node-local RestoreVolumeFromSnapshot, but the per-node snapshot materialises asynchronously via the satellite SnapshotReconciler. Stamping the restore-marked replicas synchronously on all snapshot nodes let a replica reconcile before its co-located snapshot existed, hitting ErrNotFound and a terminal blank CreateVolume that poisoned the dataset and deadlocked every replica empty. Gate placement: wait for the snapshot to report a non-zero per-node CreateTimestamp on the target nodes before stamping. The SnapshotReconciler stamps that only after the on-disk snapshot exists, so the wait closes the race. Best-effort and bounded: a timeout (or a legacy snapshot with no per-node tracking) falls through to placement, with the satellite-side requeue as the backstop. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The restore-source constraint moved into Place() (so RG apply / RG rebalance / node-replacement callers honour it) must enforce ONLY the same-backend provider pin, which is what prevents the FILE_THIN stream into zfs recv bad-magic loop. An earlier revision also hard-pinned the candidate node set to the snapshot's nodes; that broke legitimate staged cross-node bring-up — the cross-node restore lanes add a replica on a fresh node and populate it over the wire, and the node-pin made the controller-side reconcilers refuse to ever place it. Drop the node-pin; keep the backend pin. Add a regression test proving a restore-marked RD can gain a cross-node replica on a same-backend pool, alongside the existing backend-pin and no-leak pins. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ank create Bug 038 defense-in-depth: the per-node snapshot materialises asynchronously, so a node-local restore replica that reconciles a beat early sees RestoreVolumeFromSnapshot return ErrNotFound. Falling straight through to a blank CreateVolume is terminal — datasetExists then short-circuits the real clone forever and the Bug 397 guard SyncTargets every replica into an all-empty deadlock. Requeue the reconcile up to a bounded budget to let the local snapshot land before conceding to the blank fallback. A genuinely cross-node / ship-only replica (snapshot never local, no fetcher) exhausts the budget and degrades to the pre-fix DRBD-resync blank fallback. The REST-side readiness gate is the primary fix; this is the backstop for the residual CRD-status-vs-on-disk window. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Bug 038 root cause on stand: Resources().ListByDefinition trusted a label-selector fast path and only fell back to a full scan when the selector returned ZERO items, assuming a partial subset was impossible because every REST writer sets the label. That breaks for a MIXED RD whose diskful replicas were applied via kubectl (no label) while its auto-tiebreaker witness was stamped by the controller through the store (labelled): the selector returned only the labelled witness, the fallback was skipped, and the unlabelled diskful replicas were hidden. The snapshot-restore / clone handler reads this list to resolve the source pool, so with the diskful replicas hidden it stamped the clone replicas with an empty StorPoolName and the satellite failed every reconcile with 'unknown storage pool' — the clone never converged. Scan-and-filter on the authoritative Spec.ResourceDefinitionName. The List is served from the controller-runtime informer cache, so the in-memory filter costs the same as a label index but is correct for labelled and unlabelled replicas alike. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
PR #153 changed the bare snapshot-restore handler to synchronously stamp a replica on EVERY snapshot node. That fixed the clone verb (a one-shot CSI op with no follow-up autoplace) but regressed the staged cross-node restore workflow: the e2e restore lanes place the data-bearing replica on a snapshot node first, wait UpToDate, then add a cross-node replica that SyncTargets it. With both snapshot nodes already eagerly placed, place_count was satisfied and the cross-node replica was never created (stuck Connecting / no devicePath). Split the placement policy by caller. The clone path (cloneWithData) eager-places on the snapshot nodes in the source pool. The bare restore handler leaves an empty shell when no --node-name is given, so the operator / linstor-csi drives placement — exactly the pre-#153 / upstream restore-then-scale-out contract. Explicit node lists are still stamped verbatim on both paths. Cross-backend protection is unaffected: the placer's restore-source backend pin keeps the operator's follow-up autoplace on the source backend, and the REST autoplace handler still constrains a no-node-list autoplace of a restore-marked RD to the snapshot nodes. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
d68397c to
24baa83
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/satellite/reconciler.go (1)
1554-1564:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset the missing-snapshot budget on any non-
ErrNotFoundoutcome.
restoreSnapshotMissingTriesis documented as a count of consecutive local-miss passes, but this branch only clears it on success. A sequence likeErrNotFound → ErrNotFound → transient restore error → ErrNotFoundresumes at the old count and can hit the blank-create fallback early, recreating the empty-target path this backstop is supposed to avoid.Proposed fix
if !errors.Is(err, storage.ErrNotFound) { + // Break the "consecutive local snapshot miss" streak on any other outcome. + r.clearRestoreSnapshotMissing(rdName, vol.GetVolumeNumber()) + if err == nil { // Local clone succeeded: this replica holds the snapshot's // data. Clear any stale blank-fallback marker so the // legitimate all-clone restore fast-path keeps its skip, and // reset the Bug 038 not-yet-materialised requeue budget. r.recordRestoreBlankFallback(rdName, vol.GetVolumeNumber(), false) - r.clearRestoreSnapshotMissing(rdName, vol.GetVolumeNumber()) } return err //nolint:wrapcheck // caller wraps }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/satellite/reconciler.go` around lines 1554 - 1564, The branch that handles non-ErrNotFound outcomes currently only calls r.clearRestoreSnapshotMissing(...) when err == nil; change this so the missing-snapshot counter is cleared on any non-ErrNotFound outcome (i.e., when !errors.Is(err, storage.ErrNotFound)), not just on success. Concretely, keep r.recordRestoreBlankFallback(...) inside the err == nil block, but move or add r.clearRestoreSnapshotMissing(rdName, vol.GetVolumeNumber()) to the outer !errors.Is(err, storage.ErrNotFound) branch so it runs for both success and transient errors before returning err from the function that contains these calls.
🧹 Nitpick comments (1)
pkg/rest/snapshot_restore_test.go (1)
214-224: ⚡ Quick winAlign test docstrings with the current bare-restore behavior.
These comments still describe eager replica stamping, but the test body now asserts empty-shell restore followed by explicit autoplace. Please update wording to avoid future misreads of the intended contract.
✏️ Suggested comment-only fix
-// retargeted by Bug 038: the restore no longer defers placement to -// the autoplacer — it stamps replicas on the snapshot's nodes in the -// source replica's pool up front (...) +// retargeted by Bug 038: bare restore with no explicit nodes now +// leaves an empty shell, and follow-up autoplace must honor the +// restore-source backend pin (never crossing backend kind). -// empty-nodes branch: no `--node-name` arguments → the handler stamps -// one replica on EVERY node holding the snapshot (...) +// empty-nodes branch: no `--node-name` arguments → the handler creates +// an empty shell; placement is done explicitly via autoplace and must +// stay pinned to the source backend.Also applies to: 843-849
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/rest/snapshot_restore_test.go` around lines 214 - 224, The docstring for TestSnapshotRestoreConstrainsProviderToSource is outdated: it describes eager replica stamping on restore but the test now asserts an empty-shell restore followed by an explicit autoplace; update the comment above that test (and the similar block at lines 843-849) to describe the current behavior—that restores create an empty-shell in the source pool and then require an explicit autoplace to populate replicas—removing any language about stamping replicas eagerly or deferring placement to the autoplacer.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/satellite/reconciler.go`:
- Around line 1554-1564: The branch that handles non-ErrNotFound outcomes
currently only calls r.clearRestoreSnapshotMissing(...) when err == nil; change
this so the missing-snapshot counter is cleared on any non-ErrNotFound outcome
(i.e., when !errors.Is(err, storage.ErrNotFound)), not just on success.
Concretely, keep r.recordRestoreBlankFallback(...) inside the err == nil block,
but move or add r.clearRestoreSnapshotMissing(rdName, vol.GetVolumeNumber()) to
the outer !errors.Is(err, storage.ErrNotFound) branch so it runs for both
success and transient errors before returning err from the function that
contains these calls.
---
Nitpick comments:
In `@pkg/rest/snapshot_restore_test.go`:
- Around line 214-224: The docstring for
TestSnapshotRestoreConstrainsProviderToSource is outdated: it describes eager
replica stamping on restore but the test now asserts an empty-shell restore
followed by an explicit autoplace; update the comment above that test (and the
similar block at lines 843-849) to describe the current behavior—that restores
create an empty-shell in the source pool and then require an explicit autoplace
to populate replicas—removing any language about stamping replicas eagerly or
deferring placement to the autoplacer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c0bf53d6-2f17-4186-9475-662ba484be79
📒 Files selected for processing (20)
.golangci.ymldocs/cli-parity-known-deltas.mdpkg/placer/placer.gopkg/placer/restore_source_constraints_bug038_test.gopkg/rest/autoplace.gopkg/rest/clone_placement_bug038_test.gopkg/rest/nodes_test.gopkg/rest/rd_clone.gopkg/rest/server.gopkg/rest/snapshot_ready.gopkg/rest/snapshot_ready_test.gopkg/rest/snapshot_restore.gopkg/rest/snapshot_restore_test.gopkg/satellite/bug_397_restore_seed_test.gopkg/satellite/cross_cluster_ship_test.gopkg/satellite/reconciler.gopkg/satellite/ship_dispatch_test.gopkg/store/k8s/list_by_definition_mixed_label_bug038_test.gopkg/store/k8s/resources.gotests/e2e/cli-matrix/rd-clone-vd-data-plane.sh
Problem
rd cloneof a deployed, VD-bearing source on a FILE_THIN pool could materialise a target replica on a ZFS pool. The target satellite then piped the source's FILE_THIN snapshot stream intozfs recv, which loops forever oncannot receive: invalid stream (bad magic number)— the clone replicas never reach UpToDate. Reproduced on two stands viatests/e2e/cli-matrix/rd-clone-vd-data-plane.sh.Root cause
Two placement gaps combined:
SelectFilter.PlaceCount. Under the stand-default empty-specDfltRscGrpit placed zero replicas.Fix (upstream-faithful)
Verified against the live LINSTOR 1.33.2 oracle: an upstream restore without an explicit node list lands on exactly the snapshot-holding nodes, in the snapshot's own storage pool, and never consults the autoplacer; upstream
rd clonelikewise operates on the source's own pools (it refuses sources on pools it cannot clone with "Clone source contains unsupported storage pools").Place()now applies the restore-source constraints (snapshot nodes + source provider kind) internally, so every caller — REST autoplace, RG apply, RG rebalance, node replacement — honours them. The REST handler's provider lookup delegates to the shared placer helper.cli_nodes[0]: unbound variableon failure (mapfile from a process substitution swallows the exit status), masking the real diagnostic; failures now surface the wait function's own message.docs/cli-parity-known-deltas.mdrow 82 updated with the placement semantics.Validation
pkg/rest/clone_placement_bug038_test.go,pkg/placer/restore_source_constraints_bug038_test.go; updated restore-placement pins.go test ./...green,golangci-lintclean,-racegreen on satellite/placer/rest.cli-matrix/rd-clone-vd-data-plane.shgreen 3x on standbigat this branch's build (both wire variants: plain CLI clone and raw RESTuse_zfs_clone=true, byte-level marker verified on every replica).cli-matrix/r-full-lifecycle.shgreen (placer/autoplace ground-truth gate).replay/rd-clone-vd-data-plane.yamlPASS on standbig.Summary by CodeRabbit
New Features
Bug Fixes
Tests