Skip to content

fix(controller): rebalance-pending annotation strip never reached the CRD#139

Merged
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
fix/rebalance-annotation-strip
Jun 12, 2026
Merged

fix(controller): rebalance-pending annotation strip never reached the CRD#139
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
fix/rebalance-annotation-strip

Conversation

@kvaps

Copy link
Copy Markdown
Member

Problem

The blockstor.io/rebalance-pending RG annotation could never be removed in production: once the rebalance pass completed, the strip helper nil-ed the wire annotation map when it became empty, but the k8s store's annotation merge treats a nil wire map as "annotations untouched" (deliberately, to protect reconciler-stamped keys). The deletion therefore never reached the CRD and the rebalance pass re-fired on every RG event — observed as the annotation persisting indefinitely with constant placer churn. The InMemory store replaced rows wholesale, which is why unit tests never caught it. The blockstor.io/spawn-shortfall strip on RDs had the identical defect.

Fix

  • Store contract made explicit and uniform: on Update/Patch, a nil wire Annotations map means "leave stored annotations untouched"; a non-nil map — including an empty one — means "replace the user-annotation set with exactly this set". The k8s backend already behaved this way; the InMemory backend now carries stored annotations across nil wire maps in the RG, RD, Resource and Snapshot Update/Patch paths, so both backends are behaviourally identical. The contract is documented on the store interfaces.
  • Strip helpers fixed: stripRebalanceAnnotation and stripShortfallAnnotation keep the emptied map non-nil so the deletion propagates. The wire JSON shape is unchanged (annotations,omitempty elides empty and nil alike).

Tests

  • Store-contract pins (pkg/store/storetest): UpdateAnnotationContract subtests cover all three transitions (nil = untouched, non-nil = replace, empty = clear) for RG, RD, Resource and Snapshot, and run against both backends. Red against the pre-fix InMemory store.
  • L1 envtest regression (internal/controller): drives the real strip paths through the CRD-backed store against envtest and asserts the markers are gone from the CRD — a lone rebalance-pending annotation stripped via Reconcile, a multi-key strip that must keep sibling annotations, and the spawn-shortfall strip. The lone-marker specs fail without the fix.

Verified locally: go test ./..., go test -race ./internal/controller/... and golangci-lint run are clean; the operator-level integration pin TestGroupKWFSpawnAndDependentReAutoplace (PR #137 branch) goes red on main and green with this fix.

Andrei Kvapil (kvaps) and others added 4 commits June 12, 2026 04:44
The k8s store's mergeUserAnnotationsInto treats a nil wire-side
annotation map as "leave stored annotations untouched" and a
non-nil (including empty) map as "replace the user-annotation set".
The InMemory store replaced rows wholesale on Update/Patch, so the
two backends silently diverged and production-only behaviour was
invisible to unit tests. Carry stored annotations across nil wire
maps in the RG, RD, Resource and Snapshot Update/Patch paths and
document the contract on the store interfaces.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… CRD

stripRebalanceAnnotation and stripShortfallAnnotation nil-ed the
annotation map once it became empty, but the store Update contract
reads a nil wire map as "annotations untouched", so deleting the
last marker never propagated to the CRD and the rebalance pass
re-fired on every event. Keep the emptied map non-nil so the strip
persists; the wire JSON shape is unchanged because the annotations
field is omitempty either way.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Cover the three transitions (nil = untouched, non-nil = replace,
empty = clear) for RG, RD, Resource and Snapshot stores so the
InMemory and CRD-backed implementations cannot diverge again.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Drive the real strip paths through the CRD-backed store against
envtest and assert the markers are gone from the CRD: a lone
rebalance-pending annotation stripped via Reconcile, a multi-key
strip that must keep sibling annotations, and the spawn-shortfall
strip on an RD. The lone-marker specs fail without the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@kvaps, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes and 28 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e11223fb-7812-4e56-9b14-c33d93f97bc3

📥 Commits

Reviewing files that changed from the base of the PR and between 910ea3a and ce18470.

📒 Files selected for processing (10)
  • internal/controller/rg_rebalance_controller.go
  • internal/controller/rg_rebalance_strip_bug021_test.go
  • pkg/store/inmemory.go
  • pkg/store/inmemory_resource.go
  • pkg/store/inmemory_resource_definition.go
  • pkg/store/inmemory_resource_group.go
  • pkg/store/inmemory_snapshot.go
  • pkg/store/store.go
  • pkg/store/storetest/annotation_contract.go
  • pkg/store/storetest/storetest.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/rebalance-annotation-strip

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses Bug-021, where deleting the last remaining annotation on a resource would silently fail because setting the annotation map to nil was interpreted by the Kubernetes backend as 'annotations untouched'. The changes ensure that empty annotation maps are kept non-nil during stripping operations, and the in-memory store implementation is updated to match this contract. Comprehensive integration and unit tests have been added to verify this behavior. The reviewer's feedback is highly constructive, pointing out that since Go maps are reference types, we should clone the annotations using maps.Clone when carrying them over or before passing them to mutator functions to prevent accidental in-place mutations from leaking.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/store/inmemory.go
Comment on lines +74 to +80
func carryAnnotationsOnNil(next, prev map[string]string) map[string]string {
if next == nil {
return prev
}

return next
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent accidental mutations of the stored annotations from leaking or being shared across different updates/retries, we should clone the prev map using maps.Clone when carrying it over. Additionally, we can define a package-level helper cloneAnnotations to safely clone annotations in other store files without needing to import maps everywhere.

Suggested change
func carryAnnotationsOnNil(next, prev map[string]string) map[string]string {
if next == nil {
return prev
}
return next
}
func carryAnnotationsOnNil(next, prev map[string]string) map[string]string {
if next == nil {
return maps.Clone(prev)
}
return next
}
func cloneAnnotations(m map[string]string) map[string]string {
return maps.Clone(m)
}

return errors.Wrapf(ErrNotFound, "resource %q on node %q", rdName, node)
}

prevAnnotations := r.Annotations

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since r.Annotations is a map (reference type), passing &r to mutate allows the mutator to modify the map in-place. If the mutator mutates the map and then sets r.Annotations = nil, prevAnnotations (which points to the same map) will also reflect those mutations, violating the 'untouched' contract. Cloning the annotations before calling mutate ensures prevAnnotations remains a true, untouched snapshot of the original annotations.

Suggested change
prevAnnotations := r.Annotations
prevAnnotations := cloneAnnotations(r.Annotations)

return errors.Wrapf(ErrNotFound, "resource definition %q", name)
}

prevAnnotations := rd.Annotations

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since rd.Annotations is a map (reference type), passing &rd to mutate allows the mutator to modify the map in-place. If the mutator mutates the map and then sets rd.Annotations = nil, prevAnnotations (which points to the same map) will also reflect those mutations, violating the 'untouched' contract. Cloning the annotations before calling mutate ensures prevAnnotations remains a true, untouched snapshot of the original annotations.

Suggested change
prevAnnotations := rd.Annotations
prevAnnotations := cloneAnnotations(rd.Annotations)

return errors.Wrapf(ErrNotFound, "resource group %q", name)
}

prevAnnotations := rg.Annotations

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since rg.Annotations is a map (reference type), passing &rg to mutate allows the mutator to modify the map in-place. If the mutator mutates the map and then sets rg.Annotations = nil, prevAnnotations (which points to the same map) will also reflect those mutations, violating the 'untouched' contract. Cloning the annotations before calling mutate ensures prevAnnotations remains a true, untouched snapshot of the original annotations.

Suggested change
prevAnnotations := rg.Annotations
prevAnnotations := cloneAnnotations(rg.Annotations)

@kvaps Andrei Kvapil (kvaps) merged commit c2026a6 into main Jun 12, 2026
27 of 28 checks passed
@kvaps Andrei Kvapil (kvaps) deleted the fix/rebalance-annotation-strip branch June 12, 2026 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant