fix(store): sync ControllerProps with the ControllerConfig CRD (BUG-022)#138
Conversation
…the k8s store The CRD-backed store kept the controller-scope props bag in a process-local map populated once at construction and never synced with the ControllerConfig CRD again. `linstor controller set-property` persisted into Spec.ExtraProps via the REST shim, but reconcilers and the placer read Store.ControllerProps() — so every controller-scope knob set via the CLI (BalanceResourcesInterval / GracePeriod / Enabled, Autoplacer/Weights/*, AllowMixingStoragePoolDriver) was a silent no-op for the running process until a restart. ControllerProps() now reads Spec.ExtraProps through the store's client — the manager's informer-cached client in production, the same freshness mechanism every other kind in the store uses — and Set routes through PatchControllerExtraProps so it converges with concurrent REST writes instead of clobbering them. As a side effect the bag now survives controller restarts. Ships a shared storetest ControllerPropsStore suite pinning the InMemory and CRD-backed implementations to identical semantics, and a regression test that drives the REST write path and asserts a pre-existing store instance observes the change live. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The grace-period resolver used parsePositiveMinutes, which rejects an explicit "0" — a legal upstream value meaning "no grace window" — and silently fell back to the 60-minute default. The new parseNonNegativeMinutes sibling accepts 0 while still rejecting negatives and garbage; BalanceResourcesInterval keeps the positive-only parser (a zero cadence is meaningless). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Reconcilers now read controller-scope props through the cached client's ControllerConfig watch, so the generated manager role needs get/list/watch on controllerconfigs. Marker added on the rebalance reconciler (committed with it); role.yaml regenerated via make manifests. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Warning Review limit reached
More reviews will be available in 29 minutes and 55 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request addresses BUG-022 by modifying the Kubernetes store's ControllerProps implementation to read dynamically from the ControllerConfig CRD instead of using a static process-local map. This ensures that controller-scope configuration changes are observed live without requiring a controller restart. Additionally, the PR introduces parseNonNegativeMinutes to support an explicit 0 value for BalanceResourcesGracePeriod (representing "no grace window") and adds comprehensive unit, integration, and shared store tests. Feedback on the changes points out a potential integer overflow issue in parseNonNegativeMinutes where extremely large values could overflow time.Duration and bypass the grace gate check; adding an upper bound check was suggested to prevent this.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func parseNonNegativeMinutes(s string) (int, bool) { | ||
| if s == "" { | ||
| return 0, false | ||
| } | ||
|
|
||
| n, err := strconv.Atoi(s) | ||
| if err != nil || n < 0 { | ||
| return 0, false | ||
| } | ||
|
|
||
| return n, true | ||
| } |
There was a problem hiding this comment.
The parseNonNegativeMinutes function parses minutes as an integer, but the resulting value is later converted to a time.Duration via time.Duration(m) * time.Minute in rg_rebalance_controller.go. If an extremely large value is supplied (e.g., greater than 153722867 minutes), this multiplication will overflow the maximum representable time.Duration (math.MaxInt64 nanoseconds, or ~292 years), wrapping around to a negative duration. This negative duration would then bypass the grace gate check (grace <= 0) and cause the scheduled rebalance to run immediately.
To prevent this, we should add an upper bound check to ensure the parsed minutes do not overflow time.Duration.
| func parseNonNegativeMinutes(s string) (int, bool) { | |
| if s == "" { | |
| return 0, false | |
| } | |
| n, err := strconv.Atoi(s) | |
| if err != nil || n < 0 { | |
| return 0, false | |
| } | |
| return n, true | |
| } | |
| func parseNonNegativeMinutes(s string) (int, bool) { | |
| if s == "" { | |
| return 0, false | |
| } | |
| n, err := strconv.Atoi(s) | |
| if err != nil || n < 0 || n > 153722867 { | |
| return 0, false | |
| } | |
| return n, true | |
| } |
Problem
linstor controller set-propertypersists into the ControllerConfig CRD'sSpec.ExtraProps, but reconcilers and the placer readStore.ControllerProps(), which in the CRD-backed store was a process-local map populated once at construction and never synced from the CRD afterwards. Every controller-scope knob set via the CLI —BalanceResourcesInterval/GracePeriod/Enabled,Autoplacer/Weights/*,AllowMixingStoragePoolDriver— was a silent no-op for the running process until a controller restart. Empirically: with the interval set to 1 minute via the CLI, the scheduled rebalance still fired at the 5-minute default.Secondary: the grace-period resolver rejected an explicit
BalanceResourcesGracePeriod 0(a legal upstream value meaning "no grace window") and silently fell back to the 60-minute default.Fix
ControllerProps()in the k8s store now readsSpec.ExtraPropsthrough the store's client — the manager's informer-cached client in production, the same freshness mechanism every other kind in the store uses.Setroutes throughPatchControllerExtraProps, so it converges with concurrent REST writes instead of clobbering them. As a side effect the props bag now survives controller restarts.BalanceResourcesGracePeriodis parsed with a new non-negative parser:0is honoured as "no grace window"; negatives and garbage still fall back to the default.BalanceResourcesIntervalkeeps the positive-only parser.get/list/watchoncontrollerconfigs, since reconcilers now read the CRD through the cached client's watch.Tests
storetest.RunControllerPropsStoresuite pins the InMemory and CRD-backed implementations to identical Get/Set semantics (empty-not-nil, replace-not-merge, defensive copies, concurrent Get/Set under-race).TestControllerPropsSeeCRDWritesWithoutRestart(envtest) reproduces the operator-day split exactly: store constructed first, then the REST write path mutates the CRD, then the same store instance must observe the fresh value. Red on the previous implementation, green now.TestRebalanceGracePeriodZeropins that an explicit zero grace period does not gate the scheduled pass; a table test covers the parser boundary between the two minute parsers.TestGroupKWFBalanceResourcesTick(test(integration): make the suite honest under pipefail; pin post-#129/Bug-397 contracts #137) goes green with this fix: red pre-fix (Eventually timed out — the tick stayed at the 5-minute default), passes in ~67s post-fix with the CLI-set 1-minute tick.