fix(rest): absorb informer-cache lag across read-after-write REST paths by kvaps · Pull Request #151 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-12T22:42:44Z

Problem

The REST server reads through the manager's informer-cached client while writes go straight to the apiserver. Any read that immediately follows a create — same wire client, sub-millisecond gap — can therefore be served by a cache that has not observed the write yet, surfacing a spurious 404 for an object that demonstrably exists. The mitigation pattern already exists (pkg/rest/cache_retry.go: RG/RD helpers, later extended to the snapshot/CSI-restore hot path), but it covered only the paths that had already failed in CI.

The latest victim was TestGroupB/PhysicalStorageCDPPropsPerKind/file-thin (status: got 404, want 202): the POST /v1/physical-storage/{node} handler matched the requested device paths against a device list served by a cache still trailing the discovery loop's PhysicalDevice write, found no match, and 404'd a valid create-device-pool request.

What changed

Instead of fixing one more endpoint, this PR closes out the whole class. Every store read in pkg/rest was enumerated and classified; the cache-retry pattern (3 attempts, 200 ms apart — a real 404 still surfaces after the budget, steady-state hits pay zero extra latency) is now applied uniformly to the operator-visible read-after-write paths:

Physical storage CDP — the device-path match retries while no device matches and none is busy (the failing case). A busy device (409) returns immediately since it is observably in the cache.
Nodes — single GET after register; the re-register upsert fall-through (Create returned AlreadyExists — authoritative proof of existence — so a NotFound from the immediate patch read is definitionally cache lag).
Storage pools — single GET after create; the node-existence probe on pool create (node registered and pools created in the same reconcile pass).
Volume definitions — create probe, list probe and single GET (linstor-csi POSTs RD and VD back-to-back; VDs live inline on the RD, so a stale RD revision also reads as NotFound).
Resource definitions — the RG gates on RD create, including the post-write re-check that would otherwise mistake cache lag for a concurrent RG delete and roll back a freshly created RD.
Resources — per-node GET and per-resource volume reads after autoplace / spawn / make-available.
Snapshots — list probe and the create-side source-RD read (the snapshot GET/restore reads were already mitigated).
Query-size-info — the per-RG capacity preflight that runs right after the RG is ensured.

Deliberately left alone (staleness harmless or NotFound is the contract): pure list endpoints (absence renders as an empty list), delete-path idempotency probes, NotFound-is-normal upsert probes, best-effort soft-fail reads, the process-local KV store and storage-pool-definition registry, and the controller-props bag (a missing singleton folds into an empty map; no 404 can surface).

The three pre-existing helpers were deduplicated behind a generic core so all helpers share the exact same budget and error-wrapping semantics.

Testing

New L1 regressions extend the lag-store pattern: the CDP device-list path end-to-end on the wire (202 under cache miss, attach lands on the real device), the genuine-no-match 404 after the budget, the busy-device 409 short-circuit, and the node re-register patch path.
TestGroupB green 3×; full integration suite (-tags=integration, envtest 1.34.1, linstor-client 1.27.1) green.
go test ./... green; go test -race ./pkg/rest green; golangci-lint run — 0 issues.

Summary by CodeRabbit

New Features
- Added generic cache-retry mechanism to handle informer cache synchronization lag.
Bug Fixes
- Reduced spurious 404 errors during create-then-read operations across resource, snapshot, volume definition, storage pool, and node endpoints.
- Improved resilience in high-frequency operations by retrying cache lookups during synchronization windows.
Tests
- Added comprehensive tests for cache-miss scenarios covering device attachment, node re-registration, and resource reads.

The RG / RD / Snapshot cache-retry helpers carried three identical copies of the retry-on-NotFound loop. Fold them into a single generic getWithCacheRetry core so upcoming helpers for other object kinds reuse the exact same budget and error-wrapping semantics instead of growing a fourth copy. No behaviour change: same attempts, delay, error wrapping and context-cancellation handling. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

The REST server reads through the manager's informer-cached client while writes go straight to the apiserver, so any read that immediately follows a create can hit a cache that has not observed the write yet and surface a spurious 404. The CDP path (POST /v1/physical-storage/{node}) was the latest victim: the device-list match ran against a cache still trailing the discovery loop's PhysicalDevice write and 404'd a valid create-device-pool. Sweep the remaining members of the class and apply the existing cache-retry pattern (3 attempts, 200 ms apart; a real 404 still surfaces after the budget): - physical-storage CDP device match (the failing case) - node single GET and the storage-pool create node probe (register node then read/create pools in one reconcile pass) - node re-register upsert fall-through (Create says AlreadyExists, so a NotFound from the immediate patch read is cache lag) - storage-pool single GET (create then read-back) - volume-definition create probe, list probe and single GET (linstor-csi POSTs RD and VD back-to-back) - RD-create resource-group gates, including the post-write re-check that would otherwise roll back a freshly created RD when the RG was created moments earlier - per-node resource GET and per-resource volume reads (autoplace / make-available then immediate read) - snapshot list probe and snapshot-create source-RD read - query-size-info RG read (capacity preflight after RG ensure) Pure list endpoints, delete-path idempotency probes, the process-local KV store / storage-pool-definition registry and the controller-props bag (missing folds into an empty map, no 404) are deliberately left alone. Regression coverage extends the lag-store pattern to the physical-device list and the node re-register patch path. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-12T22:43:02Z

📝 Walkthrough

Walkthrough

This PR centralizes cache-retry logic by introducing a generic getWithCacheRetry[T] helper in cache_retry.go, refactoring all existing cache-retry functions to use it, and applying retry-aware reads across REST handlers to tolerate informer-cache lag during resource creation hot paths. Comprehensive test infrastructure and test cases verify cache-miss scenarios.

Changes

Informer Cache Lag Resilience Across REST Handlers

Layer / File(s)	Summary
Generic cache-retry infrastructure and refactored helpers `pkg/rest/cache_retry.go`	Introduces `getWithCacheRetry[T]` generic retry loop with fixed delay, attempt budgets, and context-cancellation handling. Refactors all existing cache-retry helpers (RG, RD, Snapshot, Node, StoragePool, VolumeDefinition, Resource) and `patchNodeSpecWithCacheRetry` to delegate to this central helper, eliminating duplicate retry logic. Updates `pickCDPDevicesWithCacheRetry` to use centralized retry semantics while preserving device-selection behavior.
Test infrastructure and cache-lag simulators `pkg/rest/cache_retry_test.go`	Adds `lagPhysicalDeviceStore` to hide a device from `ListForNode()` for configured calls, and `lagNodePatchStore` to return `ErrNotFound` from `PatchNodeSpec()` for initial patch attempts. Extends `flakyStore` with `PhysicalDevices()` and `Nodes()` methods to route calls through lag wrappers.
End-to-end cache-lag scenario tests `pkg/rest/cache_retry_test.go`	Adds four test cases: `TestPhysicalStorageCDP_SurvivesCacheMissOnDeviceList` (device hidden initially, attach succeeds after retry), `TestPickCDPDevicesWithCacheRetry_RealNoMatchStillEmpty` (non-existent device retries then returns empty), `TestPickCDPDevicesWithCacheRetry_BusyDeviceReturnsImmediately` (busy device detected in cache, returns immediately), `TestNodeReRegister_SurvivesCacheMissOnPatch` (node patch initially NotFound, second registration succeeds).
ResourceGroup and ResourceDefinition validation with cache-retry `pkg/rest/resource_definitions.go`, `pkg/rest/query_size_info.go`	Applies `getRGWithCacheRetry` to RG existence checks in `refuseRDCreateOnRGDeletedRace` (rollback-on-concurrent-deletion) and `refuseRDCreateOnUnknownRG` (parent RG gate). Also applies `getRGWithCacheRetry` to RG lookup in `handleQuerySizeInfo` for capacity-query hot path with added documentation of cache-staleness scenario.
Node read and upsert patch operations with cache-retry `pkg/rest/nodes.go`	Updates `handleNodeGet` to use `getNodeWithCacheRetry` for post-create GET hot path. Updates node-upsert conflict-recovery in `upsertNodeAndDiskless` to route `PatchNodeSpec` through `patchNodeSpecWithCacheRetry` when `Create` returns `ErrAlreadyExists`.
StoragePool, Node, and PhysicalDevice cache-retry `pkg/rest/storage_pools.go`, `pkg/rest/physical_storage.go`	Applies `getStoragePoolWithCacheRetry` to `handleNodeStoragePoolGet` for pool-create hot path. Applies `getNodeWithCacheRetry` to `handleNodeStoragePoolCreate` to retry before failing. Applies `pickCDPDevicesWithCacheRetry` to `handlePhysicalStorageCreate` for device-target selection with retry-on-lag semantics.
Snapshot list and hydration with RD cache-retry `pkg/rest/snapshots.go`	Updates `handleSnapshotList` to check parent RD existence via `getRDWithCacheRetry` for create-then-list hot path. Updates `hydrateSnapshotFromRD` to fetch source RD via `getRDWithCacheRetry` before populating snapshot fields, ensuring hydration succeeds when cache trails RD writes.
VolumeDefinition list, get, and create with RD cache-retry `pkg/rest/volume_definitions.go`	Applies `getRDWithCacheRetry` to parent RD existence check in `handleVDList`. Applies `getVDWithCacheRetry` to specific VD fetch in `handleVDGet`. Adds early RD existence check via `getRDWithCacheRetry` in `handleVDCreate` before request-body decoding.
Resource and Autoplace read cache-retry `pkg/rest/autoplace.go`	Updates `handleResourceGet` to use `getRDWithCacheRetry` for RD existence check and `getResourceWithCacheRetry` for per-node Resource fetch, absorbing cache lag on CreateVolume/Attach hot path. Response redaction, annotation-stripping, and volume-wrapping logic unchanged.
Per-resource Volume list and get cache-retry `pkg/rest/volumes_per_resource.go`	Updates both `handleVolumesPerResourceList` and `handleVolumesPerResourceGet` to fetch target Resource via `getResourceWithCacheRetry`, tolerating informer-cache lag during replica-create hot path. List and single-get response behavior (empty slices, typed 404s) unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

cozystack/blockstor#150: Both PRs address informer-cache lag on snapshot read-after-write by changing snapshot cache-retry helpers; this PR centralizes all retry logic via getWithCacheRetry while the related PR adds snapshot-specific helpers.
cozystack/blockstor#149: Both PRs update the node upsert conflict-recovery path in upsertNodeAndDiskless to use patch-based updates instead of stale read/update flows when Create returns ErrAlreadyExists.

Poem

🐰 A rabbit hops through caches deep,
Where retries wake and data creep,
With getWithCacheRetry's gentle hand,
Lag is tamed across the land—
No 404s on freshly born,
Just happy reads from dusk to morn! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: applying cache-lag fixes across read-after-write REST paths, which is the primary focus of all file changes in this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 93.75% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/rest-cache-lag-read-after-write

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request refactors the cache-retry mechanism by introducing a generic getWithCacheRetry helper and applying it across various REST endpoints (such as nodes, storage pools, volume definitions, resources, and physical storage) to mitigate informer-cache lag and prevent spurious 404 errors on hot paths. The feedback identifies a potential flakiness issue in TestPickCDPDevicesWithCacheRetry_BusyDeviceReturnsImmediately due to a timing-based assertion and suggests a more robust approach by counting the number of store calls instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T22:45:30Z

+	if elapsed := time.Since(start); elapsed >= cacheRetryDelay {
+		t.Fatalf("busy short-circuit took %s, expected < one cacheRetryDelay (%s)", elapsed, cacheRetryDelay)
+	}


The timing-based assertion elapsed >= cacheRetryDelay is prone to flakiness in heavily loaded CI environments where CPU scheduling delays can easily exceed 200ms.\n\nTo make this test 100% robust, consider verifying that the operation returned immediately by counting the number of calls to ListForNode instead.\n\nYou can achieve this by:\n1. Updating lagPhysicalDeviceStore.ListForNode to always increment f.calls (even when f.calls >= f.missBudget).\n2. Wrapping st in flakyStore with missBudget: 0 in this test.\n3. Asserting that calls == 1 after the call.\n\nExample:\ngo\n pds := &lagPhysicalDeviceStore{\n PhysicalDeviceStore: st.PhysicalDevices(),\n missBudget: 0,\n }\n flaky := &flakyStore{\n Store: st,\n pds: pds,\n }\n\n targets, busy, err := pickCDPDevicesWithCacheRetry(t.Context(), flaky, \"n1\", []string{\"/dev/disk/by-id/scsi-busy\"})\n // ...\n pds.mu.Lock()\n calls := pds.calls\n pds.mu.Unlock()\n\n if calls != 1 {\n t.Fatalf(\"expected exactly 1 call, got %d\", calls)\n }\n

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/rest/cache_retry_test.go`:
- Around line 693-695: The wall-clock assertion comparing elapsed to
cacheRetryDelay is scheduler-sensitive and flaky; replace it by asserting the
retry path was not executed via a deterministic call-count. Remove the
time.Since(start)/elapsed check and instead track the number of invocations of
the retryable/mock callback (e.g., add an atomic/int counter that the function
under test increments) and assert that counter is 0 (or unchanged) after the
operation; reference the existing symbols elapsed/start/cacheRetryDelay to
locate the timing check and the retry callback used in this test and replace the
timing assertion with the call-count assertion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 78c91387-3f8d-4fea-9296-b0ad0612b675

📥 Commits

Reviewing files that changed from the base of the PR and between e4b747c and f36adb5.

📒 Files selected for processing (11)

pkg/rest/autoplace.go
pkg/rest/cache_retry.go
pkg/rest/cache_retry_test.go
pkg/rest/nodes.go
pkg/rest/physical_storage.go
pkg/rest/query_size_info.go
pkg/rest/resource_definitions.go
pkg/rest/snapshots.go
pkg/rest/storage_pools.go
pkg/rest/volume_definitions.go
pkg/rest/volumes_per_resource.go

coderabbitai · 2026-06-12T22:51:10Z

+	if elapsed := time.Since(start); elapsed >= cacheRetryDelay {
+		t.Fatalf("busy short-circuit took %s, expected < one cacheRetryDelay (%s)", elapsed, cacheRetryDelay)
+	}


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Avoid scheduler-sensitive wall-clock assertion in busy-device short-circuit test.

Line 693’s elapsed < cacheRetryDelay check can fail on loaded CI runners even when no retry occurs, making this test flaky. Assert behavior via call count instead of elapsed wall time.

Suggested deterministic update

@@ - start := time.Now() - - targets, busy, err := pickCDPDevicesWithCacheRetry(t.Context(), st, "n1", []string{"/dev/disk/by-id/scsi-busy"}) + flaky := &flakyStore{ + Store: st, + pds: &lagPhysicalDeviceStore{ + PhysicalDeviceStore: st.PhysicalDevices(), + hideName: "__do-not-hide__", + missBudget: cacheRetryAttempts, + }, + } + + targets, busy, err := pickCDPDevicesWithCacheRetry(t.Context(), flaky, "n1", []string{"/dev/disk/by-id/scsi-busy"}) if err != nil { t.Fatalf("pickCDPDevicesWithCacheRetry: %v", err) } @@ - if elapsed := time.Since(start); elapsed >= cacheRetryDelay { - t.Fatalf("busy short-circuit took %s, expected < one cacheRetryDelay (%s)", elapsed, cacheRetryDelay) - } + flaky.pds.mu.Lock() + gotCalls := flaky.pds.calls + flaky.pds.mu.Unlock() + if gotCalls != 1 { + t.Fatalf("expected exactly 1 ListForNode call for busy short-circuit, got %d", gotCalls) + }

As per coding guidelines, "**/*_test.go: L1 unit tests must run on every commit using go test ./...", so this assertion should avoid scheduler-dependent timing.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/rest/cache_retry_test.go` around lines 693 - 695, The wall-clock assertion comparing elapsed to cacheRetryDelay is scheduler-sensitive and flaky; replace it by asserting the retry path was not executed via a deterministic call-count. Remove the time.Since(start)/elapsed check and instead track the number of invocations of the retryable/mock callback (e.g., add an atomic/int counter that the function under test increments) and assert that counter is 0 (or unchanged) after the operation; reference the existing symbols elapsed/start/cacheRetryDelay to locate the timing check and the retry callback used in this test and replace the timing assertion with the call-count assertion.

Source: Coding guidelines

… (release-gate sweep) (#155) * fix(stand): inject csi-sanity mTLS certs as PEM content, not file paths The csi-sanity Job hung for the full harness wait (rc=1, 3600s) on the release-gate sweep. Root cause: golinstor v0.58.0 (vendored by piraeus-csi v1.10.1) reads LS_ROOT_CA / LS_USER_CERTIFICATE / LS_USER_KEY as PEM *content* and feeds the value straight into x509.CertPool.AppendCertsFromPEM / tls.X509KeyPair. The path-based LS_*_FILE variants only exist from golinstor v0.61.0. The manifest passed file paths (/etc/linstor/client/*.crt), so golinstor tried to PEM-decode the literal path string and failed at startup with "failed to get a valid certificate from 'LS_ROOT_CA'". linstor-csi never opened /csi/csi.sock, the csi-sanity container looped on the missing socket and exited, and the Job went Failed (not Complete) — wedging the sweep's `kubectl wait --for=condition=complete` until the 1h timeout. Inject the cert-manager secret keys directly as env values via secretKeyRef so golinstor receives PEM bytes, and drop the now-unused client-tls volume mount. With this fix the Job runs the CSI contract end to end (linstor-csi connects over mTLS to controller:3371). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * docs(cli-parity): whitelist intentional CLI deltas surfaced by refresh The release-gate cli-parity-refresh flagged 12 new non-PARITY rows. Five of them are genuine, intentional blockstor behaviours that reproduce on a freshly-seeded parity fixture (verified live against the dev-stand LINSTOR 1.33.2 oracle), but were not matched by the existing whitelist: - 07 `rd l --resource-definitions <rd>`: the flag-qualified sibling of the `stampRDLayerDataFromStack` Layers-column delta already accepted as row 81; the bare `rd l` whitelist string did not literally cover it. - 33 `s d <rd> <nonexistent-snap>` and 42 `r d <rd> <nonexistent-node>`: blockstor's idempotent-delete envelope (SUCCESS "already absent" vs the upstream WARNING "not found"); required for idempotent CSI/operator retries. - 40 `n c <node> <ip> --node-type Satellite`: satellite-registration success-envelope shape (no inline node UUID; "no active connection" reported as SUCCESS-class rather than WARNING). Node is registered identically; envelope shape only. - 16 `ps l`: BS `/v1/physical-storage` omits the per-device `size` key, crashing python-linstor's renderer. Hardware-discovery convenience surface unused by linstor-csi; whitelisted until the DTO field lands. None of the twelve are regressions from the recent merges (#131-#151). The remaining seven flagged rows are ambient stand-state drift (leftover `test`/`testrg` resource-groups on BS, a stale `orc-tbtest` resource plus old error-reports on the oracle) and are deliberately NOT whitelisted — they return to PARITY once the stand is cleaned, and masking them would blind the gate to real regressions on the core list commands. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> --------- Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

Andrei Kvapil (kvaps) and others added 2 commits June 13, 2026 01:29

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

Andrei Kvapil (kvaps) merged commit 2017184 into main Jun 12, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rest): absorb informer-cache lag across read-after-write REST paths#151

fix(rest): absorb informer-cache lag across read-after-write REST paths#151
Andrei Kvapil (kvaps) merged 2 commits into
mainfrom
fix/rest-cache-lag-read-after-write

Andrei Kvapil (kvaps) commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Andrei Kvapil (kvaps) commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

What changed

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading