fix(x402): gate verifier /readyz on informer cache sync #519
Closed
bussyjd wants to merge 1 commit into
Closes the root cause of CLAUDE.md pitfall #14 ("first-request flake on freshly-deployed verifier"). Previously /readyz returned 200 the moment config.Load() became non-nil, but routes from the ServiceOffer informer load later. Between those two events the pod is Ready from kubelet's view, receives Service traffic, and matchPaidRoute returns "no rule -> 200" for paid routes. The release-smoke flows hide this behind 12x5s retry loops; the actual fix is to not be Ready until routes are loaded.
- Adds routesLoaded atomic.Bool to Verifier.
- HandleReadyz returns 503 until BOTH config and routes are loaded, with a body that distinguishes the two cases for kubectl describe debuggability.
- WatchServiceOffers takes an optional onFirstApply callback, invoked after the post-WaitForCacheSync refresh succeeds.
- main.go wires v.MarkRoutesLoaded as the callback for the kube source, or invokes it directly after NewVerifier for the file source (the file source has no informer; routes are loaded synchronously).
Pairs with PR #515 (replicas: 1). At a single replica the rollout window for this race shrinks from "some scrapes" to "first ~5-10s", but it's still a bug; this PR closes it.
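A minimal sketch of the readiness gate described above. It assumes config is held in an atomic.Pointer[Config] (Config is a hypothetical stand-in for whatever config.Load() returns) and that the 503 bodies read as shown; the verifier's real field names and messages may differ. atomic.Bool and atomic.Pointer need Go 1.19 or later.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// Config is a hypothetical stand-in for the verifier's static config.
type Config struct{ Name string }

// Verifier holds the two independent readiness signals.
type Verifier struct {
	config       atomic.Pointer[Config] // non-nil once static config is loaded
	routesLoaded atomic.Bool            // true once ServiceOffer routes are applied
}

// MarkRoutesLoaded flips the routes signal; calling it twice is harmless.
func (v *Verifier) MarkRoutesLoaded() { v.routesLoaded.Store(true) }

// HandleReadyz answers 503 until both signals are set, naming the missing one.
func (v *Verifier) HandleReadyz(w http.ResponseWriter, _ *http.Request) {
	switch {
	case v.config.Load() == nil:
		http.Error(w, "not ready: config not loaded", http.StatusServiceUnavailable)
	case !v.routesLoaded.Load():
		http.Error(w, "not ready: routes not loaded", http.StatusServiceUnavailable)
	default:
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	}
}

// probe calls the handler in-process, the way kubelet's HTTP probe would over the network.
func probe(v *Verifier) (int, string) {
	rec := httptest.NewRecorder()
	v.HandleReadyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	v := &Verifier{}
	fmt.Println(probe(v)) // 503 not ready: config not loaded
	v.config.Store(&Config{Name: "static"})
	fmt.Println(probe(v)) // 503 not ready: routes not loaded
	v.MarkRoutesLoaded()
	fmt.Println(probe(v)) // 200 ok
}
```

The middle state is the one the old /readyz reported as 200; keeping the two flags separate is what lets the body say which one is still missing.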
This was referenced May 23, 2026
Comment from the PR author (Collaborator): Superseded by bundle PR #536; closing in favor of the consolidated merge target. Original branch and history preserved.
Why
Closes the root cause of CLAUDE.md pitfall #14. Before this PR, the verifier's /readyz returned 200 the moment static config loaded, but routes from the ServiceOffer informer load later. During that gap, kubelet adds the pod to the Service Endpoints, Traefik forwards paid-route requests to it, and matchPaidRoute returns "no rule -> 200" (a free pass on paid traffic).
Before
/readyz returns 200 once config.Load() is non-nil, whether or not routes have loaded.
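The fail-open gap can be illustrated with a hypothetical, simplified route table. routeTable and verdict are illustrative stand-ins, not the verifier's actual matchPaidRoute, and returning 402 for an unpaid request on a matched rule is an assumption based on the x402 naming. An empty table is what the pod holds before the informer syncs, so a paid path falls through to "no rule -> 200".

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// routeTable maps a paid path prefix to a price label. Hypothetical stand-in
// for the routes built from ServiceOffer objects.
type routeTable map[string]string

// verdict returns the status a simplified verifier would answer with.
// A path that matches no rule is let through: "no rule -> 200".
func (rt routeTable) verdict(path string, hasPayment bool) int {
	for prefix := range rt {
		if strings.HasPrefix(path, prefix) {
			if hasPayment {
				return http.StatusOK
			}
			return http.StatusPaymentRequired // 402 (assumed behavior)
		}
	}
	return http.StatusOK // no matching rule: fail open
}

func main() {
	empty := routeTable{}                       // pod Ready, informer not yet synced
	synced := routeTable{"/paid/": "0.01 USDC"} // after routes load

	fmt.Println(empty.verdict("/paid/model", false))  // 200: free pass on paid traffic
	fmt.Println(synced.verdict("/paid/model", false)) // 402
}
```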
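After
/readyz returns 503, with a body naming the missing piece, until both config and routes have loaded.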
What changed
- Verifier.routesLoaded atomic.Bool + MarkRoutesLoaded() method
- HandleReadyz returns 503 with a cause-specific body until both config and routes are loaded
- WatchServiceOffers gains an optional onFirstApply callback
- main.go wires it for the kube source; calls it directly for the file source
Test plan
- go build ./... clean
- go test ./internal/x402/... ./cmd/x402-verifier/... green
- kubectl describe pod should show readiness probe failures briefly (~5s on a warm cluster), then succeed
Stacks on
PR #513 + PR #515. Rebase onto main after both merge.
Pairs with
PR #515 (replicas: 1) shrinks the race window but doesn't close it. This PR closes it.