feat(controller): wire client-go leader-election so HA scaling is safe #518

Closed
bussyjd wants to merge 1 commit into main from feat/controller-leader-election

@bussyjd bussyjd commented May 23, 2026

Why

Today the serviceoffer-controller is replicas: 1 with a "Do not scale" comment. RBAC for coordination.k8s.io/leases is already granted but unused. An accidental kubectl scale --replicas=2 produces split-brain finalizers and double on-chain ERC-8004 registration (real gas spend + duplicate registry entries).

This wires leader-election so multi-replica is safe-by-correctness, not safe-by-comment.

Before

   Operator: kubectl scale deploy/serviceoffer-controller --replicas=2
                            │
                            ▼
       ┌───────────────────┴───────────────────┐
       ▼                                       ▼
   controller-pod-A                       controller-pod-B
   reconciles offer X                     reconciles offer X
       │                                       │
       ▼                                       ▼
   creates HTTPRoute, ReferenceGrant,     same — race on Update
   RegistrationRequest                    finalizer set by both, removed
   submits ERC-8004 tx (gas spent)        submits ERC-8004 tx (gas spent)
       │                                       │
       └─────────────┬─────────────────────────┘
                     ▼
       2 on-chain registrations for same offer
       2 stale HTTPRoute generations
       Finalizer thrash

After

   Operator: kubectl scale deploy/serviceoffer-controller --replicas=2
                            │
                            ▼
       ┌───────────────────┴───────────────────┐
       ▼                                       ▼
   controller-pod-A                       controller-pod-B
   acquires Lease "serviceoffer-controller" in x402 ns
       │                                       │
       ▼                                       ▼
   OnStartedLeading() → runs reconciler     OnNewLeader(A) → standby
                                               (polls the Lease every RetryPeriod)
   ...                                          ...
   pod-A dies                              acquires Lease within ~30s
                                           OnStartedLeading() → runs

What changed

  • cmd/serviceoffer-controller/main.go — wraps controller.Run in leaderelection.RunOrDie. POD_NAME/POD_NAMESPACE from downward API. --leader-elect=false flag for local dev.
  • x402.yaml — adds downward-API POD_NAME env to controller Deployment (POD_NAMESPACE was already wired); updates "Do not scale" comment to reflect that scaling is now safe.

Lease parameters

  • LeaseDuration 30s, RenewDeadline 20s, RetryPeriod 5s — fast failover on k3d single-node. Tunable.
  • ReleaseOnCancel: true — graceful shutdown releases the lease immediately, no wait for expiry.

Test plan

  • go build ./... clean
  • go test ./internal/serviceoffercontroller/... ./cmd/serviceoffer-controller/... green
  • go test ./internal/embed/... green (embedded manifest still parses)
  • Manual: kubectl scale deploy/serviceoffer-controller -n x402 --replicas=2 — confirm pod-B logs "new leader is pod-A"
  • Manual: kubectl delete pod -n x402 -l app=serviceoffer-controller --field-selector=metadata.name=pod-A; pod-B should take leadership within ~30s (sooner if pod-A shuts down gracefully and releases the lease)

Pairs with

PR #515 (verifier replicas: 1). The verifier needs per-pod metric correctness so replicas: 1 stays; the controller's correctness requirement was different (write-side races), now solved by leader-election.

Today the serviceoffer-controller is pinned at replicas: 1 with a
"Do not scale" comment in x402.yaml. The RBAC for leases is already
granted (x402.yaml:176-178) — pre-positioned and unused. An accidental
`kubectl scale --replicas=2` or HPA misconfiguration produces
split-brain finalizers and double on-chain ERC-8004 registration
(real gas spend + duplicate registry entries).

This wires client-go tools/leaderelection so multi-replica deployment
is safe-by-correctness, not safe-by-comment.

  - cmd/serviceoffer-controller/main.go:
      - Read POD_NAME / POD_NAMESPACE from downward API env.
      - Acquire Lease "serviceoffer-controller" in POD_NAMESPACE
        before running the reconcile loop.
      - On lost leadership, os.Exit(1) — kubelet restarts the pod
        which re-elects from scratch.
      - --leader-elect flag (default true) so local dev can bypass.

  - x402.yaml:
      - Add downward-API POD_NAME env to the controller Deployment
        (POD_NAMESPACE was already wired).
      - Update the "Do not scale" comment to "Single replica by
        default; bumping to 2+ is now safe — leader election prevents
        split-brain on the reconcile loop."

  - Lease parameters chosen for fast failover on k3d (lease=30s,
    renew=20s, retry=5s). Tunable via flag if a multi-zone deployment
    ever needs longer.
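
The flag and env handling listed above could be sketched as follows, using only the standard library. The `options` struct and `parseOptions` function are hypothetical, and failing fast when the env vars are missing is an assumption of this sketch rather than something the commit states.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options (hypothetical) holds the election-related settings.
type options struct {
	leaderElect bool
	identity    string // from POD_NAME
	namespace   string // from POD_NAMESPACE
}

// parseOptions reads --leader-elect (default true) from args and the pod
// identity from the environment. getenv is injected so it can be faked.
func parseOptions(args []string, getenv func(string) string) (options, error) {
	fs := flag.NewFlagSet("serviceoffer-controller", flag.ContinueOnError)
	leaderElect := fs.Bool("leader-elect", true,
		"acquire the Lease before reconciling; set to false for local dev")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	o := options{
		leaderElect: *leaderElect,
		identity:    getenv("POD_NAME"),
		namespace:   getenv("POD_NAMESPACE"),
	}
	// Assumption: without an identity and namespace there is nothing to
	// elect with, so refuse to start instead of guessing.
	if o.leaderElect && (o.identity == "" || o.namespace == "") {
		return options{}, errors.New("POD_NAME and POD_NAMESPACE must be set when --leader-elect is true")
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("leaderElect=%t identity=%q namespace=%q\n", o.leaderElect, o.identity, o.namespace)
}
```

Running the binary locally with `--leader-elect=false` skips the env requirement, which matches the local-dev bypass described above.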

Uses client-go directly rather than controller-runtime Manager to
minimize churn: the controller is currently raw client-go workqueues,
not controller-runtime. Migration to controller-runtime is a separate,
much larger workstream and is not necessary just for leader election.

bussyjd commented May 24, 2026

Superseded by bundle PR #536 — closing in favor of the consolidated merge target. Original branch and history preserved.
