From 9be9de8dcf4eb01bd3b8c157082bbc3d57da290b Mon Sep 17 00:00:00 2001
From: bussyjd <bussyjd@users.noreply.github.com>
Date: Sat, 23 May 2026 22:32:13 +0400
Subject: [PATCH] =?UTF-8?q?fix(x402):=20verifier=20replicas:=202=20?=
 =?UTF-8?q?=E2=86=92=201=20to=20keep=20metric=20GC=20correct?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 0fbb99a (fix(x402): GC verifier metric series for deleted offers)
added pruneSeriesNotIn to Verifier.load. Each verifier pod runs its
own informer and its own metric registry, so the GC is per-pod. With
replicas: 2 and a ServiceMonitor (round-robin scrape over Endpoints),
Prometheus sees:

  * one pod's registry on scrape N (pruned correctly),
  * the other pod's on scrape N+1 (may still hold a deleted offer's
    series until that pod's informer also sees the delete).

Result: deleted offers' last_payment_success_seconds gauge and
charged_requests_total counters reappear every other scrape, polluting
dashboards and creating spurious alert state.

The cheapest correct fix is replicas: 1. The verifier is on the request
path, but a single-node k3d cluster gains no HA from two replicas. Drop
the PodDisruptionBudget too: minAvailable: 1 with replicas: 1 only
blocks voluntary drains of the sole pod, which is useless on k3d.

If the stack ever runs multi-node and HA replicas are wanted, the
right pattern is to switch from ServiceMonitor to PodMonitor with a
`pod` label and to aggregate in recording rules using
`sum without(pod)`. That is a future change; for now, correctness
matters more than theoretical HA.
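
For reference, the multi-node pattern could look roughly like the two
manifests below. This is a hypothetical sketch and not part of this
patch. It assumes the Prometheus Operator CRDs are installed, that the
container port is named `http` (as the Service's targetPort suggests),
that metrics are served at /metrics, and that the counter is exposed
literally as `charged_requests_total`. The object names, the recording
rule name, and the extra `instance` label in the aggregation are also
assumptions.

```yaml
# Hypothetical PodMonitor: one scrape target per verifier pod, so each
# series carries its own `pod` label instead of being mixed together.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: x402-verifier
  namespace: x402
spec:
  selector:
    matchLabels:
      app: x402-verifier
  podMetricsEndpoints:
    - port: http        # assumed container port name
      path: /metrics    # assumed metrics path
```

```yaml
# Hypothetical PrometheusRule: collapse the per-pod series into one
# series per offer for dashboards and alerts.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: x402-verifier-aggregates
  namespace: x402
spec:
  groups:
    - name: x402-verifier.aggregate
      rules:
        - record: x402:charged_requests:rate5m
          # rate() per pod first, then sum across pods; `instance` is
          # dropped as well because it also differs per pod.
          expr: sum without (pod, instance) (rate(charged_requests_total[5m]))
```

For the last_payment_success_seconds gauge, `max without (pod, instance)`
is probably the more natural aggregate than `sum`, assuming the gauge
holds a timestamp. Depending on how the Prometheus instance selects
rules, the PrometheusRule may also need matching labels.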
---
 .../infrastructure/base/templates/x402.yaml    | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/internal/embed/infrastructure/base/templates/x402.yaml b/internal/embed/infrastructure/base/templates/x402.yaml
index 3848238..25b97ed 100644
--- a/internal/embed/infrastructure/base/templates/x402.yaml
+++ b/internal/embed/infrastructure/base/templates/x402.yaml
@@ -200,7 +200,11 @@ metadata:
   labels:
     app: x402-verifier
 spec:
-  replicas: 2
+  # Single replica — verifier holds per-pod metric registries and per-pod
+  # informer caches; multiple replicas produce metric series drift across
+  # ServiceMonitor scrape rotations and the pruneSeriesNotIn GC (metrics.go)
+  # becomes inconsistent. Single-node k3d gains no HA from 2 replicas.
+  replicas: 1
   selector:
     matchLabels:
       app: x402-verifier
@@ -321,18 +325,6 @@ spec:
       targetPort: http
       protocol: TCP
 
----
-apiVersion: policy/v1
-kind: PodDisruptionBudget
-metadata:
-  name: x402-verifier
-  namespace: x402
-spec:
-  minAvailable: 1
-  selector:
-    matchLabels:
-      app: x402-verifier
-
 ---
 # ServiceMonitor for x402-verifier — scrapes the stable Service endpoint
 # rather than per-pod IPs (which is what a PodMonitor would do). Lives