From 9be9de8dcf4eb01bd3b8c157082bbc3d57da290b Mon Sep 17 00:00:00 2001
From: bussyjd
Date: Sat, 23 May 2026 22:32:13 +0400
Subject: [PATCH] fix(x402): verifier replicas: 2 → 1 to keep metric GC correct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 0fbb99a (fix(x402): GC verifier metric series for deleted offers)
added pruneSeriesNotIn to Verifier.load. Each verifier pod runs its own
informer + its own metric registry, so the GC is per-pod.

With replicas: 2 + ServiceMonitor (round-robin scrape over Endpoints),
Prometheus sees:

  * one pod's registry on scrape N (pruned correctly),
  * the other pod's on scrape N+1 (may still hold a deleted offer's
    series until that pod's informer also sees the delete).

Result: deleted offers' last_payment_success_seconds gauge and
charged_requests_total counters reappear every other scrape, polluting
dashboards and creating spurious alert state.

Cheapest correct fix is replicas: 1. The verifier is on the request path
but single-node k3d gains no HA from 2 replicas. Drop the
PodDisruptionBudget too; minAvailable:1 at replicas:1 just blocks
voluntary drains on the only pod, useless on k3d.

If/when the stack ever runs multi-node and HA replicas are wanted, the
right pattern is ServiceMonitor → PodMonitor with a `pod` label and
recording rules using `sum without(pod)`. That's a future change; right
now correctness > theoretical HA.
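
The flapping described above can be reproduced with a small stand-alone
model. This is only a sketch: `registry` and `simulate` are hypothetical
names, `pruneSeriesNotIn` here is a simplified stand-in for the helper in
metrics.go rather than its real signature, and the model assumes that
successive scrapes alternate strictly between pods, as stated above.

```go
package main

import "fmt"

// registry models one verifier pod's private metric registry as the set
// of offer names that currently have exported series. (Hypothetical type.)
type registry map[string]bool

// pruneSeriesNotIn drops every series whose offer is not in live.
// Simplified stand-in for the real helper; the signature is assumed.
func (r registry) pruneSeriesNotIn(live map[string]bool) {
	for offer := range r {
		if !live[offer] {
			delete(r, offer)
		}
	}
}

// simulate performs n scrapes that rotate over pods and reports, for each
// scrape, whether the given offer's series was visible.
func simulate(pods []registry, n int, offer string) []bool {
	seen := make([]bool, n)
	for i := 0; i < n; i++ {
		seen[i] = pods[i%len(pods)][offer]
	}
	return seen
}

func main() {
	podA := registry{"offer-a": true, "offer-b": true}
	podB := registry{"offer-a": true, "offer-b": true}

	// offer-b is deleted, but only pod A's informer has seen the delete.
	podA.pruneSeriesNotIn(map[string]bool{"offer-a": true})

	// Two replicas: the deleted series reappears every other scrape.
	fmt.Println(simulate([]registry{podA, podB}, 4, "offer-b")) // prints [false true false true]

	// One replica: the series is gone for good once pruned.
	fmt.Println(simulate([]registry{podA}, 4, "offer-b")) // prints [false false false false]
}
```

With two registries the deleted offer's series is absent on even scrapes
and present on odd ones until the second pod also prunes it; with a
single registry it never comes back.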
---
 .../infrastructure/base/templates/x402.yaml    | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/internal/embed/infrastructure/base/templates/x402.yaml b/internal/embed/infrastructure/base/templates/x402.yaml
index 3848238..25b97ed 100644
--- a/internal/embed/infrastructure/base/templates/x402.yaml
+++ b/internal/embed/infrastructure/base/templates/x402.yaml
@@ -200,7 +200,11 @@ metadata:
   labels:
     app: x402-verifier
 spec:
-  replicas: 2
+  # Single replica — verifier holds per-pod metric registries and per-pod
+  # informer caches; multiple replicas produce metric series drift across
+  # ServiceMonitor scrape rotations and the pruneSeriesNotIn GC (metrics.go)
+  # becomes inconsistent. Single-node k3d gains no HA from 2 replicas.
+  replicas: 1
   selector:
     matchLabels:
       app: x402-verifier
@@ -321,18 +325,6 @@ spec:
       targetPort: http
       protocol: TCP
 
----
-apiVersion: policy/v1
-kind: PodDisruptionBudget
-metadata:
-  name: x402-verifier
-  namespace: x402
-spec:
-  minAvailable: 1
-  selector:
-    matchLabels:
-      app: x402-verifier
-
 ---
 # ServiceMonitor for x402-verifier — scrapes the stable Service endpoint
 # rather than per-pod IPs (which is what a PodMonitor would do). Lives