From 522aeaeabe83b306e589d89bca8395a2200524ef Mon Sep 17 00:00:00 2001
From: bussyjd
Date: Sat, 23 May 2026 22:32:56 +0400
Subject: [PATCH] fix(x402-metrics): align Prometheus retention with recording-rule windows; rename mis-named lifetime rule
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related metric-correctness fixes layered on top of the recording
rules added in 27e1ac5:

1. Retention 6h → 8d.

   The recording rules added in 27e1ac5 use [24h] and [7d] windows.
   `increase(x[24h])` against a 6h-retention TSDB does not fail. It
   silently returns only the increase over the ~6h of retained samples,
   because Prometheus extrapolates at most half an average sample
   interval past the oldest sample rather than across the missing 18h.
   The frontend displays that result as "24h revenue", which at a
   steady request rate understates it by roughly 4x.

   8d (= 7d + 1d safety margin) keeps the [7d] rule valid across a
   brief Prometheus outage.

2. `x402:revenue:lifetime_by_offer` → `x402:revenue:total_by_offer_current`.

   The original expression was `sum(counter)` (not
   `sum(increase[lifetime])`), so it:

   * is NOT lifetime: it is the sum, across the currently alive verifier
     replicas, of each replica's count since its last restart,
   * drops ~50% on every replica rollout,
   * compounds with the per-pod-registry issue addressed by the
     replicas:1 fix.

   Renaming makes the semantics explicit. True lifetime queries should
   use `sum(increase(...[Nd]))`, which handles counter resets, against a
   long-retention store.

The retention bump increases the Prometheus disk footprint roughly in
proportion to 8d/6h = 32x. The local-only kube-prometheus-stack PVC
sizing in monitoring.yaml.gotmpl needs review on the next
`obol stack up` if disk pressure shows up: no PVC size cap is currently
set, so it inherits the storageClass default.
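A minimal Python sketch of the under-reporting in point 1. It is a simplified re-implementation of the extrapolation rule behind Prometheus's `increase()` (modelled on `extrapolatedRate` in promql/functions.go; native histograms, staleness and exact range-boundary inclusivity are left out), not code from this repo. The 60s scrape interval, the one-request-per-scrape workload and the `retained_samples` helper are hypothetical.

```python
# Simplified model of how Prometheus evaluates increase() over a range
# (assumption: modelled on extrapolatedRate in promql/functions.go; native
# histograms, staleness and exact boundary inclusivity are ignored).
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix seconds, counter value)

HOUR = 3600.0
SCRAPE = 60.0            # hypothetical scrape interval
NOW = 8 * 24 * HOUR      # evaluation time; the counter started at t=0


def prom_increase(samples: List[Sample], range_start: float, range_end: float) -> float:
    """Approximate increase(counter[range]) evaluated at range_end."""
    window = [(t, v) for t, v in samples if range_start <= t <= range_end]
    if len(window) < 2:
        return float("nan")  # Prometheus returns no value for this series

    first_t, first_v = window[0]
    last_t, last_v = window[-1]
    result = last_v - first_v
    # A drop in value is a counter reset: add back the pre-reset value.
    prev = 0.0
    for _, v in window:
        if v < prev:
            result += prev
        prev = v

    sampled = last_t - first_t
    avg_step = sampled / (len(window) - 1)
    threshold = avg_step * 1.1
    to_start = first_t - range_start
    to_end = range_end - last_t

    # Extrapolate to a range boundary only if the data reaches close to it;
    # otherwise extend by half an average sample interval.
    if to_start >= threshold:
        to_start = avg_step / 2
    if result > 0 and first_v >= 0:
        # Never extrapolate a counter back past zero.
        to_start = min(to_start, sampled * (first_v / result))
    if to_end >= threshold:
        to_end = avg_step / 2

    return result * (sampled + to_start + to_end) / sampled


def retained_samples(retention: float) -> List[Sample]:
    """Hypothetical verifier counter: one charged request per scrape,
    keeping only the samples the TSDB still retains at NOW."""
    n = int(retention // SCRAPE)
    return [(NOW - k * SCRAPE, (NOW - k * SCRAPE) / SCRAPE) for k in range(n, -1, -1)]


if __name__ == "__main__":
    for label, retention in (("6h", 6 * HOUR), ("8d", 8 * 24 * HOUR)):
        got = prom_increase(retained_samples(retention), NOW - 24 * HOUR, NOW)
        print(f"retention={label}: increase[24h] = {got}")
```

At this steady rate, the 6h-retention run reports 360.5 of the 1,440 requests charged in the last 24h (the ~4x gap), while the 8d run reports all 1,440.0.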
---
 .../base/templates/x402-prometheus-rules.yaml          | 9 +++++----
 .../embed/infrastructure/values/monitoring.yaml.gotmpl | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml b/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml
index 73b10f94..d0dbe754 100644
--- a/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml
+++ b/internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml
@@ -47,10 +47,11 @@ spec:
               increase(obol_x402_verifier_charged_requests_total[7d])
             )
 
-        # Lifetime charged-request count per offer (sum across replicas
-        # + chains). Used in the My Listings "today · X earned" header
-        # text and the Browse catalog usage badge.
-        - record: x402:revenue:lifetime_by_offer
+        # Sum of the currently running verifier replicas' counters; resets
+        # on rollout. For a true lifetime total, query a long-retention
+        # store with `sum(increase(...[Nd]))`. Used in the My Listings
+        # "today · X earned" header text and the Browse catalog usage badge.
+        - record: x402:revenue:total_by_offer_current
           expr: |
             sum by (offer_namespace, offer_name) (
               obol_x402_verifier_charged_requests_total
diff --git a/internal/embed/infrastructure/values/monitoring.yaml.gotmpl b/internal/embed/infrastructure/values/monitoring.yaml.gotmpl
index 18e6ba01..e440bd0d 100644
--- a/internal/embed/infrastructure/values/monitoring.yaml.gotmpl
+++ b/internal/embed/infrastructure/values/monitoring.yaml.gotmpl
@@ -11,7 +11,7 @@ prometheus:
       matchLabels:
         release: monitoring
     podMonitorNamespaceSelector: {}
-    retention: 6h
+    retention: 8d
     resources:
       requests:
         cpu: 100m
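A toy model of why `x402:revenue:lifetime_by_offer` was renamed. The pod names, the one-request-per-minute rate and the rollout schedule are hypothetical, and `reset_aware_increase` is a simplified helper that, unlike Prometheus's `increase()`, does not extrapolate to the window edges. `sum_current` mirrors what `x402:revenue:total_by_offer_current` records (`sum(counter)` over pods that are still scraped), while `sum_increase` sums per-series increases the way `sum(increase(...[Nd]))` would.

```python
# Toy model of a verifier rollout (hypothetical pods, rate and schedule).
from typing import Dict, List, Tuple

Series = List[Tuple[int, float]]  # (minute, counter value) for one pod


def pod_counter(start: int, stop: int, per_minute: float = 1.0) -> Series:
    """Counter exposed by one pod: starts at 0 when the pod starts."""
    return [(t, (t - start) * per_minute) for t in range(start, stop + 1)]


def sum_current(series: Dict[str, Series], now: int) -> float:
    """Like sum(counter): only pods that still report a sample at `now`."""
    return sum(v for s in series.values() for t, v in s if t == now)


def reset_aware_increase(s: Series, start: int, end: int) -> float:
    """Increase within [start, end]; a drop in value counts as a reset.
    Unlike Prometheus's increase(), no extrapolation to the window edges."""
    vals = [v for t, v in s if start <= t <= end]
    total = 0.0
    for prev, cur in zip(vals, vals[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def sum_increase(series: Dict[str, Series], start: int, end: int) -> float:
    """Like sum(increase(counter[range])): every pod series, alive or not."""
    return sum(reset_aware_increase(s, start, end) for s in series.values())


# Two replicas run from minute 0; each is replaced by a fresh pod later.
series = {
    "verifier-a": pod_counter(0, 60),
    "verifier-b": pod_counter(0, 120),
    "verifier-c": pod_counter(60, 180),   # replaces verifier-a
    "verifier-d": pod_counter(120, 180),  # replaces verifier-b
}

if __name__ == "__main__":
    print(sum_current(series, 59))       # 118.0
    print(sum_current(series, 61))       # 62.0
    print(sum_current(series, 180))      # 180.0
    print(sum_increase(series, 0, 180))  # 360.0
```

Replacing one of two replicas at minute 60 takes the current sum from 118 at minute 59 to 62 at minute 61, and by minute 180 it shows 180 of the 360 requests actually charged; summing per-pod increases keeps all 360.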