fix(x402-metrics): align Prometheus retention with recording-rule windows; rename mis-named lifetime rule #516
Closed
bussyjd wants to merge 1 commit into
Two related metric-correctness fixes layered on top of the recording rules added in 27e1ac5:

1. Retention 6h → 8d. The recording rules added in 27e1ac5 use [24h] and [7d] windows. `increase(x[24h])` against a 6h-retention TSDB silently returns a value computed from only the ~6h of samples that still exist, with no error. The frontend displays that result as "24h revenue", which can be off by as much as 4x. 8d (= 7d + 1d safety margin) keeps the [7d] rule valid across a brief Prometheus outage.

2. `x402:revenue:lifetime_by_offer` → `x402:revenue:total_by_offer_current`. The original expression was `sum(counter)` (not `sum(increase[lifetime])`), so it:
   * is NOT lifetime: it is the sum, across currently-alive verifier replicas, of their since-last-restart counts,
   * drops ~50% on every replica rollout,
   * compounds with the per-pod-registry issue addressed by the replicas:1 fix.

   Renaming makes the semantics explicit. True lifetime queries should use `sum_over_time(...[Nd])` against a long-retention store.

The retention bump increases the Prometheus disk footprint roughly in proportion to retention: 8d/6h = 32x. The local-only kube-prometheus-stack PVC sizing in monitoring.yaml.gotmpl needs review on the next `obol stack up` if disk pressure shows up; currently no PVC size cap is set, so it inherits the storageClass default.
This was referenced May 24, 2026
Superseded by bundle PR #536; closing in favor of the consolidated merge target. Original branch and history preserved.
Why

27e1ac5 (this PR's parent) added recording rules over `[24h]` and `[7d]`. The Prometheus retention was 6h. `increase(x[24h])` over a 6h TSDB silently returns a value computed from only the ~6h of retained samples; the frontend reads it and shows the wrong number with no warning.

Same commit's `x402:revenue:lifetime_by_offer` rule is `sum(counter)`, not `sum(increase[lifetime])`. It resets on every replica restart, so "lifetime" is a misnomer.

Before
After

What changed

- `monitoring.yaml.gotmpl`: `retention: 6h -> 8d` (= 7d max window + 1d safety)
- `x402-prometheus-rules.yaml`: rename rule + add explanatory comment

Disk impact

~32x increase in Prometheus disk footprint (proportional to retention). Local-only kube-prometheus-stack PVC currently has no size cap (inherits storageClass default). Watch for PVC-full on next `obol stack up` and add a cap if it bites.

Frontend consumer

`obol-stack-front-end/src/lib/services/prometheus.ts` (lines 49, 61) references `x402:revenue:lifetime_by_offer` in a doc comment + OR-fallback expression for the lifetime-earnings query. Architecture-review notes flagged that the OR fallback added in #330 was unused on the consumer side anyway, so this rename has zero immediate UI impact. A follow-up small PR on the frontend repo should update the name + comment to `x402:revenue:total_by_offer_current`. Not modified in this PR (separate repo).

Test plan

- `go build ./...` clean
- `go test ./internal/embed/... ./internal/x402/...` green
- `x402:revenue:7d_by_offer_chain` should return non-extrapolated values

Stacks on

PR #513. Rebase onto main after #513 merges.
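As a footnote to the rename described under "Why", the difference between summing raw counters and summing reset-aware increases can be sketched in Go. This is a toy model under stated assumptions: two replicas with equal counts, and the names `replica`, `sumCounters`, and `increaseWithResets` are hypothetical, not code from this PR.

```go
package main

import "fmt"

// replica models one verifier pod's in-memory counter (hypothetical type).
type replica struct{ counter float64 }

// sumCounters mimics what sum(counter) sees: only the live replicas,
// each reporting its count since its own last restart.
func sumCounters(rs []replica) float64 {
	total := 0.0
	for _, r := range rs {
		total += r.counter
	}
	return total
}

// increaseWithResets adds up sample-to-sample growth for one series and
// treats any decrease as a counter reset, which is the idea behind
// increase(). It ignores extrapolation and staleness handling.
func increaseWithResets(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// Reset: the counter restarted from zero, so the new value is all growth.
			delta = samples[i]
		}
		total += delta
	}
	return total
}

func main() {
	rs := []replica{{counter: 100}, {counter: 100}}
	before := sumCounters(rs)

	rs[0].counter = 0 // a rollout restarts replica 0
	after := sumCounters(rs)
	fmt.Println("sum(counter) before/after restart:", before, after)

	// One replica's scraped samples, with a restart between 100 and 0.
	fmt.Println("reset-aware increase:", increaseWithResets([]float64{0, 40, 100, 0, 25}))
}
```

With two replicas at 100 each, restarting one takes the raw sum from 200 to 100, the ~50% drop the commit message describes, while the reset-aware increase over the series 0, 40, 100, 0, 25 comes to 125.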