Skip to content

feat(x402-metrics): add asset_symbol label for per-token queries#531

Closed
bussyjd wants to merge 1 commit into
fix/prometheus-rule-increase-7dfrom
feat/x402-asset-symbol-label
Closed

feat(x402-metrics): add asset_symbol label for per-token queries#531
bussyjd wants to merge 1 commit into
fix/prometheus-rule-increase-7dfrom
feat/x402-asset-symbol-label

Conversation

@bussyjd
Copy link
Copy Markdown
Collaborator

@bussyjd bussyjd commented May 24, 2026

Summary

Adds asset_symbol to the 6 verifier metric vecs so per-token revenue is queryable in PromQL directly, without the frontend having to join metrics with the ServiceOffer CR's spec.payment.asset.symbol at query time.

Stack note: this is technically branched off fix/prom-retention-window-alignment (PR A) because that PR introduces pruneSeriesNotIn, the lastPaymentSuccess gauge, and x402-prometheus-rules.yaml — all of which this PR extends. Listed merge target is main per the stacked-PR convention; once PR A lands, this should rebase cleanly. If reviewing before PR A merges, set the GitHub base to fix/prom-retention-window-alignment to see only the asset-symbol diff.

Cardinality math (no series multiplication)

Each ServiceOffer pins exactly one asset via spec.payment.asset.symbol. That means within the existing (offer_namespace, offer_name) group, asset_symbol is functionally constant — adding it does not multiply series count.

Per-vec series count before: 6 vecs * N offers * C chains-per-offer
Per-vec series count after: 6 vecs * N offers * C chains-per-offer * 1 asset-per-offer

The don't label what you can derive heuristic exists to prevent multiplicative blowups (chain x pod x pod_owner style). The single-asset-per-offer invariant means there's no multiplication to prevent — the same internal-consistency argument that already justifies chain (also CR-derived) on these vecs applies to asset_symbol.

Current test cluster (N=3 offers): demo-hello, silvermesh-paid, silvermesh-obol -> +3 series per vec, max +18 series total cluster-wide. Negligible.

Before / after PromQL

Before (frontend joins charged-requests metric with CR spec.payment.asset.symbol in JS):

sum by (offer_namespace, offer_name, chain) (
  increase(obol_x402_verifier_charged_requests_total[7d])
)
# then JS-side: foreach series, look up asset_symbol via /api/sell/list -> CR

After (direct):

sum by (asset_symbol) (x402:revenue:7d_by_offer_chain_asset_symbol)
# -> { asset_symbol="USDC" } 4231
# -> { asset_symbol="OBOL" } 88

Recording-rule migration plan

Non-breaking. Existing rules stay in place for one release:

  • x402:revenue:24h_by_offer_chain -> kept (frontend currently reads this)
  • x402:revenue:24h_by_offer_chain_asset_symbol -> new sibling
  • x402:revenue:7d_by_offer_chain -> kept
  • x402:revenue:7d_by_offer_chain_asset_symbol -> new sibling

Frontend can migrate to the _asset_symbol rules in a follow-up PR; the old rules continue to produce the same series (adding a label to the raw counter is non-breaking for queries that aggregate it away).

x402:settlement_rate:1h_by_offer_chain is intentionally not duplicated — it's a ratio; the asset wouldn't change the answer meaningfully.

Changes

  • internal/x402/metrics.go -> 6 vecs gain asset_symbol; pruneSeriesNotIn key now 4-tuple
  • internal/x402/verifier.go -> load() live-set uses 4-tuple; prometheusLabels() emits rule.AssetSymbol with "unknown" defensive fallback
  • internal/embed/infrastructure/base/templates/x402-prometheus-rules.yaml -> two new _asset_symbol-suffixed recording rules
  • internal/x402/verifier_test.go -> existing label-map assertions updated; 3 new tests added

Test plan

  • go build ./... clean
  • go vet ./internal/x402/... ./internal/embed/... clean
  • go test ./internal/x402/... green (all 6 vecs + new tests pass)
  • go test ./internal/embed/... green (CRD/manifest tests no regression)
  • Existing TestVerifier_Reload_PrunesDeletedOfferSeries from PR A still passes — asset-symbol-aware key doesn't break basic GC
  • TestVerifier_PrometheusLabels_IncludesAssetSymbol -> asserts USDC is emitted
  • TestVerifier_PrometheusLabels_DefaultsToUnknownIfEmpty -> asserts "unknown" fallback when AssetSymbol is empty
  • TestVerifier_PruneSeriesNotIn_DistinguishesAssetSymbol -> dropping USDC route while keeping OBOL route on the same offer leaves OBOL series alive (catches asset-repin leaks)
  • Live-cluster verification: scrape obol_x402_verifier_charged_requests_total{asset_symbol="OBOL"} after a paid OBOL request to silvermesh-obol returns 1
  • Recording rule rendered through helmfile: kubectl get prometheusrule -n x402 x402-verifier -o yaml shows both _by_offer_chain and _by_offer_chain_asset_symbol rules

@bussyjd bussyjd changed the base branch from main to fix/prometheus-rule-increase-7d May 24, 2026 06:48
Currently the verifier emits (offer_namespace, offer_name, chain).
Answering "what's my OBOL revenue?" requires joining metrics with
the ServiceOffer CR's spec.payment.asset.symbol at the frontend.
With asset_symbol on the label set, the answer is a direct PromQL
aggregation.

Cardinality cost: zero. Each offer pins exactly one asset (A=1
per offer), so the new dimension is functionally constant within
the existing (ns, name) group — no series multiplication. The
"don't label what you can derive" guidance exists to prevent
*multiplicative* blowups (chain x pod x pod_owner style); the
single-asset-per-offer invariant means there's no multiplication
to prevent.

The argument for adding asset_symbol is identical to the argument
that already justifies `chain` on these vecs: both are
CR-derived, both are query-meaningful, both have bounded values.

Changes:
  - 6 metric vecs: label slice gains "asset_symbol"
  - pruneSeriesNotIn key now (ns, name, chain, asset_symbol) so
    asset-repin doesn't leak the old series
  - verifier.load() live-set built with the same 4-tuple
  - prometheusLabels() emits rule.AssetSymbol (or "unknown" if
    empty as defensive fallback)
  - New _asset_symbol-suffixed recording rules added side-by-side
    with existing rules; existing rules unchanged (non-breaking)
  - Tests: emission asserts asset_symbol; prune test asserts
    asset-repin doesn't leak

Frontend can simplify the existing metric x CR join in a future PR
once it migrates to the _asset_symbol-suffixed rule.

Findings from: plans/integration-test-L7-paid-flow-20260524.md
(OBOL parity smoke surfaced this as a real gap when validating
the WalletStrip / EarningsStrip per-token columns).
@bussyjd
Copy link
Copy Markdown
Collaborator Author

bussyjd commented May 24, 2026

Superseded by bundle PR #536 — closing in favor of the consolidated merge target. Original branch and history preserved.

@bussyjd bussyjd closed this May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant