feat(monetize): replace pause annotation with ERC-8004-friendly drain#535

Closed
bussyjd wants to merge 2 commits into main from feat/drain-replaces-pause
@bussyjd bussyjd commented May 24, 2026

Problem

Two things were broken about the legacy "pause" path:

  1. Pause was a route on/off switch masquerading as "pause your business."
    obol.org/paused: "true" made the controller delete the HTTPRoute
    immediately. From a remote x402 buyer's perspective that's
    indistinguishable from a crash — and ERC-8004 reputation scorers
    that watch a seller's /.well-known/agent-registration.json see an
    abrupt disappearance with no advertised wind-down.
  2. obol sell stop was broken. It patched status.conditions[Ready]=False,
    which the controller overwrote on the next reconcile. The CLI looked
    like it worked; in practice the offer stayed live.
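The second failure mode can be illustrated with a toy model. This is only a sketch: `Offer`, `SpecStopped`, and `reconcile` are hypothetical names, and the real controller derives many more conditions. It assumes, as is typical for controllers, that status is recomputed from spec on every reconcile, so a status-only patch does not persist.

```go
package main

import "fmt"

// Offer is a toy stand-in for a ServiceOffer (hypothetical shape).
// SpecStopped models desired state; Ready models a derived status condition.
type Offer struct {
	SpecStopped bool
	Ready       bool
}

// reconcile recomputes status purely from spec, discarding whatever
// was previously written to status.
func reconcile(o *Offer) {
	o.Ready = !o.SpecStopped
}

func main() {
	o := &Offer{Ready: true}

	// Status-only patch, similar in effect to the old `obol sell stop`.
	o.Ready = false
	reconcile(o)
	fmt.Println("offer still live after status patch:", o.Ready)

	// Spec change: the controller now derives the stopped state itself.
	o.SpecStopped = true
	reconcile(o)
	fmt.Println("offer live after spec change:", o.Ready)
}
```

This is why the new design moves the stop signal into spec fields rather than status.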

Design

Replace pause with a real drain:

  • New spec fields advertise the wind-down via discovery before the
    route disappears, so external observers can react gracefully.
  • The HTTPRoute + payment gate stay up during the grace window so
    buyers can complete in-flight payments.
  • After the grace period expires, the controller tears down the route
    and marks Draining=False reason=Drained. The CR stays — obol sell delete is still the canonical removal command.

Pure-additive wire shape

Drain is purely additive in the catalog. Active offers serialize
identically to pre-drain releases: no new fields, no shape change. The
only new wire surface is drainEndsAt, which is set on draining offers
only. Consumers detect drain with:

if (entry.drainEndsAt) { /* draining; migrate before this time */ }

There is no available field. Presence of drainEndsAt is the signal.
This was an explicit design review outcome (commit dd89750): a
separate boolean was redundant and would have been a schema-breaking
change for strict consumers. Now there is zero schema breakage.
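On the producer side, one way to get this presence-only behavior in Go is a pointer field tagged `omitempty`. The sketch below is an assumption about how such an entry could be modeled, not the actual ServiceCatalogEntry type; `CatalogEntry`, `render`, and `hasKey` are hypothetical names, and the real entry carries more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// CatalogEntry is a hypothetical, trimmed-down catalog entry.
// A nil DrainEndsAt is omitted from the JSON entirely.
type CatalogEntry struct {
	Name        string     `json:"name"`
	DrainEndsAt *time.Time `json:"drainEndsAt,omitempty"`
}

// render marshals an entry to its raw JSON form.
func render(e CatalogEntry) string {
	raw, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	return string(raw)
}

// hasKey reports whether a raw JSON object contains the given top-level key.
func hasKey(raw []byte, key string) (bool, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return false, err
	}
	_, ok := obj[key]
	return ok, nil
}

func main() {
	end := time.Date(2026, 5, 24, 13, 0, 0, 0, time.UTC)
	entries := []CatalogEntry{
		{Name: "active-svc"},
		{Name: "draining-svc", DrainEndsAt: &end},
	}
	for _, e := range entries {
		raw := render(e)
		draining, err := hasKey([]byte(raw), "drainEndsAt")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s draining=%v\n", raw, draining)
	}
}
```

Checking the raw key set, rather than a decoded struct, is what lets a test assert that active entries are byte-for-byte free of drain fields.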

API

ServiceOfferSpec:

| Field | Type | Default | Behavior |
| --- | --- | --- | --- |
| drainAt | *metav1.Time | nil | When set, offer is draining. |
| drainGracePeriod | *metav1.Duration | 1h | How long after drainAt the route stays up. 0s tears down on the next reconcile. |

ServiceOffer helpers: IsDraining(), DrainEndsAt() time.Time,
DrainExpired(now time.Time) bool.

CLI:

obol sell stop <name> -n <ns>                # default: drainAt=now, 1h grace
obol sell stop <name> -n <ns> --grace 30m    # custom grace
obol sell stop <name> -n <ns> --force        # alias: --now; zero grace, abrupt teardown

Discovery surfaces:

  • /api/services.json: draining entries gain a single drainEndsAt: <RFC3339>
    key. Active entries serialize unchanged.
  • /skill.md: per-service detail block adds a - **Drain ends at**:
    bullet only for draining offers. The table gains a Status column
    (active: "—", draining: draining · ends <RFC3339>).

Drain lifecycle

sequenceDiagram
    autonumber
    participant Op as Operator
    participant CR as ServiceOffer CR
    participant Ctl as serviceoffer-controller
    participant Disc as /skill.md +<br/>/.well-known/agent-registration.json
    participant Route as HTTPRoute + x402 Middleware
    participant Buyer as Remote buyer

    Op->>CR: obol sell stop my-svc<br/>(patch spec.drainAt=now,<br/>drainGracePeriod=1h)
    CR-->>Ctl: Update event
    Ctl->>Ctl: IsDraining=true,<br/>DrainExpired=false
    Ctl->>Disc: emit drainEndsAt=T+1h<br/>(no `available` field)
    Ctl->>Route: KEEP UP
    Ctl->>CR: Draining=True reason=Draining
    Ctl->>Ctl: AddAfter(T+1h)
    Buyer->>Disc: poll catalog
    Disc-->>Buyer: drainEndsAt set → migrate
    Buyer->>Route: in-flight paid request
    Route-->>Buyer: 200 OK
    Note over Ctl: ...grace period elapses...
    Ctl->>Ctl: DrainExpired=true
    Ctl->>Route: deleteRouteChildren()
    Ctl->>CR: Draining=False reason=Drained,<br/>PaymentGateReady=False,<br/>RoutePublished=False
    Op->>CR: obol sell delete (later, canonical removal)

Why ERC-8004 reputation matters

ERC-8004 makes seller reputation an on-chain signal that buyers and
discovery agents can score. An abrupt route teardown looks identical to
a process crash or upstream outage — a negative reputation event.
Advertising a planned wind-down (drainEndsAt) lets buyers and scorers
distinguish "this seller is gracefully retiring this offer" from
"this seller's infrastructure is unreliable." Even short grace windows
(a few minutes) move the signal from "outage" to "planned maintenance."

Migration

If you were setting obol.org/paused: "true" directly, the annotation
no longer has any effect. To match the old abrupt-teardown semantics:

obol sell stop <name> -n <ns> --force

For the recommended graceful behavior, drop --force and let buyers
see the wind-down via discovery.

Test plan

  • go build ./...
  • go test ./internal/monetizeapi/... ./internal/serviceoffercontroller/... ./internal/x402/... ./cmd/obol/... ./internal/schemas/...
  • Unit: ServiceOffer.IsDraining, DrainEndsAt, DrainExpired (nil, mid-drain, expired, --force zero-grace)
  • Render: pre-drain (no drainEndsAt, no available), mid-drain (only drainEndsAt), drain-expired (filtered from catalog)
  • Render: per-service /skill.md detail block carries no Available bullet on active offers; only draining offers get a Drain-ends-at bullet
  • x402 verifier source: drain-expired offer skipped from RouteRules; mid-drain offer kept
  • CLI: obol sell stop has --grace (default 1h) and --force (alias --now)
  • Pure-additivity invariant on raw JSON: active entries have NO available or drainEndsAt keys
  • Manual: obol sell stop my-svc -n llm → confirm /api/services.json entry gains a drainEndsAt, /skill.md shows the drain banner, paid requests still 200 OK
  • Manual: wait grace, confirm kubectl get httproute -n llm no longer shows the offer's route
  • Manual: obol sell stop my-svc -n llm --force → confirm route disappears on the next reconcile

bussyjd added 2 commits May 24, 2026 12:49
The legacy obol.org/paused annotation tore down HTTPRoutes immediately,
which is indistinguishable from a crash to remote x402 buyers and ERC-8004
reputation scorers. obol sell stop was also broken: it patched
status.conditions which the controller immediately overwrote.

This replaces both with a real drain:

- New ServiceOffer spec.drainAt (date-time) + spec.drainGracePeriod
  (duration; default 1h) mark an offer as winding down.
- While draining, /skill.md and /.well-known/agent-registration.json
  advertise the offer with available=false and drainEndsAt set, so
  external discovery can react before traffic disappears.
- The HTTPRoute + payment gate stay up until DrainEndsAt, letting
  in-flight buyers complete payments.
- After the grace period, the controller tears down the route, sets
  Draining=False reason=Drained, and leaves the CR (delete is the
  canonical removal command).

obol sell stop sets spec.drainAt, supports --grace <duration> and
--force/--now (zero grace = abrupt teardown for behavior parity with
the old annotation).
…e drain signal

Design review concluded the `available` boolean was redundant — the
presence of `drainEndsAt` is sufficient to signal drain state. This
makes the drain wire shape purely additive: active offers serialize
identically to pre-drain releases.

Wire changes:
- ServiceCatalogEntry.Available field removed.
- DrainEndsAt is the only drain signal. Consumers detect drain with
  `if (entry.drainEndsAt) { /* draining */ }`.
- /skill.md detail block: no Available bullet on active offers; only
  draining offers get a "Drain ends at" bullet.
- /skill.md table column renamed Available → Status; active rows show
  "—", draining rows show "draining · ends <RFC3339>".

JSON Schema: `available` removed from required and from properties;
`drainEndsAt` description updated to "Presence = draining."

Tests updated to assert active entries carry NO `available` or
`drainEndsAt` keys in the raw JSON, and the markdown detail block for
active offers contains no Available line.
bussyjd commented May 24, 2026

Superseded by bundle PR #536 — closing in favor of the consolidated merge target. Original branch and history preserved.