Skip to content

feat(api): spec.paused + metav1.Condition with listType=map (CRD v1alpha2-prep)#526

Closed
bussyjd wants to merge 1 commit into
feat/controller-gen-codegenfrom
feat/crd-paused-spec-and-metav1-condition
Closed

feat(api): spec.paused + metav1.Condition with listType=map (CRD v1alpha2-prep)#526
bussyjd wants to merge 1 commit into
feat/controller-gen-codegenfrom
feat/crd-paused-spec-and-metav1-condition

Conversation

@bussyjd
Copy link
Copy Markdown
Collaborator

@bussyjd bussyjd commented May 23, 2026

Why

The architecture review surfaced two CRD-design problems that bundle naturally because both rewrite types.go and every *-crd.yaml:

1. Pause-as-annotation violates K8s convention

api-conventions: annotations are "non-identifying metadata", NOT control input. Upstream uses typed spec fields: Deployment.spec.paused, CronJob.spec.suspend. The annotation approach loses:

  • OpenAPI validation (anyone can write obol.org/paused: yes and silently get unpaused)
  • RBAC scoping (annotation is part of metadata, gated by patch verb on parent)
  • SSA conflict detection (annotations are a single map field)
  • Status observability (no Paused condition; observers parse other-condition reasons)

2. Custom Condition type misses ObservedGeneration + SSA-safe merging

api-conventions: every condition MUST carry observedGeneration so clients can tell stale-view from current-view. CRD YAML needs x-kubernetes-list-type: map + x-kubernetes-list-map-keys: [type] for SSA to dedupe-by-type instead of replacing the whole array.

Before

   pause/resume:
     metadata.annotations["obol.org/paused"] = "true"    <- annotation
     status synthesized: PaymentGateReady=False, reason=Paused  <- side effect

   conditions:
     custom Condition{Type, Status, Reason, Message, LastTransitionTime}
     CRD yaml: type: array  (no listType)
                <- SSA replaces the whole array; second reconciler stomps
     no observedGeneration field
                <- clients can't tell if condition reflects current spec

   Ready:
     setCondition(\"Ready\", ...) called from several reconcile paths
                <- drift between Ready and the predicate conditions

After

   pause/resume:
     spec.paused: bool                                   <- typed, validated
     status.conditions[Paused]: {Status, Reason: PausedBySpec | PausedByAnnotation, ObservedGeneration}
     legacy annotation still read for one release        <- deprecation log on use

   conditions:
     metav1.Condition (upstream type, has ObservedGeneration)
     CRD yaml: x-kubernetes-list-type: map               <- SSA-safe merging
               x-kubernetes-list-map-keys: [\"type\"]
     +patchStrategy=merge +patchMergeKey=type

   Ready:
     pure rollup: ModelReady AND UpstreamHealthy AND PaymentGateReady
                   AND RoutePublished AND Registered
     computed once at end of Reconcile
     no independent Ready setCondition calls

What changed

  • internal/monetizeapi/types.go:
    • []Condition -> []metav1.Condition in 3 status structs
    • +listType=map +listMapKey=type +patchStrategy=merge +patchMergeKey=type markers
    • spec.paused: bool added to ServiceOfferSpec (with kubebuilder:default=false)
    • IsPaused() reads spec first, annotation as fallback (1-release deprecation)
    • IsPausedByAnnotation() exposed so controller can emit deprecation log + tag Paused condition reason
    • Paused printer column added (priority=1)
  • internal/serviceoffercontroller/controller.go:
    • Sets typed Paused condition each reconcile (PausedBySpec / PausedByAnnotation / NotPaused)
    • Once-per-offer deprecation log when reading paused state from annotation
    • Ready computed as final rollup via rollupReady() (no independent Ready setCondition paths)
  • internal/serviceoffercontroller/render.go + purchase.go + agent.go:
    • setCondition / setPurchaseCondition / setAgentCondition delegate to apimachinery/pkg/api/meta.SetStatusCondition (dedupe-by-type, lastTransitionTime, ObservedGeneration handled centrally)
    • isConditionTrue defers to apimeta.IsStatusConditionTrue
  • 5 *-crd.yaml regenerated via just generate (controller-gen v0.16.5 from PR feat(monetizeapi): controller-gen as canonical CRD schema source #525):
    • x-kubernetes-list-type: map + x-kubernetes-list-map-keys: [type] on every conditions array
    • paused: type: boolean default: false on spec
    • observedGeneration accepted on every condition entry
    • Printer column for Paused
  • New paused_condition_test.go covers: spec.paused path, annotation back-compat path, Paused condition presence + reason + ObservedGeneration, Ready rollup correctness (True only when all five predicates True), explicit-generation argument override
  • Existing embed_crd_test.go::TestServiceOfferCRD_PrinterColumns updated to expect the new Paused column

Frontend follow-up (out of scope)

obol-stack-front-end writes obol.org/paused directly via sell.ts:103,163. A separate follow-up PR will flip frontend writes to spec.paused. The 1-release annotation read keeps the FE working until that lands.

Test plan

  • just generate produces zero diff when re-run (idempotent)
  • go build ./... clean
  • go vet clean on changed packages
  • go test ./internal/monetizeapi/... ./internal/serviceoffercontroller/... ./internal/x402/... ./internal/embed/... ./cmd/obol/... green
  • Tests: spec.paused respected, annotation respected (backwards compat), Paused condition appears, Ready is the rollup, ObservedGeneration populated
  • Manual: kubectl patch serviceoffer foo -p '{\"spec\":{\"paused\":true}}' --type=merge -> controller observes, sets Paused condition, tears down route

Stacks on

PR #525 (controller-gen). Rebase onto main after PR #525 merges.

Final PR in roadmap

This is the 14th and final PR in the post-7-agent-architecture-review roadmap. The bundled items (11+12 in the original numbering) were merged into one PR per the review's "do not split" recommendation.

…D v1alpha2-prep)

Bundled CRD-design fixes from the architecture review. Both rewrite
types.go and every *-crd.yaml — must not split (guaranteed merge
conflicts otherwise).

  1. obol.org/paused annotation → spec.paused: bool + Paused condition
     api-conventions: annotations are non-identifying metadata, NOT
     control input. Compare Deployment.spec.paused, CronJob.spec.suspend.
     Adds spec.paused (typed, validated by OpenAPI), keeps the annotation
     read for one release (deprecation log on use), adds a Paused
     condition to status as the observed-state mirror.

  2. Custom Condition → metav1.Condition
     Adds ObservedGeneration (api-conventions requires it on every
     condition: it tells clients whether the condition refers to a
     recent spec or a stale view).
     Adds +listType=map + +listMapKey=type markers — without these,
     SSA on conditions arrays creates duplicate entries when two
     reconcilers touch different condition types. CRD YAMLs gain
     x-kubernetes-list-type: map.

  3. Ready is now a pure rollup
     Computed once at end of Reconcile from
     (ModelReady ∧ UpstreamHealthy ∧ PaymentGateReady ∧ RoutePublished
     ∧ Registered). Removes independent Ready setCondition calls that
     were drifting from the predicate conditions.

This is the LAST of 14 PRs in the post-architecture-review roadmap
(items 11+12 bundled per the review's "do not split" recommendation).

Backwards compat:
  - obol.org/paused annotation still honored (one release window).
  - PausedByAnnotation reason on the Paused condition makes the
    legacy code path visible to operators.
  - Will drop annotation reading in v0.11.0.

Frontend pause/resume (currently writes the annotation) needs a
follow-up PR on obol-stack-front-end to flip to spec.paused — out of
scope here.
@bussyjd
Copy link
Copy Markdown
Collaborator Author

bussyjd commented May 24, 2026

Obsoleted by #535 which removes pause entirely. Bundle PR #536 includes the drain replacement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant