containerboot: retrigger cert loops on netmap updates #9

Draft
ryantm wants to merge 1 commit into zerg/operator-wildcard-cert from fix/orb-ha-cert-loop

ryantm commented Apr 1, 2026

Why

The west1 Orb HA ingress could create its Tailscale Service and serve config, but the shared TLS Secret for mcp-orb.tail0a469.ts.net stayed empty. Because the HA operator only advertises HTTPS services after certs exist, Orb never got AdvertiseServices and never elected a primary VIP advertiser.

The issue is in containerboot's cert-loop startup path for HA ingress proxies. It only retriggers the cert manager when the serve config changes or when the first cert domain changes. That misses the case where a new service cert domain is added later in the netmap while the serve config is unchanged.

What changed

  • track the full NetMap.DNS.CertDomains set in cmd/containerboot/main.go
  • retrigger the serve watcher when that set changes, even if the first cert domain string is unchanged
  • keep rerunning EnsureCertLoops on watcher wakeups in cmd/containerboot/serve.go, even when the serve config itself is unchanged
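
As a rough illustration of the first two bullets, the sketch below compares the whole cert-domain set rather than only its first entry. It is not the code in this PR: `certDomainTracker` and `changed` are hypothetical names, and the domains are placeholders. It assumes order and duplicates in the netmap's list should not count as a change.

```go
package main

import (
	"fmt"
	"slices"
)

// certDomainTracker remembers the last cert-domain set seen in a netmap.
// Hypothetical helper for illustration; not the containerboot implementation.
type certDomainTracker struct {
	last []string // sorted, de-duplicated copy of the previous set
}

// changed reports whether domains differs from the previously seen set,
// ignoring order and duplicates, and records the new set when it does.
func (t *certDomainTracker) changed(domains []string) bool {
	cur := slices.Clone(domains)
	slices.Sort(cur)
	cur = slices.Compact(cur)
	if slices.Equal(cur, t.last) {
		return false
	}
	t.last = cur
	return true
}

func main() {
	var t certDomainTracker

	// First netmap: only the proxy's own domain.
	fmt.Println(t.changed([]string{"proxy.example.ts.net"})) // true

	// Same set again: nothing to retrigger.
	fmt.Println(t.changed([]string{"proxy.example.ts.net"})) // false

	// A service domain is added later; the first entry is unchanged,
	// so a first-domain-only comparison would miss this update.
	fmt.Println(t.changed([]string{"proxy.example.ts.net", "svc.example.ts.net"})) // true
}
```

In this sketch a `true` result is the signal to wake the serve watcher, which then reruns the cert-loop setup even though the serve config itself has not changed.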

Test plan

  • gofmt -w cmd/containerboot/main.go cmd/containerboot/serve.go
  • go test ./cmd/containerboot ./kube/certs ./ipn/store/kubestore

Notes

This was validated against the live west1 Orb ingress:

  • before the fix, the shared TLS Secret mcp-orb.tail0a469.ts.net had empty tls.crt/tls.key
  • after forcing a cert and reconciling, the operator immediately added AdvertiseServices:["svc:mcp-orb"] and the VIP became reachable

This change makes that cert-loop wakeup happen automatically for future HA ingress services with the same timing pattern.
