Problem
Peer discovery + config writing for SeiNodes is split across two components today (the controller and the sidecar), and the split is not principled:
internal/controller/node/peers.go (controller-side) — reconcilePeers resolves Spec.Peers (EC2 tags, K8s labels, static) into a list of host:port entries written to Status.ResolvedPeers. Runs every reconcile.
CollectAndSetPeers (sidecar-side) — runs as a planned task. Reads Status.ResolvedPeers from its own SeiNode, queries each peer's :26657/status to discover the peer's node_id, then writes the full <nodeID>@<host>:<port> list into persistent_peers in config.toml and reloads seid.
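The sidecar-side enrichment step described above can be sketched roughly as follows. This is a minimal illustration, not the actual CollectAndSetPeers code: the function names are hypothetical, the /status body is assumed to use the standard JSON-RPC envelope with result.node_info.id, and skipping peers whose node ID is unknown is an assumption of the sketch.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// statusResponse mirrors only the fields of a /status reply needed here.
// Assumption: the reply is wrapped in a JSON-RPC "result" envelope.
type statusResponse struct {
	Result struct {
		NodeInfo struct {
			ID string `json:"id"`
		} `json:"node_info"`
	} `json:"result"`
}

// nodeIDFromStatus (hypothetical helper) extracts node_info.id from a
// /status response body.
func nodeIDFromStatus(body []byte) (string, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return "", fmt.Errorf("decode status: %w", err)
	}
	if s.Result.NodeInfo.ID == "" {
		return "", errors.New("status response has no node_info.id")
	}
	return s.Result.NodeInfo.ID, nil
}

// persistentPeers (hypothetical helper) joins resolved host:port entries with
// their node IDs into the comma-separated <nodeID>@<host>:<port> form.
// Entries without a known node ID are skipped.
func persistentPeers(resolved []string, nodeIDs map[string]string) string {
	out := make([]string, 0, len(resolved))
	for _, hp := range resolved {
		if id, ok := nodeIDs[hp]; ok {
			out = append(out, id+"@"+hp)
		}
	}
	return strings.Join(out, ",")
}

func main() {
	body := []byte(`{"result":{"node_info":{"id":"abc123"}}}`)
	id, err := nodeIDFromStatus(body)
	if err != nil {
		panic(err)
	}
	resolved := []string{"10.0.0.1:26656", "10.0.0.2:26656"}
	// The second peer has no known node ID yet, so it is left out.
	fmt.Println(persistentPeers(resolved, map[string]string{"10.0.0.1:26656": id}))
}
```

Keeping the parse and join steps free of network calls means the same helpers would work whichever side ends up owning the query.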
The split is "controller discovers K8s/AWS membership; sidecar enriches with on-chain identity." That's a defensible separation in isolation, but in practice it means:
Two reconcile cadences for one logical concern — controller-side runs every reconcile (~30s steady-state); sidecar-side runs only when the planner schedules CollectAndSetPeers.
Two error surfaces — controller errors land on SND/SeiNode status; sidecar errors land on task status. An operator chasing "why isn't this node peering?" has to check both.
Membership drift between the two — Status.ResolvedPeers can be stale relative to what the sidecar actually wrote into persistent_peers.
The seinodeDeployment PeerSource extension landed in PR-1 (#361, "feat(api): networking http/tcp sub-structs + seinodeDeployment peer variant (#360 PR-1)") and was then dropped during PR-3 cuts in favor of using the existing LabelPeerSource with the sei.io/nodedeployment label. That is a hint that the controller-side resolver is already doing more than just "K8s membership": it's resolving cross-SND peer relationships that fold into the same path.
Impact
This shows up immediately when we go to roll out publishable P2P (#360) and start exercising the cross-cluster peer-discovery story. Validators-peering-validators across a deterministic set means the controller's resolver becomes the load-bearing membership service, and the split-with-sidecar-enrichment adds enough latency + opacity that it becomes a debugging hot spot during P2P bring-up.
Proposed approach (to refine)
Three shapes worth considering:
(1) Controller owns everything: controller resolves peers AND queries node_id (proxy through sidecar HTTP if needed), writes the final persistent_peers directly. Sidecar's CollectAndSetPeers becomes config-apply only.
(2) Sidecar owns everything: controller's reconcilePeers is deleted; sidecar handles selector→host+nodeID→config end-to-end. Requires the sidecar to read K8s resources (or have the controller push the selector spec via a different surface).
(3) Cleaner split, same boundaries: keep the controller-resolves / sidecar-enriches split but unify the data surface (one canonical place to read "what peers does this node currently have?") and the error surface (one condition that aggregates both halves).
(1) is probably the cleanest if we can do it without making the controller depend on seid availability for membership — but that's a structural question worth weighing.
Out of scope
Anything that changes the Spec.Peers user surface. The PeerSource union (ec2Tags, static, label) stays.
Genesis ceremony peer logic (controller-side genesis assembly is a different code path).
Cross-cluster discovery transport (we currently rely on the EC2 tag resolver for sei-infra-managed peers; this stays).
Relevant experts
kubernetes-specialist — controller-runtime patterns, the existing reconcilePeers shape, how a controller-owned resolver would handle the sidecar boundary.
CollectAndSetPeers task lifecycle, sei-infra peer membership.
node_id resolution timing, :26657/status reliability vs. on-chain alternatives.
References
internal/controller/node/peers.go - reconcilePeers, LabelPeerSource resolver
internal/planner/group.go:59 - CollectAndSetPeers task building
runtimes/sidecar/tasks/collect_set_peers.go (or wherever seictl handles it) - the actual :26657/status query + config write