Status: active
Get two Powernode platforms federated in ~5 minutes. This runbook covers the happy path for proposing → accepting → activating a federation peer. For failure modes, see federation-troubleshooting.md.
For the underlying protocol (sovereign auth model, social contract, three spawn modes), see the federation reference docs:
- ../federation/SPAWN_MODES.md: managed_child / autonomous_peer / cluster_member
- ../federation/SOCIAL_CONTRACT.md: the 12-commitment framework
- ../federation/NETWORK_TRUST.md: cryptographic trust model
For two operators — call them A and B — running independent Powernode platforms:
- A proposes federation with B's platform → gets back an acceptance token
- A shares the token + their platform URL with B (out of band: Signal, password manager, etc.)
- B accepts using the token → peer record on B's side flips to accepted
- The first successful heartbeat between them advances the state machine to active
- Either side can now offer/subscribe to services through the federation surface
At the end, both A's and B's platforms have a System::FederationPeer row for each other, each in peer_kind: "platform" + status: "active".
- Both platforms reachable from each other (no NAT issues; each platform's remote_instance_url resolves from the other side)
- An operator account on each side with the system.peers.invite permission (to propose/accept) and system.peers.manage (to revoke), both defined by the system extension
- (Recommended) An out-of-band secure channel (Signal, 1Password share, in-person) for token handoff
If your platform is behind NAT or you're peering with a sovereign on-prem satellite, front the federation traffic through a publicly-reachable hub peer — see the hub-and-spoke topology in ../FEDERATION_MULTI_SITE_GUIDE.md §3 and the publicly_reachable hub guidance in sdwan-network-setup.md Phase 2.
peer_kind: "platform" peers live on the Platform Peers surface, not
the SDWAN federation-peer surface. From the operator UI: Compute →
Platform → Peers → Invite Peer. Or via the REST endpoint
(POST /api/v1/system/platform/peers, served by Platform::PeersController;
requires the system.peers.invite permission):
# On A
curl -s -X POST \
-H "Authorization: Bearer $JWT_A" \
-H "Content-Type: application/json" \
http://localhost:3000/api/v1/system/platform/peers \
-d '{
"remote_instance_url": "https://platform-b.example.com",
"spawn_role": "symmetric",
"spawn_mode": "out_of_band",
"token_ttl_seconds": 604800
}'
# → { "data": { "peer": { "id": "...", "peer_kind": "platform", "status": "proposed", ... },
#        "acceptance_token": "fbazXyZ123abc456..." } }
This creates a System::FederationPeer row on A's side with
peer_kind: "platform" + status: "proposed", and returns the single-use
acceptance token in the same response (spawn_mode defaults to
out_of_band for hand-paired peers; see
../federation/SPAWN_MODES.md). It does
not contact B yet.
Why not system_sdwan_propose_federation_peer? That MCP action proposes an SDWAN-scoped cross-account peer (peer_kind: "sdwan_only") and does not accept peer_kind/spawn_role, so it cannot create a platform peer. Use it only for pure overlay bridging (see sdwan-network-setup.md Phase 9).
The plaintext token from Step 1's response is shown exactly once — copy
it now. Only its SHA-256 digest is persisted (acceptance_token_digest
column). B will present it when accepting.
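Because the token is shown only once, it can help to write it straight to a private file instead of copying it from the terminal. The helper below is a minimal sketch, assuming the Step 1 response was saved to a file and has the shape shown above (`.data.acceptance_token`); `save_token` and both file paths are hypothetical names, not part of the platform.

```shell
# Hypothetical helper: pull the one-time token out of a saved propose response
# and store it in a file only the current user can read.
# Usage: save_token RESPONSE_JSON TOKEN_FILE
save_token() {
  response_file=$1
  token_file=$2
  # umask 077 makes the new file mode 0600; jq -e exits non-zero when the
  # field is missing or null, so a malformed response is reported as failure.
  ( umask 077; jq -er '.data.acceptance_token' "$response_file" > "$token_file" )
}
```

For example, `save_token propose-response.json a-to-b.token` leaves the token in `a-to-b.token`, ready for the out-of-band handoff. Delete both files once B has accepted.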
The TTL above is 7 days (token_ttl_seconds: 604800). Pass a shorter
value if you're handing it off immediately (3600 = 1 hour) or want a
tighter window.
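The token_ttl_seconds value is plain seconds, so a small converter avoids arithmetic slips when choosing a window. This is an illustrative sketch; `ttl_seconds` is a hypothetical helper, not a platform command.

```shell
# Hypothetical helper: convert a count plus unit into seconds for token_ttl_seconds.
# Usage: ttl_seconds 7 days
ttl_seconds() {
  ttl_n=$1
  ttl_unit=$2
  case "$ttl_unit" in
    m|minutes) echo $((ttl_n * 60)) ;;
    h|hours)   echo $((ttl_n * 3600)) ;;
    d|days)    echo $((ttl_n * 86400)) ;;
    *) echo "unknown unit: $ttl_unit" >&2; return 1 ;;
  esac
}
```

`ttl_seconds 7 days` prints 604800 (the value used in Step 1) and `ttl_seconds 1 hours` prints 3600.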
Share with B, out of band:
- A's platform URL (the remote_instance_url they'll register: https://platform-a.example.com)
- The plaintext token from step 2
- (Optional) The contract version A is operating under — defaults to the current platform-wide default
Don't drop the token into a shared Slack channel; it grants peer enrollment on A.
On B, first register A as a platform peer (same Platform Peers endpoint as Step 1, now pointing back at A), then accept it with the token A shared. From the operator UI: Compute → Platform → Peers → Invite Peer, then Accept. Or via REST + MCP:
# On B — register A, capture B's local peer id for the accept call
curl -s -X POST \
-H "Authorization: Bearer $JWT_B" \
-H "Content-Type: application/json" \
https://platform-b.example.com/api/v1/system/platform/peers \
-d '{ "remote_instance_url": "https://platform-a.example.com", "spawn_role": "symmetric" }'
# → { "data": { "peer": { "id": "<B-side-peer-id>", "status": "proposed", ... } } }
# Then accept by B-side peer id, presenting A's token (MCP):
platform.system_sdwan_accept_federation_peer
federation_peer_id: "<B-side-peer-id>"
acceptance_token: "<token from A>"
The accept flow:
- B already has its own System::FederationPeer row pointing at A (created above)
- B calls A's POST /api/v1/system/federation_api/accept with the token
- A's AcceptController verifies the token against the stored digest (SHA-256 secure_compare)
- If valid, A's peer row transitions proposed → accepted and the token digest is cleared (single-use)
- B's peer row transitions proposed → accepted on success response
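The digest check and single-use behavior in the flow above can be modeled locally. This is a sketch under stated assumptions, not the AcceptController's code: a file stands in for the acceptance_token_digest column, the digest is assumed to be hex-encoded, and the shell `=` comparison is not constant-time (the real check uses secure_compare). `token_digest` and `accept_with_token` are hypothetical names.

```shell
# Hex SHA-256 of a string (falls back to shasum where sha256sum is absent).
token_digest() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

# Usage: accept_with_token DIGEST_FILE PRESENTED_TOKEN
# DIGEST_FILE is a stand-in for the stored digest column.
accept_with_token() {
  digest_store=$1
  presented=$2
  # An empty store means the token was already used (or never issued).
  [ -s "$digest_store" ] || { echo "no pending token" >&2; return 1; }
  if [ "$(token_digest "$presented")" = "$(cat "$digest_store")" ]; then
    : > "$digest_store"   # clear the digest so the token cannot be replayed
    echo "accepted"
  else
    echo "token mismatch" >&2
    return 1
  fi
}
```

Presenting the same token a second time fails with "no pending token", which mirrors why A must issue a fresh proposal if the first accept attempt consumed the token.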
Verify the accept landed on both sides (platform peers live on
/platform/peers, which scopes to peer_kind: "platform"):
# On A
curl -s -H "Authorization: Bearer $JWT_A" http://localhost:3000/api/v1/system/platform/peers \
| jq '.data[] | select(.remote_instance_url=="https://platform-b.example.com") | {id, status}'
# => { "id": "...", "status": "accepted" }
# On B
curl -s -H "Authorization: Bearer $JWT_B" https://platform-b.example.com/api/v1/system/platform/peers \
| jq '.data[] | select(.remote_instance_url=="https://platform-a.example.com") | {id, status}'
# => { "id": "...", "status": "accepted" }
Once both sides are accepted, the next steps are automatic:
- The FederationHeartbeatJob ticks every 60s on each side (declared in worker/config/sidekiq.yml under :federation_heartbeat).
- On its first successful heartbeat to the remote peer, the local peer's record_heartbeat! transitions accepted → enrolled → active.
- The last_handshake_at and last_heartbeat_at columns get populated.
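The happy-path transitions named in this runbook can be written as a small lookup, which is handy when scripting sanity checks around status changes. This is an illustrative sketch only: it covers just the transitions mentioned here (the platform's state machine may define others), and `can_transition` is a hypothetical helper.

```shell
# Returns 0 if FROM -> TO is one of the happy-path transitions in this runbook.
# Usage: can_transition FROM TO
can_transition() {
  case "$1>$2" in
    "proposed>accepted"|"accepted>enrolled"|"enrolled>active") return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `can_transition proposed active` fails, since a peer has to pass through accepted and enrolled first.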
Wait ~60s, then verify:
curl -s -H "Authorization: Bearer $JWT_A" http://localhost:3000/api/v1/system/platform/peers \
| jq '.data[] | {id, status, last_heartbeat_at}'
# => { "id": "...", "status": "active", "last_heartbeat_at": "2026-05-17T13:45:12Z" }
If status hasn't advanced past accepted after ~3 minutes, see federation-troubleshooting.md.
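Rather than re-running the check by hand, the wait can be scripted as a poll with a timeout. This is a minimal sketch: `wait_for_status` and `peer_b_status` are hypothetical helpers, the 180s default mirrors the ~3 minute threshold above, and the fetcher reuses the endpoint and jq filter style from the earlier verification commands.

```shell
# Usage: wait_for_status FETCH_CMD WANT [TIMEOUT_S] [INTERVAL_S]
# FETCH_CMD is any command or function that prints the peer's current status.
wait_for_status() {
  wfs_fetch=$1
  wfs_want=$2
  wfs_timeout=${3:-180}
  wfs_interval=${4:-10}
  wfs_elapsed=0
  while :; do
    wfs_status=$("$wfs_fetch" 2>/dev/null) || wfs_status=""
    if [ "$wfs_status" = "$wfs_want" ]; then
      return 0
    fi
    if [ "$wfs_elapsed" -ge "$wfs_timeout" ]; then
      echo "status is '${wfs_status:-unknown}', wanted '$wfs_want' after ${wfs_elapsed}s" >&2
      return 1
    fi
    sleep "$wfs_interval"
    wfs_elapsed=$((wfs_elapsed + wfs_interval))
  done
}

# Example fetcher: A's view of B (not called until you invoke it).
peer_b_status() {
  curl -s -H "Authorization: Bearer $JWT_A" \
    http://localhost:3000/api/v1/system/platform/peers \
    | jq -r '.data[] | select(.remote_instance_url=="https://platform-b.example.com") | .status'
}
```

A typical call is `wait_for_status peer_b_status active || echo "see federation-troubleshooting.md"`. Keep the interval above zero in real use so the elapsed counter advances.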
Now that the peer is active, you can issue a cross-peer service grant
(a System::FederationGrant) so B can call A's federation_api/resources
endpoints. This is not a system_sdwan_* MCP action; it is a REST
endpoint on A under the Platform Peers surface
(POST /api/v1/system/platform/peers/:peer_id/grants, served by
Platform::PeerGrantsController; the operator UI exposes it as the per-peer
Grants editor). Example: grant B read-only access to A's nginx module
catalog:
# On A — :peer_id is B's System::FederationPeer id on A's side
curl -s -X POST \
-H "Authorization: Bearer $JWT_A" \
-H "Content-Type: application/json" \
http://localhost:3000/api/v1/system/platform/peers/<B-peer-id>/grants \
-d '{
"remote_subject": "operator@platform-b.example.com",
"resource_kind": "NodeModule",
"resource_id": null,
"permission_scopes": ["read"],
"node_instance_ids": [],
"sdwan_network_ids": [],
"source_cidrs": []
}'
resource_id: null means "all of resource_kind"; the three trailing arrays
are the optional pessimistic-scope allowlists (Locked Decision #12) —
empty leaves that axis unrestricted (FederationGrant#unrestricted?). The
grant returns a bearer token (fg-<grant_id>) that B presents alongside its
mTLS cert when calling A's federation_api. Default TTL is 30 days; the
grant validates well-formed array contents (UUIDs, CIDRs) on save (LD #12).
See ../federation/NETWORK_TRUST.md for the
pessimistic-grant matching algorithm.
Don't confuse this with system_sdwan_create_access_grant: that MCP action issues a Sdwan::AccessGrant, which is a VPN user-access entitlement (a user's right to attach WireGuard devices to one SDWAN network: network_id + user_id + tags). It has nothing to do with cross-peer federation grants. See sdwan-network-setup.md Phase 7.
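Since a FederationGrant checks that its allowlist arrays hold well-formed UUIDs and CIDRs on save, a quick client-side pre-check can catch typos before the POST. This is a rough sketch, not the platform's validator: `is_uuid` and `is_ipv4_cidr` are hypothetical helpers, and the CIDR check covers IPv4 only (IPv6 prefixes would need a separate check).

```shell
# Succeeds when the argument looks like a canonical 8-4-4-4-12 hex UUID.
is_uuid() {
  printf '%s\n' "$1" \
    | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Succeeds when the argument is a.b.c.d/len with octets 0-255 and len 0-32.
is_ipv4_cidr() {
  printf '%s\n' "$1" | awk -F'[./]' '
    NF != 5 { exit 1 }
    {
      for (i = 1; i <= 4; i++)
        if ($i !~ /^[0-9]+$/ || $i > 255) exit 1
      if ($5 !~ /^[0-9]+$/ || $5 > 32) exit 1
    }'
}
```

For example, `is_ipv4_cidr 10.20.0.0/16 || echo "fix source_cidrs"` flags a malformed entry before the request is sent.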
Steps 4–5 above describe the operator-on-B-runs-the-MCP-action path. There is a second path: completing the accept as an approval-gated skill, so the System Concierge (operator chat) or the SDWAN Manager autonomy loop can finish a peering whose acceptance token the platform holds.
Both paths run the same orchestration —
System::Federation::FederationAcceptanceService — which owns the full
accept chain. Phase 3 extracted it so the HTTP endpoint, the skill, and any
future re-accept flow share one implementation. The chain:
verify contract_version (HARD — abort 422 if unsupported)
→ locate peer by token digest (HARD — abort 401 if not found / expired)
→ peer.accept! (HARD — token round-trip)
→ peer.enroll! (HARD, platform peers — capabilities + extensions + endpoints)
→ ensure managed_child operator grant (idempotent)
→ issue node_api bootstrap token (HARD, managed_child spawns)
→ SDWAN attach (SOFT — PeerEnroller + bridge activate!)
→ federation governance scan (SOFT — cert/drift/prefix findings)
HARD steps abort the whole accept on failure. SOFT steps (SDWAN attach, governance scan) are collected as warnings — the accept still succeeds with the peer enrolled, and you can re-run the soft step independently later.
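The HARD/SOFT split can be illustrated with a small runner. This is a conceptual sketch, not the FederationAcceptanceService: the `step_*` functions are hypothetical stand-ins that always succeed, and the managed-child steps (operator grant, node_api bootstrap token) are left out for brevity.

```shell
# Hypothetical stand-ins for the real steps; each returns 0 on success.
step_verify_contract() { return 0; }
step_locate_peer()     { return 0; }
step_accept_peer()     { return 0; }
step_enroll_peer()     { return 0; }
step_sdwan_attach()    { return 0; }
step_governance_scan() { return 0; }

accept_chain() {
  chain_warnings=""
  # HARD steps: the first failure aborts the whole accept.
  for chain_step in verify_contract locate_peer accept_peer enroll_peer; do
    "step_$chain_step" || { echo "aborted at $chain_step" >&2; return 1; }
  done
  # SOFT steps: failures are collected as warnings; the accept still succeeds.
  for chain_step in sdwan_attach governance_scan; do
    "step_$chain_step" || chain_warnings="$chain_warnings $chain_step"
  done
  echo "accepted; warnings:${chain_warnings:- none}"
}
```

If `step_sdwan_attach` is redefined to fail, `accept_chain` still prints `accepted; warnings: sdwan_attach`, whereas a failing `step_locate_peer` makes it return non-zero without reaching the soft steps.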
Run it via the Concierge ("accept the federation peer using token <X>,
contract version 1") or directly as the skill:
# execute_agent takes agent_id (ID, slug, or exact name) + an input object;
# the agent runs its bound federation_acceptance skill on the input.
platform.execute_agent # or via Concierge chat
agent_id: "SDWAN Manager"
input:
skill: "federation_acceptance"
acceptance_token: "<token from the proposing side>"
contract_version: 1
# optional forward-compat fields:
capabilities: {}
extension_slugs: []
endpoints: [] # [{ url, scope, priority, cidr_hint? }]
Because federation peering is sensitive, the skill is
approval-gated (requires_approval: true) — it lands in the approval
queue and must be approved before the chain runs. The result returns
peer_id, status, contract_version_agreed, the node_enrollment block
(for managed-child spawns), the sdwan_attach result, the governance
result, and any warnings.
When to use which: use the MCP action (Step 4) for plain out-of-band peering you're driving by hand. Use the federation_acceptance skill when you want the accept to flow through the Concierge or the autonomy loop with the approval gate, e.g. completing a spawned child's handshake, or re-accepting after a transient failure.
For the full architecture of the accept chain (hard/soft steps, the
managed-child grant, the SDWAN attach, governance), see
../FEDERATION_MULTI_SITE_GUIDE.md §2.
Steps 1–6 establish trust between two sites. To carry workload traffic
between them over the encrypted overlay, compose a federation topology —
hub-and-spoke or full-mesh — with the sdwan_federation_compose skill (bound
to System Topology Designer):
platform.execute_agent
agent_id: "System Topology Designer" # ID, slug, or exact name
input:
skill: "sdwan_federation_compose"
network_name: "fed-overlay-a-b"
topology: "hub_and_spoke" # or "full_mesh"
routing_protocol: "ibgp" # or "static"
peers:
- node_instance_id: "<site-a-hub-instance>"
role: "hub" # hub_and_spoke only; hubs MUST have an endpoint
endpoint_host_v6: "fd00:abcd:1::21"
endpoint_port: 51820
- node_instance_id: "<site-b-instance>"
role: "spoke"
dry_run: false # set true to preview the fan-out without persisting
Topology choice:
- hub_and_spoke: peers behind NAT funnel through a publicly-reachable hub. At least one peer must be role: "hub" and every hub must carry an endpoint (endpoint_host_v6/v4 + endpoint_port); the skill fails fast otherwise.
- full_mesh: any-to-any direct connectivity; no hub/spoke distinction. Best for low-RTT peers needing direct reachability.
routing_protocol: "ibgp" enables FRR route-policy distribution between
peers; static emits no FRR policy. Use dry_run: true first to review the
projected peer/hub counts and step list before building.
See ../FEDERATION_MULTI_SITE_GUIDE.md §3
for the topology composition internals and the choose-a-topology decision tree.
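The hub_and_spoke rules (at least one hub, every hub with an endpoint) are easy to pre-check before invoking the skill. The sketch below is a local approximation, not the skill's own validation: `validate_hub_and_spoke` is a hypothetical helper, and the one-peer-per-line `role host port` format with `-` for an absent value is an assumption made for this example.

```shell
# Reads peers from stdin, one per line: ROLE ENDPOINT_HOST ENDPOINT_PORT
# Use "-" for a missing host or port.
validate_hub_and_spoke() {
  awk '
    $1 == "hub" {
      hubs++
      if (NF < 3 || $2 == "-" || $3 == "-") bad++
    }
    END {
      if (hubs == 0) { print "no hub peer"; exit 1 }
      if (bad > 0)   { print bad " hub(s) missing an endpoint"; exit 1 }
      print "ok: " hubs " hub(s)"
    }'
}
```

For the two-peer example above, `printf 'hub fd00:abcd:1::21 51820\nspoke - -\n' | validate_hub_and_spoke` prints `ok: 1 hub(s)`. The skill's dry_run: true remains the authoritative preview.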
To give one tenant a fully-segregated network slice (its own VRF, /64,
firewall, and OVN ACLs) inside the account — entirely SDWAN-native, no k8s
NetworkPolicy or VLAN — use the multi_tenant_isolation skill (bound to
System Topology Designer, approval-gated):
platform.execute_agent
agent_id: "System Topology Designer" # ID, slug, or exact name
input:
skill: "multi_tenant_isolation"
tenant_key: "acme-prod" # slug-safe; names the network, rules, switch, ACLs
# tenant_cidr omitted ⇒ the auto-allocated /64 is used (recommended)
# nb_db_endpoint / sb_db_endpoint required only if the account has no OvnDeployment yet:
nb_db_endpoint: "tcp:127.0.0.1:6641"
sb_db_endpoint: "tcp:127.0.0.1:6642"
dry_run: false
What it builds (composed from existing SDWAN services, IDs threaded inline):
- A dedicated Sdwan::Network (routing_protocol: "ibgp") → its own VRF + isolated RIB (no shared forwarding table with other tenants).
- A non-overlapping /64 via Sdwan::PrefixAllocator (the tenant's blast-radius boundary).
- nftables firewall rules: allow the tenant's own /64 (high priority) + default-deny wildcard (low priority).
- An OVN logical switch scoped to the tenant CIDR.
- OVN ACLs: allow intra-tenant, drop cross-tenant.
Rollback (on failure or teardown) is reverse-order: OVN ACLs → OVN switch →
firewall rules → network. Use dry_run: true to see the planned actions
first. Full architecture in
../FEDERATION_MULTI_SITE_GUIDE.md §4a.
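Reverse-order rollback means undoing only what was built, newest first. The helper below is a small illustration of that ordering; `rollback_order` and the step labels are hypothetical names for this sketch, not identifiers used by the skill.

```shell
# Prints the given completed steps in reverse (the order to undo them).
# Usage: rollback_order STEP...
rollback_order() {
  rb_reversed=""
  for rb_step in "$@"; do
    # Prepend each step so the most recently completed one comes first.
    rb_reversed="$rb_step${rb_reversed:+ }$rb_reversed"
  done
  printf '%s\n' "$rb_reversed"
}
```

`rollback_order network firewall_rules ovn_switch ovn_acls` prints `ovn_acls ovn_switch firewall_rules network`, matching the sequence above. If the build had failed after the firewall rules, `rollback_order network firewall_rules` would print `firewall_rules network`.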
A service on one site is reachable from a federated peer over a stable overlay VIP — no public exposure needed when the consumer is another federated site:
- Sdwan::VirtualIp: a stable overlay address fronting the backend (static single-holder, or anycast multi-holder).
- BGP advertisement: the VIP emits a Sdwan::SubnetAdvertisement (source virtual_ip); FRR advertises the prefix into the iBGP fabric, so every peer (and federated peer, subject to route policy) learns the route.
- Traefik route (only for public consumers): a hub DNAT port mapping + reverse-proxy regen front the VIP on 443/80.
- External DNS (only for public names): Acme::DnsClient publishes the public A/AAAA/CNAME. This is the only non-SDWAN seam in the discovery path.
For a single public service, the expose_service_publicly skill chains
VIP → port mapping → ACME cert → reverse-proxy regen — see
expose-service.md. For federated peer-to-peer
discovery (no public exposure), the VIP prefix is learned across the
federation bridge when route policy permits it. Architecture in
../FEDERATION_MULTI_SITE_GUIDE.md §4b.
The default in this runbook is spawn_role: "symmetric" (both sides are equal peers). For asymmetric federations:
- managed_child: A spawns B as a managed-child satellite (e.g., on-prem edge platform). B's autonomy is bounded by grants A issues.
- autonomous_peer: Like symmetric but B is a fully sovereign instance that may federate further with C, D, etc.
- cluster_member: B is joining an existing federation cluster (typically a K3s control plane).
See SPAWN_MODES.md for the operator runbook covering each variant — they all use the same accept-token flow above, but the spawn-mode determines downstream behavior.
- Understand the full architecture: ../FEDERATION_MULTI_SITE_GUIDE.md covers the acceptance orchestration, SDWAN topology, tenant isolation, service discovery, the liveness autonomy loop, and security in depth
- Subscribe to a peer service: see the Service Catalog developer guide
- Migrate a resource across peers: see the Migration framework documentation
- Monitor peer health: the Fleet Dashboard's federation tab surfaces every peer, current status, and heartbeat freshness. The liveness autonomy loop (FederationPeerLivenessSensor → federation_peer_remediate) automatically probes stale peers over mTLS, degrades unreachable ones, and alerts on cert expiry; see ../FEDERATION_MULTI_SITE_GUIDE.md §5
- Pause federation operations (during maintenance): the SDWAN Manager agent's federation actions are gated by require_approval; drain the approval queue or pause the agent per SDWAN_MANAGER_AGENT.md
Last verified: 2026-06-03