Status: active
What you'll learn: Provision a managed Docker daemon on a NodeInstance — the platform handles TLS provisioning via Vault-backed InternalCaService, binds the daemon to the SDWAN overlay /128, and exposes the host through the operator API.
Time: ~15 min
Builds on: Tutorial 01 (running node + catalog seeded) and Tutorial 02 (you've published a module, so you understand the module → instance assignment pattern). For this tutorial we use the shipped `docker-engine` module instead of a custom one.
Sets you up for: Tutorial 04 (K3s cluster). The same handshake pattern you'll see here applies to K3s, just with different phase names.
```mermaid
sequenceDiagram
actor Op as Operator
participant Plat as Platform
participant Agent as powernode-agent<br/>(on NodeInstance)
participant CA as InternalCaService
participant Daemon as dockerd
Op->>Plat: assign docker-engine module
Note over Plat: SDWAN /128 already attached<br/>from Tutorial 01
Plat-->>Agent: heartbeat picks up<br/>new assignment
Agent->>Agent: apt install docker-ce<br/>+ generate Ed25519 keypair
Agent->>Plat: POST /runtime/handshake<br/>phase=wants_cert + csr_pem
Plat->>CA: sign CSR
CA-->>Plat: signed cert + chain
Plat-->>Agent: cert + chain
Plat->>Plat: create Devops::DockerHost<br/>bound to instance
Agent->>Agent: write daemon.json<br/>listen on SDWAN /128<br/>tls=true
Agent->>Daemon: systemctl start docker
Agent->>Plat: POST phase=ready<br/>version=25.0.3
Plat-->>Plat: host status:<br/>pending → connected
Op->>Plat: docker_list_containers<br/>(reaches dockerd over SDWAN /128)
Plat->>Daemon: (mTLS over overlay)
Daemon-->>Plat: container list
```
By the end you'll have a managed Docker daemon you can drive via MCP
(docker_* actions) or directly via the Docker CLI with the right TLS
material.
Phase 1 Docker is the System extension's first container runtime: each
managed daemon runs inside a NodeInstance, listens only on the SDWAN
overlay /128 (no public socket, no Unix socket exposed beyond the
instance), and authenticates clients via mTLS. The CA hierarchy is rooted
in InternalCaService (Vault PKI when available; fixture PEM in dev).
The daemon is provisioned via the runtime/handshake API — a stateless
phase machine the agent walks through (wants_cert → wants_config →
ready) on heartbeat ticks. The platform creates the corresponding
Devops::DockerHost row when the agent posts wants_cert. Host status
moves pending → connected only when the agent reports ready
(promotion is system_mark_docker_ready under the hood). Honest caveat:
the platform trusts that ready report — it does not run an
independent platform-side connectivity probe before flipping the host to
connected. If the agent reports ready but the daemon isn't actually
reachable, the first docker_* call surfaces the failure, not the
handshake.
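To make the phase machine concrete, here is a minimal in-memory sketch of how a platform-side handler could react to each POST. Everything in it is hypothetical: `handleHandshake`, `signCsr`, the `hosts` Map and the `wants_config` payload are illustrative names, not the platform's real implementation. It encodes only the rules stated above: the DockerHost row is created on `wants_cert`, the host is promoted to `connected` only on `ready`, and the `ready` report is trusted without a probe.

```javascript
// Hypothetical sketch of a stateless handshake handler (not the platform's real code).
// Phase order the agent walks through on heartbeat ticks.
const PHASES = ["wants_cert", "wants_config", "ready"];

// Stand-in for InternalCaService: a real CA returns an X.509 cert + chain.
function signCsr(csrPem) {
  return { certPem: `signed:${csrPem}`, chainPem: "chain" };
}

// Handles one POST /runtime/handshake. No per-agent cursor is stored:
// the response depends only on the reported phase and the host row.
function handleHandshake(hosts, instanceId, body) {
  if (!PHASES.includes(body.phase)) {
    throw new Error(`unknown phase: ${body.phase}`);
  }
  if (body.phase === "wants_cert") {
    // The DockerHost row is created here, starting in "pending".
    // An existing row is left untouched, so retries are harmless.
    if (!hosts.has(instanceId)) {
      hosts.set(instanceId, { nodeInstanceId: instanceId, status: "pending", version: null });
    }
    return signCsr(body.csrPem);
  }
  const host = hosts.get(instanceId);
  if (!host) {
    throw new Error("no DockerHost row yet: post wants_cert first");
  }
  if (body.phase === "wants_config") {
    // Hypothetical payload: the agent fills in its own overlay /128.
    return { tls: true, tlsverify: true, port: 2376 };
  }
  // phase === "ready": trusted as reported, no platform-side connectivity probe.
  host.status = "connected";
  host.version = body.version;
  return { status: host.status };
}
```

In this sketch an agent that restarts mid-handshake can simply re-post its current phase, because the handler keeps no record of where each agent is.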
The client mTLS keypair is signed by InternalCaService, mirrored onto the
DockerHost row inside a DB transaction (so the hot path never needs a Vault
round-trip), and then persisted to Vault after that transaction commits.
Because the Vault write lands after the DB commit, a reader hitting the row in
the narrow window before the Vault write completes still gets the mirrored
material: the DB copy is the source of truth on the read path.
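The ordering can be illustrated with in-memory stand-ins. In this hypothetical sketch, `db` plays the DockerHost table, `vault` plays Vault, and `pendingVaultWrites` models the post-commit write that may land later; the `docker/<host-id>` Vault path and all function names are assumptions. What it demonstrates is that the read path consults only the DB mirror, so a read between the commit and the Vault write still succeeds.

```javascript
// Hypothetical sketch: Maps stand in for the DockerHost table and for Vault.

// Mirror signed client material onto the host row, then schedule the Vault write.
function storeClientMaterial(db, vault, pendingVaultWrites, hostId, material) {
  const row = db.get(hostId);
  if (!row) throw new Error(`unknown host ${hostId}`);
  // "Transaction": the row update is the commit point for readers.
  db.set(hostId, { ...row, clientCertPem: material.certPem, clientKeyPem: material.keyPem });
  // After commit: the Vault write is queued and may land later.
  pendingVaultWrites.push(() => vault.set(`docker/${hostId}`, { ...material }));
}

// Hot read path: DB mirror only, never a Vault round-trip.
function readClientMaterial(db, hostId) {
  const row = db.get(hostId);
  if (!row || !row.clientCertPem) return null;
  return { certPem: row.clientCertPem, keyPem: row.clientKeyPem };
}

// Drain queued Vault writes (e.g. from a background job).
function flushVaultWrites(pendingVaultWrites) {
  while (pendingVaultWrites.length > 0) pendingVaultWrites.shift()();
}
```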
Crucially, the trust boundary is SDWAN network membership, not just the daemon's TLS credentials. A Docker host on SDWAN network A cannot be reached from a peer on network B even if that peer holds valid TLS material, because the two networks' paths don't intersect.
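A minimal sketch of that two-part check, assuming hypothetical shapes: a client carries the SDWAN network ids it is a peer on plus the issuer of its client cert, and a host records its single network id and the CA it trusts. Neither condition alone is enough.

```javascript
// Hypothetical model of the trust boundary: network membership AND mTLS.
function canReachDockerHost(client, host) {
  // Network layer: the client must be a peer on the host's SDWAN network,
  // otherwise no path to the host's /128 exists at all.
  const onSameNetwork = client.networkIds.includes(host.networkId);
  // TLS layer: the client cert must chain to the CA the daemon trusts.
  const tlsValid = client.certIssuer === host.trustedCa;
  return onSameNetwork && tlsValid;
}
```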
| Requirement | How |
|---|---|
| Working NodeInstance from Tutorial 01 (or a fresh one) | `platform.system_provision_instance` |
| SDWAN peer attached to that instance | `platform.system_sdwan_attach_peer`. The daemon needs a /128 to bind to. Required: provisioning fails with `MissingSdwanPeerError` if it's missing. |
| `docker-engine` module promoted to live (or blessed with override) | The default catalog from Tutorial 01 includes it |
| Operator permission `system.docker_provision` | Default for admin users |
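Before assigning the module, you can check the SDWAN requirement against the `{ peers: [...] }` response of `platform.system_sdwan_list_peers`. This is a hypothetical helper; its name and its null-on-missing convention are assumptions, and only the `node_instance_id` and `overlay_address` fields come from the documented response.

```javascript
// Hypothetical preflight helper over a system_sdwan_list_peers response.
// Returns the instance's overlay address, or null if no peer is attached
// (in which case provisioning would fail with MissingSdwanPeerError).
function overlayAddressFor(listPeersResponse, instanceId) {
  const peer = (listPeersResponse.peers || []).find(
    (p) => p.node_instance_id === instanceId
  );
  return peer && peer.overlay_address ? peer.overlay_address : null;
}
```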
```javascript
platform.system_sdwan_list_peers({ network_id: "<your-network>" })
// → { peers: [{ id, node_instance_id: "<your-instance>", overlay_address: "fd00:abcd:1::42", ... }] }
```

Expected outcome: your instance has a /128 allocated. If not, attach one:
```javascript
platform.system_sdwan_attach_peer({
  network_id: "<network-id>",
  node_instance_id: "<instance-id>"
})
```

Next, assign the `docker-engine` module to the template the instance was provisioned from:

```javascript
// Find the template the instance was provisioned from
platform.system_get_instance({ id: "<instance-id>" })
// → { instance: { node_template_id: "<template-id>", ... } }

// Resolve the docker-engine module's id (assign takes module_id, not a name)
platform.system_list_modules()
// → { modules: [{ id: "<docker-module-id>", name: "docker-engine", ... }, ...] }

// Assign the module
platform.system_assign_module_to_template({
  template_id: "<template-id>",
  module_id: "<docker-module-id>"
})
```

Expected outcome: assignment row created. On the next agent heartbeat
(within ~60s), the agent picks up the new module and starts installing
docker-ce.
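Because `system_assign_module_to_template` takes ids rather than names, the three calls chain together. A minimal sketch, assuming the response shapes shown in the comments of the assignment snippet (`instance.node_template_id`, `modules[].id` and `modules[].name`); `buildAssignArgs` is a hypothetical helper, not part of the platform API.

```javascript
// Hypothetical helper: turn system_get_instance + system_list_modules responses
// into the arguments for system_assign_module_to_template.
function buildAssignArgs(getInstanceResponse, listModulesResponse, moduleName) {
  const instance = getInstanceResponse.instance;
  const templateId = instance && instance.node_template_id;
  if (!templateId) throw new Error("instance has no node_template_id");
  const matches = (listModulesResponse.modules || []).filter((m) => m.name === moduleName);
  if (matches.length === 0) throw new Error(`module not found: ${moduleName}`);
  // Names are assumed unique here; refuse to guess if they are not.
  if (matches.length > 1) throw new Error(`ambiguous module name: ${moduleName}`);
  return { template_id: templateId, module_id: matches[0].id };
}
```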
```javascript
platform.recent_events({ kind_prefix: "system.docker", limit: 20 })
// → events: [
//   { kind: "system.docker.module.assigned", ... },
//   { kind: "system.docker.runtime.installing", ... },
//   { kind: "system.docker.handshake.wants_cert", ... },
//   { kind: "system.docker.handshake.cert_signed", ... },
//   { kind: "system.docker.handshake.ready", ... },
//   { kind: "system.docker.provisioned", ... }
// ]
```

Expected outcome: ~2–3 min wall clock for the full sequence on a warm
instance. The new Devops::DockerHost appears:
```javascript
platform.docker_list_hosts()
// → { hosts: [{
//   id: "host-<uuid>",
//   node_instance_id: "<instance-id>",
//   api_endpoint: "tcp://[fd00:abcd:1::42]:2376",
//   status: "connected",
//   version: "25.0.3"
// }] }
```

Pull an image and start a container on the new host:

```javascript
platform.docker_pull_image({
  host_id: "host-<uuid>",
  image: "nginx:1.27-alpine"
})
// → { image: { id, repo_tags: ["nginx:1.27-alpine"], size: 22000000 } }

platform.docker_create_container({
  host_id: "host-<uuid>",
  image: "nginx:1.27-alpine",
  name: "hello-nginx",
  ports: [{ host: 8080, container: 80 }],
  detach: true
})
// → { container: { id, status: "running", ... } }
```

Expected outcome: the container is reachable on port 8080 of the instance's
overlay /128 from any peer on the same SDWAN network.
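The overlay address is an IPv6 literal, so it must be wrapped in brackets whenever it appears in a URL alongside a port. A hypothetical helper (the name `containerUrl` is illustrative) that derives the container URL from a host's `api_endpoint`, assuming the `tcp://[<ipv6>]:<port>` format returned by `docker_list_hosts`:

```javascript
// Hypothetical helper: derive an HTTP URL for a published container port
// from a DockerHost api_endpoint of the form tcp://[<ipv6>]:<port>.
function containerUrl(apiEndpoint, hostPort) {
  const m = /^tcp:\/\/\[([0-9A-Fa-f:]+)\]:(\d+)$/.exec(apiEndpoint);
  if (!m) throw new Error(`unexpected api_endpoint: ${apiEndpoint}`);
  if (!Number.isInteger(hostPort) || hostPort < 1 || hostPort > 65535) {
    throw new Error(`invalid port: ${hostPort}`);
  }
  const overlayAddress = m[1];
  // IPv6 literals need brackets in URLs, otherwise ":8080" is ambiguous.
  return `http://[${overlayAddress}]:${hostPort}/`;
}
```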
Three independent ways to confirm:

Via MCP:

```javascript
platform.docker_list_containers({ host_id: "host-<uuid>" })
// → { containers: [{ id, image: "nginx:1.27-alpine", status: "Up 30 seconds", ... }] }
```

Via container HTTP (from an operator workstation peer on the same network):
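```shell
# Quote the URL and pass -g so neither the shell nor curl treats [...] as a glob
curl -g "http://[fd00:abcd:1::42]:8080"
# → <!DOCTYPE html>... (nginx default welcome page)
```

Via Docker CLI (if you've exported the operator client cert from the platform):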
```shell
docker --tlsverify \
  --tlscacert ~/.powernode/operator-ca.pem \
  --tlscert ~/.powernode/operator-cert.pem \
  --tlskey ~/.powernode/operator-key.pem \
  -H "tcp://[fd00:abcd:1::42]:2376" \
  ps
```

(The operator cert export procedure is in docs/runbooks/node-provisioning.md.)
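The same exported operator material works from any TLS client, not only the Docker CLI. A minimal Node.js sketch, assuming the `~/.powernode` file paths used in the CLI example, the Docker Engine API's `GET /containers/json` endpoint, and a daemon server certificate that lists the overlay address as an IP SAN (Node verifies the certificate against `host`). `dockerTlsOptions`, `loadOperatorPems` and `listContainers` are hypothetical helper names.

```javascript
const fs = require("node:fs");
const https = require("node:https");
const os = require("node:os");
const path = require("node:path");

// Build https.request options for an mTLS call to dockerd on its overlay /128.
function dockerTlsOptions(overlayAddress, port, pems) {
  return {
    host: overlayAddress, // bare IPv6 literal: Node options take no brackets
    port,
    path: "/containers/json",
    method: "GET",
    ca: pems.ca, // CA that signed the daemon's server cert
    cert: pems.cert, // operator client cert
    key: pems.key, // operator client key
    rejectUnauthorized: true,
  };
}

// Read the exported operator PEMs (paths as in the CLI example).
function loadOperatorPems(dir = path.join(os.homedir(), ".powernode")) {
  return {
    ca: fs.readFileSync(path.join(dir, "operator-ca.pem")),
    cert: fs.readFileSync(path.join(dir, "operator-cert.pem")),
    key: fs.readFileSync(path.join(dir, "operator-key.pem")),
  };
}

// Perform the request and resolve with the parsed JSON container list.
function listContainers(overlayAddress, port = 2376) {
  const options = dockerTlsOptions(overlayAddress, port, loadOperatorPems());
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => {
        if (res.statusCode !== 200) {
          reject(new Error(`dockerd returned HTTP ${res.statusCode}: ${body}`));
          return;
        }
        try { resolve(JSON.parse(body)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
    req.end();
  });
}
```

For example, from a workstation peer on the host's network: `listContainers("fd00:abcd:1::42").then((containers) => console.log(containers.length))`.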
```javascript
platform.docker_delete_container({ host_id: "host-<uuid>", container_id: "<id>", force: true })
platform.docker_delete_image({ host_id: "host-<uuid>", image_id: "nginx:1.27-alpine" })

// Decommission the managed daemon (drops cert, stops dockerd)
platform.system_decommission_docker_runtime({ host_id: "host-<uuid>" })

// Optionally unassign the module from the template
platform.system_unassign_module_from_template({
  template_id: "<template-id>",
  module_id: "<docker-module-id>"
})
```

MissingSdwanPeerError on assignment: the instance has no SDWAN peer
attached. The daemon needs a /128 to bind to (it doesn't listen on 0.0.0.0).
Run Step 1's `system_sdwan_attach_peer` command first.
Handshake stuck at wants_cert: InternalCaService is failing. Check
which adapter is active:

```javascript
platform.platform_provisioning_status()
// → { ca_adapter: "LocalCaAdapter" | "VaultCaAdapter", ... }
```

If VaultCaAdapter, verify that Vault's pki_int mount is reachable and that the
node role exists. Dev environments typically run on LocalCaAdapter (fixture
PEM); that's expected.
Handshake fails at wants_config: the agent received the cert but
can't write daemon.json. SSH to the instance and check
`journalctl -u powernode-agent.service` for permission errors. Common cause:
/etc/docker/ is missing or owned by a non-root user.
Docker daemon starts but status stuck at pending: the agent isn't
reporting the ready phase. Three sub-cases:

- Daemon listening on the wrong address: check `ss -tlnp | grep dockerd` on the instance. It should show `[fd00:...]:2376`, not `0.0.0.0:2376`.
- mTLS misconfigured: the daemon refuses connections without a client cert; check `journalctl -u docker.service`.
- `daemon.json` syntax error: run `dockerd --validate` on the instance.
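The first sub-case can be checked mechanically. A hypothetical sketch (`checkDockerdListen` is an illustrative name) that scans `ss -tlnp` output for dockerd's listening sockets and flags a wildcard bind. It assumes ss's usual column layout: Local Address:Port in the fourth column and a `users:(("dockerd",...))` process field; adjust the index if your ss version prints differently.

```javascript
// Hypothetical checker for `ss -tlnp` output: is dockerd bound to the
// overlay /128 on the given port, and is it also bound to a wildcard address?
function checkDockerdListen(ssOutput, overlayAddress, port = 2376) {
  const expected = `[${overlayAddress}]:${port}`;
  const wildcards = [`0.0.0.0:${port}`, `[::]:${port}`, `*:${port}`];
  const dockerdAddrs = ssOutput
    .split("\n")
    .filter((line) => line.includes('"dockerd"'))
    // Columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port Process
    .map((line) => line.trim().split(/\s+/)[3]);
  return {
    boundToOverlay: dockerdAddrs.includes(expected),
    wildcardBind: dockerdAddrs.some((addr) => wildcards.includes(addr)),
    addresses: dockerdAddrs,
  };
}
```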
Cannot reach the daemon from your operator workstation: your workstation peer isn't on the same SDWAN network as the docker host. Use:

```javascript
platform.system_sdwan_create_access_grant({
  network_id: "<host's network>",
  device_name_hint: "ops-laptop"
})
```

…then import the resulting WireGuard config on your laptop.
- Tutorial 04 (K3s cluster): same handshake pattern, different runtime. K3s control plane via the `k3s-server` module + workers via `k3s-agent`.
- CONTAINER_RUNTIMES.md: Phase 1 + Phase 2 full operator guide with troubleshooting trees.
- USE_CASE_MATRIX.md: what works / doesn't for 10 NodeInstance container scenarios, including limitations of Phase 1 Docker (single-host only, no cross-host Swarm).
- SMOKE_TEST.md Pass 2: `smoke_test_docker_runtime.rb` exercises the same handshake at the platform layer without a live VM.
Last verified: 2026-06-03