Status: active
Operator guide for running a provisioning mission by talking to the System Concierge chat agent. This walks the full lifecycle — describe what you want, review the composed plan, approve it inline in chat, and watch the resources provision step by step.
This is the operator-facing companion to
MISSION_COMPOSITION_ARCHITECTURE.md,
which documents how plans are composed and executed under the hood.
The provisioning conversation turns a plain-English request ("stand up a small web app in us-east-1 with a $200/month budget") into an executable, approval-gated infrastructure mission. You never have to assemble the DAG by hand — the Concierge captures a structured brief, the platform composes a plan, and you approve it before any resource is created.
Every provisioning run is backed by an Ai::Mission of
mission_type: "infrastructure", bound to the system-wide
system_provisioning mission template. The template is seeded by
server/db/seeds/system_provisioning_mission_template.rb
— if you have not run that seed, the Concierge will tell you the template is
missing.
Open the System Concierge chat and describe what you want in natural language. You do not need a special command or syntax — describe the workload, the region, the scale, and any budget.
Example openers:
- "Provision a 2-instance API server in us-east-1, budget around $300/month."
- "Stand up a containerized app stack and deploy https://git.example.com/team/app.git on it."
- "I need a small database node with 99.9% availability."
When you send a message, the Concierge runs it through an intent classifier before the general chat tool loop:
- Intent classification: Ai::ConciergeToolBridge#classify_and_dispatch_provisioning calls Ai::Provisioning::IntentCaptureService#classify, which combines a regex pre-filter with an LLM confidence-scoring pass.
- Threshold gate: the message is treated as a provisioning request only when the classified intent is provision_infrastructure and confidence is at least 0.5 (PROVISIONING_CONFIDENCE_THRESHOLD). Below that, the message falls through to the normal chat flow, so a casual mention of "infrastructure" will not accidentally start a mission.
- Brief capture: on a hit, the bridge dispatches the platform_provisioning_capture_brief action of Ai::Tools::ProvisioningTool. With no mission_id, this creates a fresh infrastructure mission bound to the system_provisioning template and captures the first brief.
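The threshold gate above reduces to a two-part predicate. The sketch below is illustrative only: the ClassifiedIntent struct and the provisioning_request? helper are hypothetical names, not the ConciergeToolBridge code; only the provision_infrastructure intent name and the 0.5 threshold come from this guide.

```ruby
# Hypothetical sketch of the threshold gate; not the real ConciergeToolBridge code.
PROVISIONING_CONFIDENCE_THRESHOLD = 0.5 # value documented in this guide

# Hypothetical stand-in for whatever IntentCaptureService#classify returns.
ClassifiedIntent = Struct.new(:intent, :confidence, keyword_init: true)

# True only when the classifier picked the provisioning intent with enough
# confidence; anything else falls through to the normal chat flow.
def provisioning_request?(classified)
  classified.intent == "provision_infrastructure" &&
    classified.confidence >= PROVISIONING_CONFIDENCE_THRESHOLD
end

casual = ClassifiedIntent.new(intent: "provision_infrastructure", confidence: 0.3)
explicit = ClassifiedIntent.new(intent: "provision_infrastructure", confidence: 0.82)
puts provisioning_request?(casual)   # false: falls through to normal chat
puts provisioning_request?(explicit) # true: would dispatch capture_brief
```

Note that the comparison is inclusive, so a confidence of exactly 0.5 counts as a provisioning request.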
You can also drive the same flow directly through MCP tool actions if you are scripting against the platform — see §7.
Why the Concierge stays on-topic. The System Concierge runs with a concierge_tool_filter that narrows its tool surface to provisioning- and fleet-relevant actions: those prefixed system_, docker_, and kubernetes_, plus discover_skills, get_skill_context, and request_confirmation. That is roughly two dozen actions rather than the full platform catalog, which keeps plan composition focused and prevents the agent from reaching for unrelated tools mid-provision.
Every provisioning mission moves through the seven phases defined on the
system_provisioning template
(system_provisioning_mission_template.rb,
PROVISIONING_PHASES). Two of them are approval gates — the mission pauses
there until you act.
| Order | Phase key | Label | Approval gate | Worker job (server-driven path) |
|---|---|---|---|---|
| 0 | capture_intent | Capture Brief | no | AiProvisioningCaptureIntentJob |
| 1 | compose_plan | Compose Plan | no | AiProvisioningComposePlanJob |
| 2 | review_plan | Review & Approve | yes (plan_review) | none (waits for you) |
| 3 | execute | Provision Resources | no | AiProvisioningExecuteJob |
| 4 | verify | Verify SLO Targets | no | AiProvisioningVerifyJob |
| 5 | handoff | Hand Off | yes (handoff) | AiProvisioningHandoffJob |
| 6 | adapting | Continuous Adaptation | no | none (sensor-driven, long-lived) |
The canonical pipeline is therefore:
capture_intent → compose_plan → review_plan(gate) → execute → verify → handoff(gate) → adapting
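To make the ordering concrete, here is a small Ruby model of the table. The phase keys and gate names mirror the table above, but the PHASES constant, its Hash layout, and the next_phase / approval_gate? helpers are illustrative assumptions, not the actual PROVISIONING_PHASES structure. Running it prints the canonical pipeline shown above.

```ruby
# Illustrative model of the seven phases; the real PROVISIONING_PHASES
# structure in system_provisioning_mission_template.rb may differ.
PHASES = [
  { key: "capture_intent", gate: nil },
  { key: "compose_plan",   gate: nil },
  { key: "review_plan",    gate: "plan_review" },
  { key: "execute",        gate: nil },
  { key: "verify",         gate: nil },
  { key: "handoff",        gate: "handoff" },
  { key: "adapting",       gate: nil }
].freeze

# Returns the phase that follows `current`, or nil for the terminal phase.
def next_phase(current)
  index = PHASES.index { |phase| phase[:key] == current }
  raise ArgumentError, "unknown phase: #{current}" if index.nil?

  PHASES[index + 1]&.fetch(:key)
end

# True when the mission must pause in `phase` until an operator approves.
def approval_gate?(phase)
  PHASES.any? { |p| p[:key] == phase && !p[:gate].nil? }
end

puts PHASES.map { |p| p[:gate] ? "#{p[:key]}(gate)" : p[:key] }.join(" → ")
```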
Ai::Provisioning::IntentCaptureService translates your utterance into a
structured Project Brief stored at mission.configuration["brief"]. The
brief has required fields (intent, use_case, scale, regions,
budget_cap_usd_monthly — see IntentCaptureService::REQUIRED_FIELDS). The
tool returns the brief plus a missing_fields list.
- If fields are missing, the Concierge asks you for them. Reply in chat; each reply is merged onto the existing brief as a clarification (the tool calls IntentCaptureService#refine). The mission stays at capture_intent.
- When the brief is complete (missing_fields empty), the tool advances the mission to compose_plan.
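The capture/refine loop can be sketched as follows, assuming the brief is a plain Hash with string keys. REQUIRED_FIELDS copies the field names listed above; missing_fields and refine_brief are hypothetical helpers that illustrate the merge-then-recheck behavior, not IntentCaptureService itself, and the example field values are made up.

```ruby
# Field names as documented for IntentCaptureService::REQUIRED_FIELDS.
REQUIRED_FIELDS = %w[intent use_case scale regions budget_cap_usd_monthly].freeze

# A field counts as missing when it is absent, nil, or an empty string/array.
def missing_fields(brief)
  REQUIRED_FIELDS.select do |field|
    value = brief[field]
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end
end

# Hypothetical refine: merge the clarification onto the existing brief,
# keeping earlier answers unless the clarification supplies a new value.
def refine_brief(brief, clarification)
  brief.merge(clarification.reject { |_key, value| value.nil? })
end

brief = { "intent" => "provision_infrastructure", "regions" => ["us-east-1"] }
p missing_fields(brief) # ["use_case", "scale", "budget_cap_usd_monthly"]
brief = refine_brief(brief, { "use_case" => "web_app", "scale" => "small",
                              "budget_cap_usd_monthly" => 200 })
p missing_fields(brief) # []
```

Once missing_fields comes back empty, that is the point at which the mission would advance to compose_plan.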
The platform composes an executable plan from the brief. For infrastructure
missions this runs Ai::Provisioning::PlanComposerService#compose!, which
decomposes the brief into a DAG of provisioning-skill steps (an Ai::GoalPlan).
The plan id is stamped onto mission.configuration["plan"]["plan_id"] so every
downstream consumer resolves the same plan. The mission then advances to
review_plan, where it stops and waits for you.
The two composition paths (deterministic provisioning vs. LLM-general) and how they converge on the same plan shape are covered in
MISSION_COMPOSITION_ARCHITECTURE.md.
A few outcomes you may see at this phase:
- Provider clarification — if your account has multiple cloud providers configured and the brief does not name one, composition pauses and the chat surfaces a "which provider?" question with the available options. Answer it and the next compose round proceeds.
- Cost cap reached — if the account has exhausted its daily LLM cost cap, composition returns no plan and the chat surfaces an upgrade prompt rather than retrying.
The composed plan is presented in chat as a rich card listing its steps. The
mission is now paused at the plan_review gate (awaiting_approval? returns
true), and nothing has been provisioned yet. See §4 for how to act.
Once you approve, the mission advances to execute and
AiProvisioningExecuteJob kicks off Ai::Provisioning::SkillCompositionRunner.
The runner computes parallel-safe layers from the step dependencies and
dispatches one worker job per step. Each step runs its skill executor; steps
stream progress to chat and to the live UI as they transition. See
§5.
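Computing "parallel-safe layers" amounts to a level-by-level topological sort (Kahn's algorithm) of the step DAG: a step joins a layer once every step it depends on sits in an earlier layer. The sketch below assumes each step is a Hash with an :id and a :depends_on list, and the step ids are made up; it illustrates the technique, not SkillCompositionRunner's actual code.

```ruby
# Group DAG steps into layers; every step in a layer depends only on steps in
# earlier layers, so each layer can be dispatched in parallel.
def parallel_layers(steps)
  remaining = steps.to_h { |step| [step[:id], step[:depends_on]] }
  placed = []
  layers = []

  until remaining.empty?
    ready = remaining.select { |_id, deps| deps.all? { |dep| placed.include?(dep) } }.keys
    raise ArgumentError, "cycle or unknown dependency among: #{remaining.keys.join(', ')}" if ready.empty?

    layers << ready
    placed.concat(ready)
    ready.each { |id| remaining.delete(id) }
  end

  layers
end

steps = [
  { id: "network",  depends_on: [] },
  { id: "database", depends_on: ["network"] },
  { id: "compute",  depends_on: ["network"] },
  { id: "deploy",   depends_on: ["compute", "database"] }
]
p parallel_layers(steps) # [["network"], ["database", "compute"], ["deploy"]]
```

Raising on an empty ready set keeps a malformed plan (a cycle or a dependency on a step that does not exist) from looping forever.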
After the steps complete, AiProvisioningVerifyJob runs a verification pass
against the mission's slo_targets and records the result on
mission.configuration["verification"]. On success the orchestrator advances to
handoff.
The mission pauses at handoff — the second approval gate. Approving here
creates a per-mission Ai::RalphLoop (the long-lived adaptation driver) and
advances the mission to adapting. A system message marks the handoff in chat.
On Business+ plans with second_signature_required enabled, the handoff gate requires two distinct approvers. The first approval is recorded and the mission stays at handoff until a different user also approves. Free/Pro tiers advance after a single approval.
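The two-approver rule boils down to counting distinct users among the recorded approvals for the gate. The sketch below is a hypothetical model: the Approval struct and handoff_approved? helper are illustrative names, and the plan-tier check is reduced to a boolean flag standing in for second_signature_required.

```ruby
# Hypothetical stand-in for recorded Ai::MissionApproval rows.
Approval = Struct.new(:user_id, :gate, :decision, keyword_init: true)

# With second_signature_required, handoff needs approvals from two different
# users; otherwise one suffices. Repeat approvals by one user count once.
def handoff_approved?(approvals, second_signature_required:)
  approvers = approvals
    .select { |a| a.gate == "handoff" && a.decision == "approved" }
    .map(&:user_id)
    .uniq
  required = second_signature_required ? 2 : 1
  approvers.size >= required
end

first = Approval.new(user_id: 7, gate: "handoff", decision: "approved")
again = Approval.new(user_id: 7, gate: "handoff", decision: "approved")
other = Approval.new(user_id: 9, gate: "handoff", decision: "approved")

puts handoff_approved?([first, again], second_signature_required: true)  # false
puts handoff_approved?([first, other], second_signature_required: true)  # true
puts handoff_approved?([first], second_signature_required: false)        # true
```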
The terminal, long-lived phase. There is no worker job — the mission stays in
adapting while the ProjectSloSensor reconciler samples health and the
mission's RalphLoop holds the adaptation context. The mission remains active here.
What "adapting" does and doesn't do today. The phase, the per-mission RalphLoop, and the SLO sensor are all live, so the mission keeps monitoring health. What is not yet wired is the step that turns an observed breach into a new adaptation plan: the platform_provisioning_adapt action is an M0 stub that returns { todo: "M2", adaptation_plan: nil }. So treat adapting as continuous monitoring for now; don't expect the mission to self-replan or re-provision in response to drift until the M2 sensor reconciler ships.
Rejecting a gate rolls the mission back per the template's
rejection_mappings:
- Rejecting review_plan sends the mission back to compose_plan so you can refine the brief and recompose.
- Rejecting handoff sends it back to verify.
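Rollback on rejection is a lookup from the rejected phase to the phase the mission resumes at. A minimal sketch, assuming rejection_mappings is a flat phase-to-phase Hash (the real template may store it differently); the two entries are the ones documented above, and rollback_phase is a hypothetical helper.

```ruby
# Assumed shape: rejected phase => phase the mission returns to.
REJECTION_MAPPINGS = {
  "review_plan" => "compose_plan", # refine the brief, then recompose
  "handoff"     => "verify"        # re-run verification before handing off again
}.freeze

# Returns the phase to roll back to, or raises for a phase with no mapping.
def rollback_phase(rejected_phase)
  REJECTION_MAPPINGS.fetch(rejected_phase) do |phase|
    raise ArgumentError, "no rejection mapping for phase #{phase}"
  end
end

puts rollback_phase("review_plan") # compose_plan
```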
Approval gates on infrastructure missions render as a clickable card directly in the Concierge chat — you do not need to leave the conversation or open a separate modal.
When the mission enters an approval-gate phase, Ai::Mission#post_milestone_to_conversation
posts a system message into the mission's conversation. For infrastructure
missions at a gate, that message carries action metadata:

    concierge_action: true
    action_type: "approve_mission_gate"
    action_params: { mission_id: <id>, gate: <phase>, decision: "approved" }
    actions: [ { type: "confirm", label: "Approve", style: "primary" },
               { type: "reject", label: "Reject", style: "danger" } ]
    action_context: { type: "mission_approval", action_type: "approve_mission_gate", status: "pending" }

The chat UI renders this metadata as an Approve / Reject card.
Clicking Approve or Reject posts to the conversation's
confirm_action endpoint
(POST /api/v1/ai/conversations/:id/confirm_action with action_type and
action_params). That routes into
Ai::ConciergeService#handle_confirmed_action, which for
approve_mission_gate calls
Ai::Missions::OrchestratorService#handle_approval!. The orchestrator:
- records an Ai::MissionApproval (with your user id, the gate, and the decision),
- honors the second-signature gate at handoff when configured,
- on approve, advances the mission to the next phase, which dispatches the next phase's worker job (e.g. approving review_plan kicks off AiProvisioningExecuteJob),
- on reject, rolls the mission back per rejection_mappings.
The Concierge then posts a confirmation message ("Approved Review Plan for … — now in Execute") and the original gate card is marked resolved.
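If you script an approval instead of clicking the card, the request has the shape described above. The sketch below only builds the request with Ruby's standard Net::HTTP and does not send it. The base URL, the conversation and mission ids, the bearer-token Authorization header, and the JSON body encoding are all assumptions; this guide documents only the path and the action_type / action_params fields. build_gate_approval_request is a hypothetical helper.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a confirm_action request approving a mission gate.
# Host, auth scheme, and JSON encoding are assumptions for illustration.
def build_gate_approval_request(base_url:, conversation_id:, mission_id:, gate:, token:)
  uri = URI.join(base_url, "/api/v1/ai/conversations/#{conversation_id}/confirm_action")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{token}" # assumed auth scheme
  request.body = JSON.generate({
    action_type: "approve_mission_gate",
    action_params: { mission_id: mission_id, gate: gate, decision: "approved" }
  })
  request
end

req = build_gate_approval_request(
  base_url: "https://platform.example.com", conversation_id: 17,
  mission_id: 42, gate: "review_plan", token: "REDACTED"
)
puts req.method, req.path, req.body
```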
Why approval is the only path to execution. There is intentionally no platform_provisioning_execute action. Approval at review_plan is what advances the mission into execute; the orchestrator drives the run from there. A separate execute action used to race this path and double-provision. Approve once and let the pipeline carry it forward.
You have two live surfaces plus the chat transcript.
The runner posts a system message into the mission conversation as the run starts and on every step transition, for example:

    Provisioning run started — 4 step(s) across 2 layer(s).
    Step 1 (provision_full_stack) → executing
    Step 1 (provision_full_stack) → completed

These are emitted by SkillCompositionRunner via the mission conversation's
add_system_message, so they appear inline in the same Concierge thread.
For real-time step streaming, subscribe to the mission's MissionChannel
rather than polling. The runner (through
Ai::Missions::OrchestratorService#broadcast_step_event!) broadcasts:
| Event | When | Key payload |
|---|---|---|
| provisioning_run_started | run kicks off | runner_id, step_count, layer_count |
| provisioning_step_changed | each step transition | step_id, step_number, status, outputs, error |
| phase_changed / status_changed | mission phase / status change | current_phase, status, phase_progress |
| mission_handed_off | handoff approved | ralph_loop_id, phase: "adapting" |
Step status values are executing, completed, failed, and rolled_back.
The provisioning UI consumes these via the step-progress stream and the
provisioning page.
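A consumer of these events only needs to keep the latest status per step. The sketch below assumes events arrive as already-parsed Hashes with an "event" key naming the event type; that envelope key and the apply_event helper are assumptions, while the event names, payload fields, and step status values come from the table above.

```ruby
# Hypothetical reducer over MissionChannel broadcasts; the "event" envelope
# key is an assumption, the payload fields follow the documented table.
STEP_STATUSES = %w[executing completed failed rolled_back].freeze

def apply_event(state, message)
  case message["event"]
  when "provisioning_run_started"
    state.merge("step_count" => message["step_count"], "steps" => {})
  when "provisioning_step_changed"
    status = message["status"]
    raise ArgumentError, "unexpected step status: #{status}" unless STEP_STATUSES.include?(status)

    steps = state.fetch("steps", {}).merge(message["step_number"] => status)
    state.merge("steps" => steps)
  when "phase_changed", "status_changed"
    # Only overwrite the fields this event actually carries.
    updates = { "phase" => message["current_phase"], "status" => message["status"] }.compact
    state.merge(updates)
  when "mission_handed_off"
    state.merge("phase" => message["phase"])
  else
    state # ignore events this consumer does not track
  end
end

events = [
  { "event" => "provisioning_run_started", "step_count" => 2 },
  { "event" => "provisioning_step_changed", "step_number" => 1, "status" => "executing" },
  { "event" => "provisioning_step_changed", "step_number" => 1, "status" => "completed" },
  { "event" => "provisioning_step_changed", "step_number" => 2, "status" => "executing" }
]
p events.reduce({}) { |state, event| apply_event(state, event) }
```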
For a one-shot snapshot, use the platform_provisioning_status tool action
(or the Concierge's status reply). It returns the current mission phase, the
currently-executing step number, and step-number lists by status (completed,
pending, failed).
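The snapshot can be modeled by grouping steps by status. This sketch is hypothetical: the Step struct, the status_snapshot helper, and the result keys are illustrative rather than the tool's actual response format, and "pending" is assumed to be the status of steps that have not started yet.

```ruby
# Hypothetical model of a platform_provisioning_status snapshot: the
# executing step number plus step-number lists by status.
Step = Struct.new(:number, :status)

def status_snapshot(phase, steps)
  numbers_with = ->(status) { steps.select { |s| s.status == status }.map(&:number).sort }
  {
    phase: phase,
    current_step: steps.find { |s| s.status == "executing" }&.number,
    completed: numbers_with.call("completed"),
    pending: numbers_with.call("pending"),
    failed: numbers_with.call("failed")
  }
end

steps = [Step.new(1, "completed"), Step.new(2, "executing"), Step.new(3, "pending")]
p status_snapshot("execute", steps)
```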
If a step fails and its on_failure is rollback, the runner walks the
completed predecessors in reverse and runs each one's rollback hook, emitting
rolled_back events as it goes. Failures surface in both chat and the live
stream with the error message attached.
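The reverse-order rollback can be sketched as follows, treating the completed predecessors as the steps that completed before the failure, in completion order. The step Hash layout, the :rollback lambdas, and the emit callback are hypothetical stand-ins for the runner's skill executors and its rolled_back broadcasts.

```ruby
# Hypothetical sketch: when a step with on_failure "rollback" fails, undo the
# steps that already completed, most recent first, emitting rolled_back events.
def roll_back_completed(completed_steps, emit:)
  completed_steps.reverse_each do |step|
    step[:rollback].call
    emit.call(step_number: step[:number], status: "rolled_back")
  end
end

undo_log = []
events = []
completed = [
  { number: 1, rollback: -> { undo_log << "delete network" } },
  { number: 2, rollback: -> { undo_log << "terminate instance" } }
]
roll_back_completed(completed, emit: ->(**event) { events << event })
p undo_log                           # ["terminate instance", "delete network"]
p events.map { |e| e[:step_number] } # [2, 1]
```

Undoing in reverse means resources are torn down before the things they were built on (here, the instance before its network).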
If your brief includes a repo_url, the composer attaches a runtime role module
to the provisioned node template and appends a deploy_app_code step that
depends on the provision step. The deploy step receives the provisioned
instance id at runtime (resolved from the upstream provision step's outputs),
so it deploys onto the instance the run just created. Provide branch and
start_command in the conversation to control how your code is checked out and
launched.
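The repo_url behavior can be sketched as a plan transformation. Everything here is illustrative: the step Hash layout, the "instance_id" output key, the example values, and the compose_with_deploy / resolve_instance_id helpers are assumptions. Only the deploy_app_code step name, its dependency on the provision step, the provision_full_stack skill name, and the brief fields repo_url, branch, and start_command come from this guide.

```ruby
# Hypothetical: append a deploy_app_code step that depends on the provision step
# when the brief carries a repo_url. Step layout and output keys are assumed.
def compose_with_deploy(steps, brief)
  return steps if brief["repo_url"].to_s.empty?

  provision = steps.find { |step| step[:skill].start_with?("provision") }
  raise ArgumentError, "no provision step to deploy onto" unless provision

  deploy = {
    id: "deploy_app_code",
    skill: "deploy_app_code",
    depends_on: [provision[:id]],
    params: brief.slice("repo_url", "branch", "start_command")
  }
  steps + [deploy]
end

# At run time, the deploy step reads the instance id from its upstream outputs.
def resolve_instance_id(deploy_step, outputs_by_step)
  upstream = deploy_step[:depends_on].first
  outputs_by_step.fetch(upstream).fetch("instance_id")
end

plan = compose_with_deploy(
  [{ id: 1, skill: "provision_full_stack", depends_on: [] }],
  { "repo_url" => "https://git.example.com/team/app.git", "branch" => "main", "start_command" => "bin/start" }
)
deploy = plan.last
p deploy[:depends_on]                                                # [1]
puts resolve_instance_id(deploy, { 1 => { "instance_id" => "i-0abc" } }) # i-0abc
```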
The whole lifecycle is also available as MCP tool actions on
Ai::Tools::ProvisioningTool, for scripting or for the Concierge tool bridge:
| Action | Purpose |
|---|---|
| platform_provisioning_capture_brief | Natural-language text + optional mission_id → brief + missing_fields (creates the mission when mission_id is omitted) |
| platform_provisioning_compose_plan | mission_id → plan id + DAG, with cost / topology / risk enrichments |
| platform_provisioning_approve_plan | plan_id + decision (approved / rejected / modified) → advances or rolls back the mission |
| platform_provisioning_status | mission_id → phase + step lists by status |
| platform_provisioning_adapt | Adaptation entry point. An M0 stub today that returns { todo: "M2", adaptation_plan: nil }; it will be wired up with the M2 SLO-sensor reconciler |
For live progress, prefer subscribing to MissionChannel over polling
platform_provisioning_status.
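A script driving these actions follows the same order as the chat flow: capture the brief until missing_fields is empty, compose, then approve. This guide does not document the MCP transport, so the sketch takes a call_action callable as a hypothetical stand-in for your MCP client. The "text" parameter name and the result keys it reads (mission_id, missing_fields, plan_id) are assumptions modeled on the table above, and the fake client at the bottom exists only to make the flow runnable.

```ruby
# Drives capture_brief → compose_plan → approve_plan. `call_action` is a
# hypothetical callable (action_name, params) -> Hash wrapping your MCP client.
def provision_via_actions(call_action, request, clarifications)
  brief = call_action.call("platform_provisioning_capture_brief", { "text" => request })
  mission_id = brief.fetch("mission_id")

  until brief.fetch("missing_fields").empty?
    answer = clarifications.shift or raise "brief still missing: #{brief['missing_fields'].join(', ')}"
    brief = call_action.call("platform_provisioning_capture_brief",
                             { "text" => answer, "mission_id" => mission_id })
  end

  plan = call_action.call("platform_provisioning_compose_plan", { "mission_id" => mission_id })
  # Approval is what starts execution; there is no separate execute action.
  call_action.call("platform_provisioning_approve_plan",
                   { "plan_id" => plan.fetch("plan_id"), "decision" => "approved" })
  mission_id
end

# Fake client for illustration only: the first capture reports a missing budget.
calls = []
fake = lambda do |action, params|
  calls << action
  case action
  when "platform_provisioning_capture_brief"
    missing = params.key?("mission_id") ? [] : ["budget_cap_usd_monthly"]
    { "mission_id" => 42, "missing_fields" => missing }
  when "platform_provisioning_compose_plan"
    { "plan_id" => 7 }
  else
    { "ok" => true }
  end
end

provision_via_actions(fake, "Stand up a small web app in us-east-1", ["Budget is $200/month"])
p calls
```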
| Symptom | Likely cause | What to do |
|---|---|---|
| Concierge answers normally instead of starting a mission | Intent classified below the 0.5 confidence threshold | Rephrase more explicitly ("provision…", "stand up…", state region + scale) |
| "Mission template 'system_provisioning' not seeded" | Template seed not run | Run system_provisioning_mission_template.rb |
| Chat asks "which provider?" | Multiple providers configured, brief did not name one | Reply with the provider; composition resumes |
| Compose returns no plan, upgrade prompt shown | Daily LLM cost cap reached | Wait for the cap window to reset or raise the plan limit |
| Mission stuck at review_plan | Awaiting your approval | Use the inline Approve card |
| Mission stuck at handoff after one approval | Second-signature gate (Business+) | Have a second distinct user approve |
| Steps not progressing past the first layer | Worker not draining ai_execution | Check powernode-worker@default status |
Last verified: 2026-06-03