Status: active
How the platform turns an operator's intent into an executable, approval-gated
infrastructure plan — and runs it. This is the architecture reference behind the
operator-facing CONCIERGE_PROVISIONING_GUIDE.md.
The central design choice is two composition paths that converge on one runtime. A deterministic path handles recognized provisioning scenarios; an LLM-general path handles novel intents. Both emit the same plan shape, so a single runner, a single cross-step data-flow substrate, a single rollback mechanism, and a single approval gate apply to either.
All file references are to the parent platform tree (server/…) unless
noted; these composition services are core, and the system extension supplies
the executors, the mission template, and the Concierge surface.
Both composers produce an Ai::GoalPlan whose steps are
Ai::GoalPlanStep records of step_type: "provisioning_skill". Each step's
execution_config is the contract the runner consumes:

    {
      "skill" => "<executor_name>",       # e.g. "provision_full_stack"
      "inputs" => { ... },                # resolved executor inputs
      "depends_on_outputs" => { ... },    # optional cross-step wiring (see §4)
      "on_failure" => "rollback" | "continue"
    }

Plus, on the step record itself:

- step_number: integer, the dependency token.
- dependencies: array of predecessor step_numbers.
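
As an illustration of the contract above, the following sketch checks that a hash has the expected execution_config keys. The helper name valid_execution_config? and both constants are hypothetical and are not taken from the platform; only the key names and the two on_failure values come from the shape above. Treating on_failure as required is an assumption.

```ruby
# Hypothetical helper, not platform code: checks a hash against the
# execution_config contract described above.
REQUIRED_CONFIG_KEYS = %w[skill inputs on_failure].freeze # assumption: all three required
ON_FAILURE_VALUES = %w[rollback continue].freeze

def valid_execution_config?(config)
  return false unless REQUIRED_CONFIG_KEYS.all? { |key| config.key?(key) }
  return false unless config["skill"].is_a?(String) && !config["skill"].empty?
  return false unless config["inputs"].is_a?(Hash)
  return false unless ON_FAILURE_VALUES.include?(config["on_failure"])

  # depends_on_outputs is optional, but must be a hash when present.
  wiring = config["depends_on_outputs"]
  wiring.nil? || wiring.is_a?(Hash)
end

step_config = {
  "skill" => "provision_full_stack",
  "inputs" => { "template_id" => "example-template-uuid" }, # placeholder value
  "on_failure" => "rollback"
}
valid_execution_config?(step_config)
```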
Because both paths emit exactly this, everything downstream — the
SkillCompositionRunner, the plan_review approval gate, rollback, and the
live MissionChannel streaming — is path-agnostic. This reuse-first stance is
explicit in the MissionComposer class documentation: rather than introduce a
parallel plan model, it emits the same shape SkillCompositionRunner already
executes.
Ai::Provisioning::PlanComposerService
(server/app/services/ai/provisioning/plan_composer_service.rb)
is the path for known provisioning scenarios. It is what runs today for
every infrastructure mission.
- Reads the Project Brief from mission.configuration["brief"] (raises BriefMissingError if absent; capture must run first).
- Guards on the daily LLM cost cap (CostCapGuard); returns nil and sets cap_exceeded_payload when exhausted.
- Resolves the BYOC provider choice. With multiple configured providers and no unambiguous preferred_provider, it short-circuits and returns a { clarification_needed: true, … } payload instead of composing.
- Finds or creates a backing Ai::AgentGoal, then defers DAG synthesis to the Ai::Autonomy::GoalDecompositionService LLM kernel.
- Rewrites every emitted step into the provisioning shape via rewrite_step!, dropping advisory step types (human_review, observation, sub_goal) and keeping only executable ones.
- Resolves human-readable brief values into concrete record UUIDs (template_id, provider_region_id, provider_instance_type_id) so the persisted plan carries actionable inputs.
- Stamps the plan id onto mission.configuration["plan"]["plan_id"].
The defining property of this path is that every step's skill must be in a
fixed allow-list — PlanComposerService::ALLOWED_EXECUTORS. The composer maps
an action description to a skill via, in order:
- Ai::Tools::SemanticToolDiscoveryService (semantic match, filtered to the allow-list), then
- a static regex STATIC_ACTION_MAP (first match wins), then
- DEFAULT_EXECUTOR (provision_full_stack).
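
The three-stage lookup above can be sketched as follows. This is a minimal illustration, not the service's code: the contents of the allow-list and regex map are invented placeholders, and semantic_match is a hypothetical callable standing in for Ai::Tools::SemanticToolDiscoveryService.

```ruby
# Placeholder contents: the real ALLOWED_EXECUTORS and STATIC_ACTION_MAP live
# in PlanComposerService and are not reproduced here.
ALLOWED_EXECUTORS = %w[provision_full_stack deploy_app_code].freeze
STATIC_ACTION_MAP = {
  /deploy.*(app|code)/i => "deploy_app_code",
  /provision|create.*instance/i => "provision_full_stack"
}.freeze
DEFAULT_EXECUTOR = "provision_full_stack"

# semantic_match: hypothetical callable returning a skill name or nil.
def resolve_skill(action, semantic_match:)
  # Stage 1: semantic match, accepted only if it is on the allow-list.
  candidate = semantic_match.call(action)
  return candidate if ALLOWED_EXECUTORS.include?(candidate)

  # Stage 2: static regex map. Hash iteration follows insertion order,
  # so the first matching pattern wins.
  STATIC_ACTION_MAP.each do |pattern, skill|
    return skill if action.match?(pattern)
  end

  # Stage 3: fall back to the default executor.
  DEFAULT_EXECUTOR
end

no_match = ->(_action) { nil }
resolve_skill("Deploy the app code to the new node", semantic_match: no_match)
```

Because the semantic result is filtered before it is returned, a match outside the allow-list falls through to the regex map rather than leaking into the plan.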
validate_plan re-checks that every step's skill is in ALLOWED_EXECUTORS,
that the graph is acyclic, and that every dependency references a real step.
This is what makes the path deterministic and bounded: the brief can only
produce provisioning skills the platform recognizes.
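
The three checks that validate_plan is described as performing can be sketched like this. The method name plan_errors, the step hash layout, and the allow-list contents are assumptions made for illustration; the real method operates on persisted step records.

```ruby
require "set"

ALLOWED_SKILLS = %w[provision_full_stack deploy_app_code].to_set.freeze # placeholder contents

# steps: array of hashes with :step_number, :skill, :dependencies.
# Returns a list of problems; an empty list means the plan passed.
def plan_errors(steps)
  errors = []
  numbers = steps.map { |s| s[:step_number] }.to_set

  steps.each do |s|
    errors << "step #{s[:step_number]}: skill not allowed" unless ALLOWED_SKILLS.include?(s[:skill])
    missing = s[:dependencies].reject { |d| numbers.include?(d) }
    errors << "step #{s[:step_number]}: unknown dependency #{missing.join(', ')}" unless missing.empty?
  end

  # Cycle check: repeatedly peel off steps with no unresolved dependencies.
  # Anything left over once nothing more can be peeled sits on a cycle.
  remaining = steps.to_h { |s| [s[:step_number], s[:dependencies] & numbers.to_a] }
  loop do
    ready = remaining.select { |_, deps| deps.none? { |d| remaining.key?(d) } }.keys
    break if ready.empty?

    ready.each { |n| remaining.delete(n) }
  end
  errors << "dependency cycle among steps #{remaining.keys.sort.join(', ')}" unless remaining.empty?
  errors
end
```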
The LLM kernel sometimes emits redundant trees for a trivial brief. The composer
folds them: collapse_consecutive_same_target_steps! merges linear-chain
duplicates, and collapse_redundant_provisioning_clusters! collapses
identical-fingerprint provision_full_stack steps regardless of DAG shape,
capping the count to brief.scale.initial. The result is that a one-instance
brief produces a one-instance plan even if the kernel hallucinated eight
redundant steps.
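
A rough sketch of the fingerprint-based collapse follows. It assumes a fingerprint is the skill name plus its sorted inputs and that the cap applies per fingerprint; neither detail is confirmed by the text above, and dependency rewiring after removal is omitted. The method name collapse_redundant is hypothetical.

```ruby
# Sketch only. Assumptions: fingerprint = skill + sorted inputs, and at most
# scale_initial steps are kept per fingerprint. Dependency rewiring is omitted.
def collapse_redundant(steps, scale_initial:)
  seen = Hash.new(0)
  steps.select do |step|
    # Only provision_full_stack steps are candidates for collapsing.
    next true unless step[:skill] == "provision_full_stack"

    fingerprint = [step[:skill], step[:inputs].sort]
    seen[fingerprint] += 1
    seen[fingerprint] <= scale_initial
  end
end

# Eight identical steps, as an over-decomposing kernel might emit.
duplicates = Array.new(8) do
  { skill: "provision_full_stack", inputs: { "region" => "example-region" } }
end
collapse_redundant(duplicates, scale_initial: 1).size
```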
When the brief carries repo_url, the composer attaches a runtime role module
to the node template (ROLE_MODULE_FOR_USE_CASE / RUNTIME_HINT_TO_MODULE)
and appends a deploy_app_code step. That step's node_instance_id is unknown
at compose time, so it is wired via depends_on_outputs to the upstream
provision_full_stack step's outputs.node_instance_ids (see §4).
Ai::Missions::MissionComposer
(server/app/services/ai/missions/mission_composer.rb)
is the novel-intent half. Where Path A is constrained to the provisioning
allow-list, Path B can sequence any agent-bound skill — federation, SDWAN,
ingress, runtime, and so on. It is the general composer the router selects for
novel intents, converging on the same plan shape as Path A.
Path B does not have a static allow-list. Its constraint is the candidate
pool: only skills that are status: "active" and bound to at least one
agent (Ai::AgentSkill with is_active: true) and resolve to an executor
descriptor are composable, capped at MAX_CANDIDATES (20). Each candidate is
resolved to its executor's I/O contract (descriptor inputs / outputs) so the
LLM can wire data flow and so the runner-dispatched identifier (the executor
name, not the Ai::Skill slug) can be validated. The LLM cannot invent a
skill outside this pool.
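
The candidate-pool filter described above can be approximated with plain structs. CandidateSkill and its fields are hypothetical stand-ins for Ai::Skill, Ai::AgentSkill, and the executor descriptor; only the three eligibility conditions and the cap of 20 come from the text.

```ruby
# Hypothetical stand-in for a skill record joined with its bindings/descriptor.
CandidateSkill = Struct.new(:slug, :status, :agent_bound, :executor, keyword_init: true)
MAX_CANDIDATES = 20

def candidate_pool(skills)
  skills
    .select { |s| s.status == "active" && s.agent_bound && s.executor }
    .first(MAX_CANDIDATES)
    .map do |s|
      # Expose the executor name (what the runner dispatches), not the slug.
      { skill: s.executor[:name], inputs: s.executor[:inputs], outputs: s.executor[:outputs] }
    end
end

skills = [
  CandidateSkill.new(slug: "ingress-setup", status: "active", agent_bound: true,
                     executor: { name: "configure_ingress", inputs: %w[domain], outputs: %w[ingress_id] }),
  CandidateSkill.new(slug: "draft-skill", status: "draft", agent_bound: true,
                     executor: { name: "draft_exec", inputs: [], outputs: [] }),
  CandidateSkill.new(slug: "unbound", status: "active", agent_bound: false, executor: nil)
]
candidate_pool(skills).map { |c| c[:skill] }
```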
- Cost cap: CostCapGuard gates the LLM call; returns nil on exhaust.
- Decompose: the LLM is prompted with the candidate catalog (skill ids + input/output keys) and asked for an ordered DAG as strict JSON.
- Validate / normalize: steps referencing a non-candidate skill are dropped (a near-miss should not fail the whole plan); steps are renumbered contiguous from 1; depends_on_outputs.from_step is remapped; the graph is cycle-checked (a cyclic plan is rejected outright); at most MAX_STEPS (15) steps survive.
- Persist: creates a draft Ai::GoalPlan and the same provisioning_skill steps with { skill, inputs, depends_on_outputs, on_failure: "rollback" }, then links the plan id to the mission via mission.configuration["plan"]["plan_id"], exactly the pointer Path A sets.
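
The validate / normalize stage can be sketched as below. The method name normalize and the step hash layout are hypothetical. Two behaviors here are assumptions rather than documented facts: wiring that points at a dropped step is discarded, and the MAX_STEPS cut is applied before renumbering. The cycle check is left out for brevity.

```ruby
MAX_STEPS = 15

# raw_steps: [{ number:, skill:, depends_on_outputs: { key => { "from_step" => n } } }]
def normalize(raw_steps, candidate_names)
  # Drop steps whose skill is not in the candidate pool, then cap the count.
  kept = raw_steps.select { |s| candidate_names.include?(s[:skill]) }.first(MAX_STEPS)

  # Map old step numbers to contiguous numbers starting at 1.
  renumber = kept.each_with_index.to_h { |s, i| [s[:number], i + 1] }

  kept.map do |s|
    wiring = (s[:depends_on_outputs] || {}).filter_map do |key, spec|
      new_from = renumber[spec["from_step"]]
      # Assumption: wiring to a dropped step is discarded.
      [key, spec.merge("from_step" => new_from)] if new_from
    end.to_h
    { number: renumber[s[:number]], skill: s[:skill], depends_on_outputs: wiring }
  end
end

raw = [
  { number: 2, skill: "configure_ingress", depends_on_outputs: {} },
  { number: 5, skill: "made_up_skill" },
  { number: 7, skill: "deploy_app_code",
    depends_on_outputs: { "ingress_id" => { "from_step" => 2, "select" => "first" } } }
]
normalize(raw, %w[configure_ingress deploy_app_code])
```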
Because the persisted shape and the mission pointer are identical, the existing
execute path (AiProvisioningExecuteJob → SkillCompositionRunner) runs a
Path-B plan unchanged once the operator clears the plan_review gate.
A step rarely knows, at compose time, a value another step will produce at
runtime (e.g. the instance id a provision_full_stack step creates). Both
composers express this with execution_config["depends_on_outputs"], resolved
by Ai::Provisioning::SkillCompositionRunner
(server/app/services/ai/provisioning/skill_composition_runner.rb).
Shape:

    "depends_on_outputs" => {
      "<input_key>" => {
        "from_step" => <predecessor step_number>,
        "path" => "<dot.path into that step's recorded outputs>",   # default: input_key
        "select" => "first" | "last" | "all" | <Integer index>      # default: "all"
      }
    }

Each completed step's outputs are persisted to its metadata["last_outputs"]
(record_outputs). Before invoking a step, merge_depends_on_outputs:
- builds { predecessor_step_number => recorded_outputs } for the step's declared dependencies (upstream_outputs_for, reading each predecessor's metadata["last_outputs"]),
- for each mapping entry, digs the dot-path into the source outputs (dig_path, tolerant of string/symbol keys across the JSON round-trip),
- applies the array selector (select_output: first / last / index / all),
- overwrites the compose-time placeholder for that input key. A missing/blank upstream value is skipped, never clobbering an existing input with nil.
The canonical example: deploy_app_code pulls node_instance_id from the
upstream provision_full_stack step's outputs.node_instance_ids with
select: "first". This is the same substrate the MissionComposer prompt
instructs the LLM to use, so Path A and Path B wire data flow identically.
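
To make the resolution steps concrete, here is a simplified, standalone sketch of the merge. It reuses the method names from the text (dig_path, select_output, merge_depends_on_outputs) but the bodies are illustrative reconstructions, not the runner's code. The exact path string and the layout of the recorded outputs in the usage example are assumptions.

```ruby
# Walks a dot-path, accepting string or symbol keys at each level.
def dig_path(outputs, path)
  path.split(".").reduce(outputs) do |node, key|
    return nil unless node.is_a?(Hash)

    node.key?(key) ? node[key] : node[key.to_sym]
  end
end

# Applies the array selector; non-array values pass through unchanged.
def select_output(value, selector)
  return value unless value.is_a?(Array)

  case selector
  when "first" then value.first
  when "last" then value.last
  when Integer then value[selector]
  else value # "all" or unset: keep the whole array
  end
end

# upstream: { predecessor_step_number => recorded_outputs }
def merge_depends_on_outputs(inputs, mapping, upstream)
  mapping.each_with_object(inputs.dup) do |(input_key, spec), merged|
    source = upstream[spec["from_step"]]
    next if source.nil?

    value = select_output(dig_path(source, spec["path"] || input_key), spec["select"])
    # Skip missing/blank values so an existing input is never clobbered.
    next if value.nil? || (value.respond_to?(:empty?) && value.empty?)

    merged[input_key] = value
  end
end

inputs = { "node_instance_id" => "<resolved at runtime>", "repo_url" => "https://example.com/app.git" }
mapping = { "node_instance_id" => { "from_step" => 1, "path" => "node_instance_ids", "select" => "first" } }
upstream = { 1 => { node_instance_ids: %w[instance-a instance-b] } } # symbol keys, as before a JSON round-trip
merge_depends_on_outputs(inputs, mapping, upstream)
```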
The platform's intended composition strategy is template-match first, LLM fallback:
- Template / recognized-scenario match: when an intent maps to a known provisioning scenario, the deterministic Path A (Ai::MissionTemplate + PlanComposerService) composes it. This is bounded by ALLOWED_EXECUTORS and benefits from the over-decomposition guards and brief→UUID resolution.
- LLM fallback for novel intents: when the intent does not fit a recognized provisioning scenario, the general Path B (MissionComposer) sequences any agent-bound skill into the same plan shape.
The decision is centralized in Ai::Missions::ComposerRouter
(server/app/services/ai/missions/composer_router.rb),
which exposes a side-effect-free deterministic_provisioning?(brief) predicate
and a select(brief:) that returns the chosen — but not-yet-invoked — composer.
The predicate is grounded entirely in PlanComposerService's existing signal
maps (ROLE_MODULE_FOR_USE_CASE, RUNTIME_HINT_TO_MODULE) plus provisioning-shaped
brief fields (a non-empty regions list, a preferred_provider, a mappable
runtime_hint, or scale.initial > 0) — there is no hardcoded use_case enum. It
runs before composing (never try-then-discard), because PlanComposerService
persists on success and a discarded probe would leak a real plan.
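
A side-effect-free predicate of this kind might look like the sketch below. The map contents are invented placeholders, the brief is modeled as a plain string-keyed hash, and combining the signals with a simple "any of" rule is an assumption; the text lists the signals but not how they are weighed.

```ruby
# Placeholder contents: the real maps live in PlanComposerService.
ROLE_MODULE_FOR_USE_CASE = { "web_app" => "role_web" }.freeze
RUNTIME_HINT_TO_MODULE = { "node" => "runtime_node" }.freeze

# Pure function of the brief: no persistence, no LLM call.
# Assumption: any single provisioning-shaped signal is sufficient.
def deterministic_provisioning?(brief)
  return true if ROLE_MODULE_FOR_USE_CASE.key?(brief["use_case"])
  return true if RUNTIME_HINT_TO_MODULE.key?(brief["runtime_hint"])
  return true unless Array(brief["regions"]).empty?
  return true unless brief["preferred_provider"].to_s.empty?

  brief.dig("scale", "initial").to_i > 0
end

deterministic_provisioning?({ "regions" => ["example-region"] })
```

Keeping the predicate pure is what allows it to run before composition: nothing is written, so there is no probe plan to clean up when the router picks Path B.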
All three compose entry points route through ComposerRouter, so the decision is
identical everywhere, and all three read/write the same
mission.configuration["plan"]["plan_id"]:
- Server-driven (worker phase job): AiProvisioningComposePlanJob calls POST /api/v1/internal/ai/provisioning/missions/:mission_id/compose_plan (Api::V1::Internal::Ai::ProvisioningController#compose_plan), which selects its composer via ComposerRouter. This is the path the orchestrator drives automatically when the mission enters the compose_plan phase.
- Concierge chat: Ai::Tools::ProvisioningTool#compose_plan (the MCP action the Concierge dispatches through ConciergeToolBridge) selects its composer via ComposerRouter.
- Interactive deep-link (REST): POST /api/v1/ai/missions/:id/compose_plan (Api::V1::Ai::MissionsController#compose_plan) branches on mission_type == "infrastructure", reuses the cached plan when one already exists (no extra LLM cost), and otherwise composes a new one via ComposerRouter.
Because every path emits the identical provisioning_skill plan shape and the
same mission pointer, the execute/approve/runner spine carries a Path-A or Path-B
plan unchanged. The routing decision selects a composer, not a runtime.
After either composer persists a plan and stamps the mission's plan_id, the
lifecycle is identical.
Ai::Provisioning::SkillCompositionRunner orchestrates the plan as a DAG of
skill invocations, server-side, in parallel-safe layers:
- execute! computes Kahn-style topological layers from step.dependencies, dispatches the first ready layer as per-step worker jobs (WorkerJobService.enqueue_job("AiProvisioningStepJob", …)), and records run-start side effects. It is idempotent: if any step is already past pending, it returns the existing run state instead of re-dispatching (this closed a double-provision race).
- execute_step!(step) is the per-step entrypoint called back via the internal API. It resolves the executor by convention (provision_full_stack → System::Ai::Skills::ProvisionFullStackExecutor, falling back to Ai::Skills::…), merges depends_on_outputs, invokes the executor, records outputs, marks progress, and dispatches newly-unblocked successors. Also idempotent per step.
- On step failure with on_failure: "rollback", it walks completed predecessors in reverse and runs each one's descriptor[:rollback] hook.
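
Two of the mechanics above lend themselves to a short sketch: layering a DAG with Kahn's algorithm and deriving an executor class name from a skill name. Both functions are illustrative and are not taken from the runner; the plan is modeled as a plain hash of step number to dependency list, and every dependency is assumed to reference a step in the plan.

```ruby
# steps: { step_number => [dependency step_numbers] }
# Returns layers; every step in a layer depends only on earlier layers,
# so the steps within one layer can be dispatched in parallel.
def topological_layers(steps)
  remaining = steps.transform_values(&:dup)
  layers = []
  until remaining.empty?
    ready = remaining.select { |_, deps| deps.empty? }.keys.sort
    raise ArgumentError, "dependency cycle detected" if ready.empty?

    layers << ready
    ready.each { |n| remaining.delete(n) }
    # Mark the just-scheduled steps as satisfied for everything still waiting.
    remaining.each_value { |deps| deps.reject! { |d| ready.include?(d) } }
  end
  layers
end

# Convention-based name: provision_full_stack -> ProvisionFullStackExecutor.
def executor_class_name(skill)
  skill.split("_").map(&:capitalize).join + "Executor"
end

plan = { 1 => [], 2 => [1], 3 => [1], 4 => [2, 3] }
topological_layers(plan)
```

In this example, steps 2 and 3 land in the same layer because both depend only on step 1, and step 4 waits for both.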
Every transition emits two side effects: a system message into the mission
conversation, and a provisioning_step_changed broadcast through
Ai::Missions::OrchestratorService#broadcast_step_event! — the single canonical
emission path for step events.
The runner spine is sound, but three sharp edges are worth knowing when debugging a stuck or surprising run:
- Undefined-executor fallback swallows the cause. In execute_step! (skill_composition_runner.rb, ~L128–143), an unresolved executor raises "skill not found: <name>", which is caught by the surrounding rescue StandardError and routed through handle_failure like any other step error. The step is marked failed and the message is logged, but there is no structured error code distinguishing "the skill name was wrong / not an executor" from "the executor ran and failed"; operators see a generic failure string. When triaging, read the logged [SkillCompositionRunner] step … raised: line for the real class.
- Idempotency is step-status–scoped, not run-scoped. execute! generates a runner_id and short-circuits (already_running: true) if any step is past pending, which closes the common double-dispatch race. But because the guard keys off step status rather than a persisted owning runner_id, two simultaneous execute! calls that both observe an all-pending plan can still race the first status write. The single-trigger approval path (§6, "Why approval is the single execution trigger") is what keeps this from happening in practice: there is deliberately one way to reach a run.
- Cost-cap zero-handling is a footgun. CostCapGuard.resolve_cap treats a plan_cap.zero? as "unset" and falls back to DEFAULT_DAILY_CAP_USD, so a plan configured with a deliberate $0 cap does not block composition; it silently inherits the default. Treat $0 as "no explicit cap," not "disable LLM spend."
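
The zero-cap behavior is easy to misread, so here is a minimal reproduction of the described fallback. The default value below is a placeholder, since the real DEFAULT_DAILY_CAP_USD is not stated here, and treating nil the same as zero is an assumption.

```ruby
DEFAULT_DAILY_CAP_USD = 5.0 # placeholder value; the real default is not stated here

# Mirrors the described behavior: a zero cap is treated as "unset".
# Assumption: nil is handled the same way.
def resolve_cap(plan_cap)
  return DEFAULT_DAILY_CAP_USD if plan_cap.nil? || plan_cap.zero?

  plan_cap
end

resolve_cap(0)    # falls back to the default, it does not disable spend
resolve_cap(12.5) # an explicit non-zero cap is honored
```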
Execution is never automatic — it is gated. The system_provisioning template
defines two approval gates: review_plan (gate name plan_review) and
handoff. The mission pauses at each (Ai::Mission#awaiting_approval?), and an
inline Approve/Reject card is posted to chat.
Approving routes through
Ai::Missions::OrchestratorService#handle_approval!
(server/app/services/ai/missions/orchestrator_service.rb),
which:
- records an Ai::MissionApproval (user, gate, decision),
- honors the second-signature gate at handoff (Business+ plans require two distinct approvers),
- on approve, calls advance!, transitioning the mission to the next phase and dispatching that phase's worker job: approving review_plan advances to execute, which dispatches AiProvisioningExecuteJob → the runner,
- on reject, rolls the mission back per the template's rejection_mappings (review_plan → compose_plan, handoff → verify).
The phase-name → gate-name mapping is centralized in
Ai::MissionApproval.gate_for_phase (review_plan → plan_review), so every
layer agrees on which gate is active.
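
The gate and rejection mappings above amount to two small lookup tables plus a distinct-approver count. The sketch below models them with plain hashes. The constant and method names are hypothetical, and the gate name for the handoff phase is assumed to be "handoff"; the text only confirms review_plan → plan_review.

```ruby
# Mappings taken from the text; the "handoff" gate name is an assumption.
GATE_FOR_PHASE = { "review_plan" => "plan_review", "handoff" => "handoff" }.freeze
REJECTION_MAPPINGS = { "review_plan" => "compose_plan", "handoff" => "verify" }.freeze

# Returns the phase a rejection sends the mission back to.
def phase_after_rejection(phase)
  REJECTION_MAPPINGS.fetch(phase) { raise ArgumentError, "#{phase} is not a gated phase" }
end

# Second-signature rule: when required, two distinct approvers must have approved.
def signatures_satisfied?(approver_ids, second_signature_required:)
  approver_ids.uniq.size >= (second_signature_required ? 2 : 1)
end

phase_after_rejection("review_plan")
signatures_satisfied?(%w[user-1 user-1], second_signature_required: true)
```

Counting unique approver ids means the same user approving twice does not satisfy the second-signature requirement.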
There is deliberately no platform_provisioning_execute tool action — the
internal #execute endpoint reached by AiProvisioningExecuteJob is the only
path to a run. Approval at review_plan is what advances the mission into
execute; the orchestrator drives it from there. A separate execute tool
variant raced this path and double-provisioned, which is why it was removed.
Operator NL ──▶ ConciergeToolBridge.classify_and_dispatch_provisioning
│ (intent == provision_infrastructure, confidence ≥ 0.5)
▼
ProvisioningTool: capture_brief
│ IntentCaptureService → mission.configuration["brief"]
▼
┌── HYBRID ROUTING ──────────────────────────────────────┐
│ recognized provisioning scenario → PlanComposerService │ (Path A: ALLOWED_EXECUTORS)
│ novel intent → MissionComposer │ (Path B: any agent-bound skill)
└────────────────────────────────────────────────────────┘
│ both: Ai::GoalPlan of provisioning_skill steps
│ + mission.configuration["plan"]["plan_id"]
▼
review_plan gate ──▶ inline Approve/Reject card
│ OrchestratorService#handle_approval! (records MissionApproval, advance!)
▼
AiProvisioningExecuteJob → SkillCompositionRunner.execute!
│ topological layers → AiProvisioningStepJob per step
│ execute_step! → executor → record outputs → dispatch successors
│ depends_on_outputs resolved from predecessor metadata.last_outputs
│ broadcasts via OrchestratorService#broadcast_step_event!
▼
verify → handoff gate → RalphLoop → adapting (sensor-driven)
Adaptation is not yet wired end to end. The adapting phase exists and the per-mission RalphLoop + ProjectSloSensor reconciler run, but the compose→adapt link that would turn an observed SLO breach into a new adaptation plan is still a stub: the platform_provisioning_adapt MCP action returns { todo: "M2", adaptation_plan: nil } (server/app/services/ai/tools/provisioning_tool.rb#adapt). Adaptation-proposal generation lands with the M2 sensor reconciler; until then, treat adapting as a monitoring phase, not a self-replanning one.
| Concern | File |
|---|---|
| Deterministic composer (Path A) | server/app/services/ai/provisioning/plan_composer_service.rb |
| LLM-general composer (Path B) | server/app/services/ai/missions/mission_composer.rb |
| DAG runner + cross-step data flow | server/app/services/ai/provisioning/skill_composition_runner.rb |
| Worker phase callbacks (internal API) | server/app/controllers/api/v1/internal/ai/provisioning_controller.rb |
| Interactive compose entry | server/app/controllers/api/v1/ai/missions_controller.rb (#compose_plan) |
| Concierge MCP surface | server/app/services/ai/tools/provisioning_tool.rb |
| Approval engine + phase advance + step broadcast | server/app/services/ai/missions/orchestrator_service.rb |
| Phase → gate mapping | server/app/models/ai/mission_approval.rb |
| Mission model + inline gate card metadata | server/app/models/ai/mission.rb |
| Mission template (phases, gates, rejection mappings) | extensions/system/server/db/seeds/system_provisioning_mission_template.rb |
| Compose worker job | worker/app/jobs/ai_provisioning_compose_plan_job.rb |
| Execute worker job | worker/app/jobs/ai_provisioning_execute_job.rb |
| Per-step worker job | worker/app/jobs/ai_provisioning_step_job.rb |
Last verified: 2026-06-03