Add reproducibility runbooks for five multi-reasoner templates#57

Merged
cafzal merged 22 commits into telco-network-recovery-template from
worktree-runbooks-and-base-ontologies
May 6, 2026
Conversation


@cafzal cafzal commented May 5, 2026

Why

These templates already work end-to-end as scripts, but a user starting from scratch with an RAI agent has to reverse-engineer the chain — which skill to load, which question to ask, what to expect back — to recreate the workflow on their own data. This PR adds a one-screen runbook per template that turns each into a ready-to-paste agent recipe: skill + prompt + expected response per step, in chain order, starting from raw demo data. Each template script also now materializes its post-solve aggregates as queryable ontology Concepts so the closing /rai-ontology-design step is a real deliverable, not aspirational.

Templates covered

| Template | Workflow |
| --- | --- |
| energy_grid_planning | build ontology → examine → discover → forecast load → find structural bottlenecks → screen DC requests → approve DCs and fund upgrades → read frontier → persist |
| supply_chain_resilience | build ontology → examine → discover → map upstream supplier exposure → rank network hubs → classify supplier reliability → solve risk-adjusted flow → quantify disruption scenarios → persist |
| machine_maintenance | build ontology → examine → discover → diagnose plant operations → find scheduling bottlenecks → classify machine risk → schedule maintenance → stress-test concentration → persist |
| portfolio_balancing | build ontology → examine → discover → compliance scan → cluster correlated bets → solve mean-variance frontier → read frontier → stress under crisis → persist |
| telco_network_recovery (lands with #56) | build ontology → examine → discover → diagnose WEST → flag critical-restore towers → score subscriber blast radius → forecast regional demand → optimize tier selection → interpret plan → persist |

Per-step format

Each step is a `### N. <topic>` header followed by two bullets:

  • Prompt: an inline skill invocation the user can copy and paste — /rai-skill <natural-language question>
  • Response: what the agent produces (key numbers, properties written back to the ontology)

The chain bookends are: /rai-build-starter-ontology (Step 1) builds against the bundled CSVs in data/; /rai-querying (Step 2) examines the result; /rai-discovery (Step 3) scopes sub-questions to reasoner families; the chain stages do the work; /rai-ontology-design (final step) materializes the prescriptive aggregates the chain doesn't already write back as queryable Concepts.
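For concreteness, a sketch of one such step (the prompt wording here is invented for illustration; the plan figures are the telco template's shipped end-to-end output):

```markdown
### 8. Optimize tier selection

- Prompt: `/rai-prescriptive-problem-formulation Which upgrade tier should each critical WEST tower get, and what does the full restore plan cost?`
- Response: 15-tower plan: 12 GOLD + 2 SILVER + 1 BRONZE, $4,956,843 total cost, 122 Gbps restored; budget is the binding constraint.
```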

Script edits

Each template's main script now persists its closing-step Concepts so the chain output lives in the ontology, not in pandas:

  • telco: RestorePlan singleton (total_cost, install_weeks, capacity, tier mix, towers_covered, binding_constraint) + SelectedUpgrade view restricted to the 15 chosen tower-tier rows
  • energy: InvestmentPortfolio(InvestmentLevel) with per-budget aggregates (dc_count, total_mw, annual_revenue, upgrade_cost, net_value, marginal_per_m_to_next_level, is_knee_point)
  • supply_chain: RoutingScenario with per-scenario totals (status, total_cost, cost_delta_pct, active_flow_count, unmet_total, blocked_businesses); also adds a Business.is_avoid Relationship in place of the prior Python set intersection
  • machine_maintenance: MaintenancePlan (cost breakdown + total_jobs), TypeConcentration(machine_type) (per-type qualification spread + travel %), CrossTrainingRecommendation (ranked candidates with is_best_candidate)
  • portfolio: FrontierPoint(scenario_label, eps_label) with per-Pareto-point metadata (return, risk, marginal_risk_per_return, is_knee, vol_base, vol_crisis, vol_gap, vol_gap_pct)
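The marginal_per_m_to_next_level and is_knee_point aggregates can be sketched in plain Python. This is a hypothetical illustration of the aggregate shape only (the budgets, net values, and the 0.25 knee threshold are made up, and the real script binds these onto the ontology rather than computing them client-side):

```python
# Hypothetical sketch of the per-budget aggregates behind
# InvestmentPortfolio(InvestmentLevel); numbers are illustrative,
# not the template's actual outputs.
levels = [  # (budget_m, net_value_m) per InvestmentLevel, ascending budget
    (100, 40.0), (200, 95.0), (300, 130.0), (400, 150.0), (500, 158.0),
]

rows = []
for i, (budget_m, net_value_m) in enumerate(levels):
    if i + 1 < len(levels):
        next_budget, next_value = levels[i + 1]
        # Extra net value gained per extra $1M of budget at the next level
        marginal = (next_value - net_value_m) / (next_budget - budget_m)
    else:
        marginal = None  # top level: no higher budget to compare against
    rows.append({"budget_m": budget_m, "net_value_m": net_value_m,
                 "marginal_per_m_to_next_level": marginal})

# One plausible knee heuristic (an assumption, not the template's rule):
# the last level whose marginal value per extra $1M still clears a threshold.
THRESHOLD = 0.25
knee_budget = max(r["budget_m"] for r in rows
                  if (m := r["marginal_per_m_to_next_level"]) is not None
                  and m >= THRESHOLD)
for row in rows:
    row["is_knee_point"] = row["budget_m"] == knee_budget
```

With these toy numbers the marginals decay 0.55 → 0.35 → 0.20 → 0.08, so the knee lands at the $200M level.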

All five re-tested end-to-end against the actual RAI product; numbers match the prior stdout output.

Test plan

  • All 5 scripts run end-to-end against bundled data/ CSVs (telco needs a predictive-enabled venv)
  • New ontology Concepts queryable via model.select(...) after the chain runs
  • Per-step prompts use the user's business question, not algorithm names or solver mechanics
  • Headline numbers in each runbook Response match the script's actual output
  • Reviewer paste-test: load the named skills in a fresh agent session, paste each Prompt against data/ in order, confirm output matches Response

Each runbook is an agent prompt sequence to recreate the template's
multi-reasoner pipeline using the bundled CSVs in ../data/, mapping
each stage to the template's actual concepts, properties, and outputs:

- telco_network_recovery (5-stage: descriptive -> rules -> graph ->
  predictive -> prescriptive; mirrors PR #56's existing structure)
- energy_grid_planning (4-stage: predictive -> graph -> rules ->
  prescriptive with InvestmentLevel scenarios)
- supply_chain_resilience (4-stage: blast-radius -> graph -> rules ->
  min-cost flow + scenarios)
- machine_maintenance (5-stage: querying -> graph -> rules ->
  prescriptive maintenance schedule -> resilience cross-training)

Reproducible against the bundled template CSVs; one-line notes on
swapping to a Snowflake schema for users wiring to their own data.
Apply the dual-audience plan in dev_temp/pr57_runbook_hybrid_plan.md
to all five runbooks. Each runbook now serves both the stakeholder
(narrative + ASCII visualizations) and the practitioner (explicit
skill + prompt to recreate the stage):

- "How to read this runbook" preface explaining the dual purpose
- Step 0 discovery section using rai-discovery to scope sub-questions
  to reasoner families before any chain stage runs
- Skill / Prompt boxed callout immediately under every Stage heading
  (rai-querying, rai-rules-authoring, rai-graph-analysis, rai-prescriptive-*)
- "Adapting this recipe to a new domain" closing section

Adds portfolio_balancing/references/runbook.md (4 stages: rules ->
graph clustering -> bi-objective Markowitz frontier -> crisis-regime
stress test, all aligned to the template's actual 8-stock dataset
and epsilon-rate frontier sweep).

Numbers cross-referenced against each template's README and main
script; reflects the templates as shipped (not the larger demos
they were sourced from). Prompts use domain-natural language with
no Concept.property syntax inside the prompt strings.
@cafzal cafzal changed the title from "Add reproducibility runbooks for four multi-reasoner templates" to "Add hybrid runbooks (narrative + agent recipe) for five multi-reasoner templates" on May 6, 2026

github-actions Bot commented May 6, 2026

The docs preview for this pull request has been deployed to Vercel!

✅ Preview: https://relationalai-docs-igiwizt5c-relationalai.vercel.app/build/templates
🔍 Inspect: https://vercel.com/relationalai/relationalai-docs/xvHLthb9T2REYHm5Vv9MFkgmTzZ1

Replace the prior verbose, mechanical prompts with short natural
questions a user would actually type, modeled on the reasoner-eval
QA catalog. Where an eval QA exists for the same skill+pattern, the
runbook prompt mirrors that question directly.

Also fix the predictive skill names: rai-predictive-modeling +
rai-predictive-training (the public skills are available; the
"no public skill yet" placeholder was stale).

Each prompt is now:
- 1-2 sentences asking what the user wants to know
- Domain-natural language
- Aligned to a real eval-style question

Affects all five runbooks: telco_network_recovery,
energy_grid_planning, supply_chain_resilience,
machine_maintenance, portfolio_balancing.
Replace the two-line `**Skill:** ... · **Prompt:** "..."` callout with
a single-line `> /rai-skill "question"` form that mirrors how a user
actually invokes a skill in chat. Multi-skill stages stay readable
as `> /rai-A + /rai-B "question"`.

One change per stage callout, all five runbooks. No content rewrite.
Each runbook is now ~50 lines: 1-paragraph intro, the TL;DR chain
ASCII, a workflow table (skill + prompt + expected output per step),
and a brief data footer. All per-stage narrative subsections,
"how to read" / "adapting" / "why the chain matters" sections, and
duplicate enrichment diagrams are gone.

Total: 5 files, ~2000 lines removed, ~260 retained.
Bullet format gives each prompt its own line so users can triple-click
to select and copy. Expected output renders as a paragraph under each
bullet. No content changes, just structural.
Each step is now `### N. <topic>` followed by two bullets:
- Prompt: <skill> <question> (in code formatting, no quotes — easy
  to triple-click and copy as a single agent invocation)
- Response: <expected output>

Same content, clearer structure for skim + copy.
- telco: reorder to summit-demo workflow (descriptive -> rules ->
  graph -> predictive -> prescriptive -> interpret); fix WEST
  multiplier 0.993x -> 0.9998x; fix other-region growth range to
  +0.45-0.91%/day; clarify projected_demand_growth is written to
  all 250 towers via region join, not just 15
- energy: drop fabricated 36-month forecast horizon -> 24-month;
  rewrite Stage 3 low-carbon prompt to describe the actual rule
  (per-DC requirement vs zero-emission share) instead of a
  fabricated 25%/100% threshold
- machine_maintenance: drop unsupported "Turbines need on-site
  qualified technician" hard constraint -- script penalizes
  travel cost, doesn't enforce co-location; add parts_cost
  factor to failure cost formula
- portfolio: tighten crisis vol-gap range to actual 25-30%
  (peak +29.8% at eps_1, low +25.2% at eps_5)
- supply_chain: no factual changes (verified clean)
Prompts should describe what the user wants, not how the agent
should do it. The skill (with the agent) handles solver choice,
formula construction, and implementation details.

- telco Stage 5: drop the explicit Σ formula
- portfolio Stage 2: 'force the rest to zero' -> 'only invest in those'
- portfolio Stage 3: drop 'anchor / sweep / forced to zero' agent
  scaffolding; show 7 frontier points instead
- supply chain Stage 3: rephrase as 'find the minimum-cost shipping
  plan' (no 'Solve a ... LP'), 'don't ship from avoid suppliers',
  'prefer non-bottleneck sites'
- machine_maintenance Stage 5: drop 'Solve with HiGHS' (mechanical)
- machine_maintenance Stage 4: simplify cost-formula language
@cafzal cafzal changed the title from "Add hybrid runbooks (narrative + agent recipe) for five multi-reasoner templates" to "Add reproducibility runbooks for five multi-reasoner templates" on May 6, 2026
Step 1 is now /rai-build-starter-ontology against the bundled CSVs.
Discovery, the chain stages, and interpretation shift to 2..N.
Reflects that users start with the demo data and need the ontology
materialized before any reasoner skill can run.
5-runbook audit against template scripts and READMEs.

Telco: fix concept list to the 9 the script defines (drop
Contract/BillingEvent/etc. that are not in the script, add
RegionMetric and TemporalEdge); sharpen Steps 2,3,4,6,7,8 prompts
and responses; correct TWR-0009 BRONZE->GOLD delta to +5 Gbps
(BRONZE=3, GOLD=8).

Energy: sharpen Stage 4 graph prompt to ask for WCC + Louvain +
centrality (script computes all three); reword "structurally
constrained bottleneck" to clarify DFW is the binding capacity
bottleneck specifically.

Supply chain: fix concept list to actual 7 (drop StockKeepingUnit
/Inventory/BillOfMaterial that aren't Concepts, rename to SKU);
sharpen Step 2 discovery prompt + response to enumerate the 5
chained reasoning steps; name the 6 SUPPLIER-typed upstream nodes
in Step 3.

Machine maintenance: fix concept list (drop TrainingOption — used
as DataFrame, not Concept; add CertificationExpiry); correct
x_assigned binary count from ~250 to 384 (96 qualified pairs x 4
periods).

Portfolio: drop fictitious StockPair Concept (script uses binary
property Stock.covar(Stock,Stock) instead); add Regime to the
Stage 5 Concept callout; enumerate the 6 actual constraint
families.

Step rename: '### 2. Discovery' -> '### 2. Discover reasoner
questions' across all five runbooks for clearer step labelling.
Runbooks live alongside the template script, README, and data/
directory now (was under references/). Updated relative paths
inside each runbook from ../data/ -> data/ and ../<template>.py ->
<template>.py.
Each chain now ends with /rai-ontology-design promoting the
per-stage enrichments into first-class ontology state and adding
new Concepts where a stage produced new entities (SelectedUpgrade,
InvestmentPortfolio, SupplyPlan, MaintenancePlan +
CrossTrainingRecommendation, FrontierPoint). The chain output
persists as queryable ontology rather than stage-local Python
state, which is what enables a downstream analyst to keep working
without re-running the chain.
Step 2 is now /rai-querying showing the concept-relationship diagram
and row counts so a user can confirm the ontology came out the way
they expected before any reasoner skill runs against it. Discovery
shifts to step 3, downstream chain steps shift by 1.
End-to-end run shows the script outputs 6 frontier points per
scenario (min-risk anchor + 5 epsilon sweep points). Max-return
is computed as a separate anchor for setting the rate range but
isn't included in the frontier table. Fix runbook accordingly:

- Step 6 prompt: drop 'Show 7 points', describe the actual sweep
- Step 6 response: '6-point frontier per scenario; 7 solves per
  scenario x 6 scenarios = 42 LOCALLY_SOLVED'
- Step 7 prompt: 'six-point Pareto frontier'
- Chain ASCII: '6-point frontier'
- Closing step response: FrontierPoint count = 36, not 42

All other portfolio numbers verified against the actual run.
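The corrected counts are easy to sanity-check with the figures stated above (1 min-risk anchor + 5 sweep points in the table, plus a max-return anchor solve that sets the rate range but never lands in the table):

```python
# Bookkeeping for the portfolio frontier sweep described above.
scenarios = 6
sweep_points = 5                        # epsilon sweep points
frontier_points = 1 + sweep_points      # + min-risk anchor -> 6 per scenario
solves = frontier_points + 1            # + max-return anchor (rate range only,
                                        #   not included in the frontier table)

assert scenarios * solves == 42         # LOCALLY_SOLVED count
assert scenarios * frontier_points == 36  # FrontierPoint rows
```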
The chain already writes per-stage enrichments back to the ontology
via model.define() in each reasoner stage, so promoting them is
redundant. The real gap is the prescriptive aggregates and post-solve
metadata that currently live only in pandas / stdout.

Per-template, the closing step now adds the specific Concepts that
materialize what the chain doesn't:

- telco: RestorePlan (singleton plan summary) + SelectedUpgrade
  (view-concept over the 15 chosen tower-tier rows)
- energy: InvestmentPortfolio(InvestmentLevel) holding per-budget
  totals + marginal_per_m + knee flag (5 rows)
- supply_chain: RoutingScenario (3 rows: Baseline, S004-offline,
  Watch-Avoid) with status, total_cost, cost_delta_pct, blocked
  businesses
- machine_maintenance: MaintenancePlan, TypeConcentration(machine_type)
  per-type concentration analysis, and CrossTrainingRecommendation
  with ranked candidates
- portfolio: FrontierPoint(Scenario, eps_label) — 36 rows holding
  return, risk, vol_base, vol_crisis, vol_gap_pct, is_knee

Also strips inner backticks from the prompt code span (which were
breaking the outer markdown code rendering) and restores the blank
line before the Data section.
Strip the meta-framing ('the chain already writes X, what's still
only in pandas...') from each /rai-ontology-design prompt. A user
wouldn't talk to an agent that way — they'd just say what they
want added to the ontology. The agent (with the ontology-design
skill loaded) figures out the gap.

Also drop 'support temporal GNN message passing downstream' tail
from the telco build prompt — replaced with the user-facing
reason ('we'll want to forecast region-level trends later').
Strip implementation details a user wouldn't type — those belong
to the agent + loaded skill, not the user's question:

- telco diagnose: drop DAILY_REVENUE_USD column reference
- telco rules: drop 'first derive averages from NetworkPerformance,
  via NetworkEquipment -> EquipmentHealth' join paths
- telco graph: drop 'rank by total PageRank influence' algorithm name
- telco predictive: drop 'GNN', 'TemporalEdge', 'message passing',
  'lag features (prev-day, prev-week, 7-day mean)' feature
  engineering, and 'Mean each region's Dec predictions, convert
  to 1+x multiplier, bind via region' implementation steps
- telco prescriptive: drop 'MIP scoped to options where X.for_tower(Y)
  AND Z.is_critical_restore()' join syntax, decision-variable typing
  ('binary, keyed by tower_id+tier'), and explicit 'sum(...)'
  objective formula
- energy graph: drop 'WCC, Louvain, betweenness/degree/eigenvector'
  algorithm enumeration
- supply chain rules: drop downstream-coupling explanation ('avoid
  hard-blocked downstream', 'watch surcharged') from the rule
  prompt — that's the optimizer's concern
- machine maintenance graph: 'Compute centrality' -> 'Score by how
  central in the qualification network'
- portfolio frontier: drop 'Anchor at min-risk and max-return,
  then sweep 5 epsilon points' agent-implementation; user just
  asks for '6 frontier points per scenario from min-risk through
  high-return'
- portfolio stress: drop 'shrink correlations toward all-ones with
  weight 0.7 on base covariance + 0.3 on outer-product' formula
  — user just says 'pushes correlations 70% of the way toward
  all-ones'
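Since the 70%/30% direction later turned out to be inverted in one prompt, a tiny worked check of the blend is useful. A minimal two-asset sketch with made-up numbers (the blend rule itself, 0.7 weight on base covariance plus 0.3 on the all-ones outer product, is the one described above):

```python
# Illustrative 2-asset check of the crisis-covariance blend: weight 0.7 on
# the base covariance, 0.3 on the sigma*sigma^T outer product (the covariance
# a portfolio would have if all correlations were 1.0). Numbers are made up.
base_cov = [[0.04, 0.006],
            [0.006, 0.09]]
sigma = [0.2, 0.3]          # per-asset volatilities (sqrt of the diagonal)
alpha = 0.7

crisis = [[alpha * base_cov[i][j] + (1 - alpha) * sigma[i] * sigma[j]
           for j in range(2)] for i in range(2)]

# The diagonal (variances) is unchanged; only correlations move toward 1.
corr_before = base_cov[0][1] / (sigma[0] * sigma[1])   # 0.10
corr_after = crisis[0][1] / (sigma[0] * sigma[1])      # 0.37
# 0.10 -> 0.37 is exactly 30% of the way toward 1.0, i.e. alpha = 0.7
# means 30% shrinkage toward all-ones, not 70%.
```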
cafzal added 2 commits May 6, 2026 12:31
Subagents audited each prompt against (a) the named skill's
SKILL.md and (b) the template script to verify the agent + skill
+ ontology have enough business signal to land on the script's
behavior, without re-adding mechanics.

- portfolio Step 6: spell out the 3 budgets (500, 1000, 2000) and
  2 regimes (base, crisis), and call out the fully-invested
  constraint so the solver doesn't drop budget equality
- portfolio Step 8: fix a real numerical inversion — alpha=0.7
  means 30% shrinkage toward all-ones, not 70% (the prompt was
  saying the opposite of the script). Also add 're-solve the
  same frontier under crisis covariance' so the agent re-runs
  rather than just re-evaluating risk
- energy Step 7: add 'across all five levels in a single solve'
  + clarify Stage 6 compliance flags are informational, not a
  hard pre-filter (otherwise risked producing a degenerate
  2-DC frontier from filtering to only the compliant pair)
- supply_chain Step 6: ask the rules step to also flag
  HIGH-priority demand as escalated (was in the Response but
  missing from the Prompt)
- machine_maintenance Step 7: replace 'Schedule maintenance for
  all 30 machines' with 'maintained or left exposed' framing,
  and name the 5-jobs-per-period parts/bay cap and per-tech
  hours capacity that the script enforces — a literal 'must
  maintain all 30' read otherwise conflicts with the cap
- telco: no edits; subagent flagged 2 minor risks (TemporalEdge
  comes from inline-derived edges, not bare CSVs; Step 5 only
  names 2 of 4 derived health metrics) but both are
  non-load-bearing for the chain
5 subagents simulated Prompt -> agent (with skill loaded) -> output
for each step and compared to the canonical script.

Sharpens applied (runbook only):
- machine_maintenance Step 5: drop 'betweenness (24.0 raw, 1.0
  normalized)' from response — algorithm name + raw centrality
  numeric leak agent-level mechanics into a user-facing
  description. Now says 'top centrality (normalized to 1.0)'.
- machine_maintenance Step 7: drop the leaking decision-variable
  counts/typing ('120 x_maintain + 120 x_vulnerable + 384
  x_assigned binaries') and the 'failure cost = x_vulnerable x
  predicted_fp x parts_cost x criticality x (1 + 2 x betweenness)'
  formula from the response — those are implementation
  scaffolding. Kept the 5-constraint-family enumeration and the
  Stage 2 deadline handoff which are business-level facts.
- portfolio Step 5: tighten to 'Cluster stocks where absolute
  return correlation is at least 0.3 ... pick representative by
  highest Sharpe ratio and flag the rest as non-representatives'
  — drops the over-mechanical 'derive per-stock volatility and
  pairwise correlation from the covariance property' framing.
- portfolio Step 8: re-route the 'crisis-regime covariance
  derivation' work from /rai-prescriptive-solver-management to
  /rai-pyrel-coding (it's a derived ontology property, not
  solver lifecycle); /rai-prescriptive-results-interpretation
  still does the comparison.

No sharpens (prompts already sufficient):
- telco, energy, supply_chain — all Matches across the chain
  stages; left untouched.

Alignment gaps surfaced (script and runbook diverge — NOT fixed
per the no-script-edit rule, flagged for separate decision):

* All 5 templates: closing /rai-ontology-design step claims an
  ontology Concept that the script never materializes
  (RestorePlan, InvestmentPortfolio, RoutingScenario,
  MaintenancePlan/TypeConcentration/CrossTrainingRecommendation,
  FrontierPoint). The closing step is aspirational — what the
  agent would do AFTER the script's chain runs — but the
  Response's 'Ontology now carries X' phrasing implies the
  script did it.

* telco Step 9 (interpret): response narrates a sensitivity
  outcome ('flexing budget to $6M would promote TWR-0009
  BRONZE->GOLD') the script doesn't actually compute (single
  solve only).

* telco Step 1: response lists a TemporalEdge concept the
  prompt doesn't ask for; an agent following only the prompt
  + rai-build-starter-ontology would not produce it (script
  computes it via pandas elsewhere).

* energy Step 4: prompt invokes /rai-predictive-modeling +
  /rai-predictive-training, script does a CSV lookup with
  gnn.load() stub (no actual training).

* energy Step 5: prompt + skill imply single-algorithm
  centrality (skill explicitly forbids composite), script
  computes a composite-rank of betweenness + degree +
  eigenvector.

* supply_chain Step 5: centrality persisted via pandas
  round-trip rather than the canonical
  graph.Node.X = algorithm() shorthand.

* supply_chain Step 6: the 'avoid' tier is computed Python-side
  as a set intersection rather than as a RAI Relationship.
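The supply_chain gap flagged above is worth seeing side by side: an 'avoid' tier computed as a client-side set intersection is invisible to later ontology queries, while a per-supplier predicate that fires when both upstream rules fire can live on the ontology (as Business.is_avoid eventually does). A hypothetical sketch with made-up supplier IDs:

```python
# Made-up rule outcomes for illustration.
unreliable = {"S002", "S004", "S009"}
high_delay_risk = {"S004", "S007", "S009"}

# Before: computed client-side in Python; downstream queries can't see it.
avoid_set = unreliable & high_delay_risk

# After (conceptually): a predicate derived from the two rule outcomes,
# the shape that becomes a Business.is_avoid Relationship on the ontology.
def is_avoid(supplier_id: str) -> bool:
    return supplier_id in unreliable and supplier_id in high_delay_risk

# Both formulations agree on which suppliers to avoid.
assert {s for s in unreliable | high_delay_risk if is_avoid(s)} == avoid_set
```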
Each script now binds the post-solve aggregates onto the ontology so
the closing /rai-ontology-design step in the runbook is a real chain
deliverable, not aspirational:

- telco: RestorePlan singleton (total_cost, install_weeks, capacity,
  tier-mix counts, towers_covered, binding_constraint) + SelectedUpgrade
  unary Relationship narrowing TowerUpgradeOption to the chosen 15
- energy: InvestmentPortfolio(InvestmentLevel) — 5 rows (one per
  budget) with dc_count, total_mw, annual_revenue, upgrade_cost,
  net_value, marginal_per_m_to_next_level, is_knee_point
- supply_chain: RoutingScenario — 3 rows (Baseline, S004-offline,
  Watch-Avoid) with status, total_cost, cost_delta_pct,
  active_flow_count, unmet_total, blocked_businesses; also adds a
  Business.is_avoid Relationship that fires when both unreliable
  AND high_delay_risk fire (replaces a Python set intersection)
- machine_maintenance: MaintenancePlan singleton (objective +
  failure/labor/travel cost breakdown + total_jobs);
  TypeConcentration(machine_type) per machine type (qualified_tech_count,
  qualified_tech_locations, is_concentrated, scheduled_jobs_total,
  scheduled_jobs_traveling, travel_pct); CrossTrainingRecommendation
  per concentrated type (ranked candidates with is_best_candidate)
- portfolio: FrontierPoint(scenario_label, eps_label) — 36 rows
  (6 scenarios x 6 frontier points) with return, risk,
  marginal_risk_per_return, is_knee, vol_base, vol_crisis,
  vol_gap, vol_gap_pct

All five re-tested end-to-end against actual RAI; numbers match the
prior stdout output.

Runbook adapts (text only):
- telco Step 1: drop TemporalEdge from response (agent + skill won't
  produce a derived 1-day-lag edge concept from raw CSVs alone — it
  surfaces in Step 7 predictive)
- telco Step 9: soften the sensitivity narrative — agent identifies
  budget as binding and recommends a re-solve at $6M, rather than
  claiming the script computed the BRONZE->GOLD swap delta
- energy Step 4: prompt now allows the agent to use the pre-trained
  forecast in DemandForecast or train fresh
@cafzal cafzal merged commit daa7207 into telco-network-recovery-template May 6, 2026
2 checks passed
@cafzal cafzal deleted the worktree-runbooks-and-base-ontologies branch May 6, 2026 21:26
cafzal added a commit that referenced this pull request May 6, 2026
* Add telco_network_recovery template (multi-reasoner, private)

Four-stage chained-reasoner template on a shared telco ontology:
GNN regression on per-region daily KPIs (Predictive) → derived per-tower
averages + is_critical_restore flag (Rules) → PageRank on the call graph +
per-tower blast radius (Graph) → tower-upgrade MIP with three-factor
objective (Prescriptive). Each stage writes properties consumed by the next.

Bundled CSVs are dumped from TELCO_ENRICHMENT.PUBLIC; end-to-end run
produces $4,956,843 plan / 122 Gbps restored / 12 GOLD + 2 SILVER + 1 BRONZE
across all 15 critical WEST DEGRADED towers. Marked private: true to match
peer predictive templates.

* Regenerate v1/README.md index

* Add references/: full ontology + multi-reasoner runbook

Adds two reference files customers can adapt:
- telco_full_ontology.py — 18-concept reference ontology with model.Table()
  bindings (placeholder <YOUR_DB>.<YOUR_SCHEMA>) for users who want to scale
  the chain to their own Snowflake schema
- runbook.md — narrative walkthrough of all 5 reasoner stages plus the
  Stage 6 interpretation, including ASCII visualizations useful for
  explaining the chain to non-technical stakeholders

README links both from a new 'Further reading' section.

* Reframe runbook as recipe for (re)building the chain with agent skills

Drop telco_full_ontology.py — customer treats this as a worked example,
not a starter kit, so the broader Snowflake-source ontology added noise
without payoff.

Rewrite runbook.md from a narrative walkthrough into a build recipe
keyed to the RAI agent skills: rai-discovery scopes the question;
rai-pyrel-coding lays the focused 7-concept ontology; rai-rules-authoring
covers the derived averages + critical_restore flag; rai-graph-analysis
covers PageRank + blast radius; rai-prescriptive-problem-formulation
covers the MIP and pre-solver gates; rai-prescriptive-results-interpretation
covers the post-solve summary. Closes with how to adapt the chain to
a different domain.

* Drop runbook from PR 56 (PR 57 owns runbooks for all 4 multi-reasoner templates)

Plan for PR 57's hybrid build+present runbook authoring lives in
dev_temp/pr57_runbook_hybrid_plan.md (untracked, not shipped).

* Add reproducibility runbooks for five multi-reasoner templates (#57)

* Add reproducibility runbooks to four multi-reasoner templates

Each runbook is an agent prompt sequence to recreate the template's
multi-reasoner pipeline using the bundled CSVs in ../data/, mapping
each stage to the template's actual concepts, properties, and outputs:

- telco_network_recovery (5-stage: descriptive -> rules -> graph ->
  predictive -> prescriptive; mirrors PR #56's existing structure)
- energy_grid_planning (4-stage: predictive -> graph -> rules ->
  prescriptive with InvestmentLevel scenarios)
- supply_chain_resilience (4-stage: blast-radius -> graph -> rules ->
  min-cost flow + scenarios)
- machine_maintenance (5-stage: querying -> graph -> rules ->
  prescriptive maintenance schedule -> resilience cross-training)

Reproducible against the bundled template CSVs; one-line notes on
swapping to a Snowflake schema for users wiring to their own data.

* Hybrid runbook pass: discovery + per-stage skill/prompt blocks

Apply the dual-audience plan in dev_temp/pr57_runbook_hybrid_plan.md
to all five runbooks. Each runbook now serves both the stakeholder
(narrative + ASCII visualizations) and the practitioner (explicit
skill + prompt to recreate the stage):

- "How to read this runbook" preface explaining the dual purpose
- Step 0 discovery section using rai-discovery to scope sub-questions
  to reasoner families before any chain stage runs
- Skill / Prompt boxed callout immediately under every Stage heading
  (rai-querying, rai-rules-authoring, rai-graph-analysis, rai-prescriptive-*)
- "Adapting this recipe to a new domain" closing section

Adds portfolio_balancing/references/runbook.md (4 stages: rules ->
graph clustering -> bi-objective Markowitz frontier -> crisis-regime
stress test, all aligned to the template's actual 8-stock dataset
and epsilon-rate frontier sweep).

Numbers cross-referenced against each template's README and main
script; reflects the templates as shipped (not the larger demos
they were sourced from). Prompts use domain-natural language with
no Concept.property syntax inside the prompt strings.

* Rewrite runbook prompts in user-question style

Replace the prior verbose, mechanical prompts with short natural
questions a user would actually type, modeled on the reasoner-eval
QA catalog. Where an eval QA exists for the same skill+pattern, the
runbook prompt mirrors that question directly.

Also fix the predictive skill names: rai-predictive-modeling +
rai-predictive-training (the public skills are available; the
"no public skill yet" placeholder was stale).

Each prompt is now:
- 1-2 sentences asking what the user wants to know
- Domain-natural language
- Aligned to a real eval-style question

Affects all five runbooks: telco_network_recovery,
energy_grid_planning, supply_chain_resilience,
machine_maintenance, portfolio_balancing.

* Inline skill invocation in prompt callouts

Replace the two-line `**Skill:** ... · **Prompt:** "..."` callout with
a single-line `> /rai-skill "question"` form that mirrors how a user
actually invokes a skill in chat. Multi-skill stages stay readable
as `> /rai-A + /rai-B "question"`.

One change per stage callout, all five runbooks. No content rewrite.

* Trim runbooks to overview + workflow table + data footer

Each runbook is now ~50 lines: 1-paragraph intro, the TL;DR chain
ASCII, a workflow table (skill + prompt + expected output per step),
and a brief data footer. All per-stage narrative subsections,
"how to read" / "adapting" / "why the chain matters" sections, and
duplicate enrichment diagrams are gone.

Total: 5 files, ~2000 lines removed, ~260 retained.

* Convert workflow tables to bullets for easier copy

Bullet format gives each prompt its own line so users can triple-click
to select and copy. Expected output renders as a paragraph under each
bullet. No content changes, just structural.

* Reformat workflow steps with descriptive headers

Each step is now `### N. <topic>` followed by two bullets:
- Prompt: <skill> <question> (in code formatting, no quotes — easy
  to triple-click and copy as a single agent invocation)
- Response: <expected output>

Same content, clearer structure for skim + copy.

* Audit-pass fixes against template scripts and READMEs

- telco: reorder to summit-demo workflow (descriptive -> rules ->
  graph -> predictive -> prescriptive -> interpret); fix WEST
  multiplier 0.993x -> 0.9998x; fix other-region growth range to
  +0.45-0.91%/day; clarify projected_demand_growth is written to
  all 250 towers via region join, not just 15
- energy: drop fabricated 36-month forecast horizon -> 24-month;
  rewrite Stage 3 low-carbon prompt to describe the actual rule
  (per-DC requirement vs zero-emission share) instead of a
  fabricated 25%/100% threshold
- machine_maintenance: drop unsupported "Turbines need on-site
  qualified technician" hard constraint -- script penalizes
  travel cost, doesn't enforce co-location; add parts_cost
  factor to failure cost formula
- portfolio: tighten crisis vol-gap range to actual 25-30%
  (peak +29.8% at eps_1, low +25.2% at eps_5)
- supply_chain: no factual changes (verified clean)

* Strip mechanical bits from prompts

Prompts should describe what the user wants, not how the agent
should do it. The skill (with the agent) handles solver choice,
formula construction, and implementation details.

- telco Stage 5: drop the explicit Σ formula
- portfolio Stage 2: 'force the rest to zero' -> 'only invest in those'
- portfolio Stage 3: drop 'anchor / sweep / forced to zero' agent
  scaffolding; show 7 frontier points instead
- supply chain Stage 3: rephrase as 'find the minimum-cost shipping
  plan' (no 'Solve a ... LP'), 'don't ship from avoid suppliers',
  'prefer non-bottleneck sites'
- machine_maintenance Stage 5: drop 'Solve with HiGHS' (mechanical)
- machine_maintenance Stage 4: simplify cost-formula language

* Add ontology-build step at the front of each runbook

Step 1 is now /rai-build-starter-ontology against the bundled CSVs.
Discovery, the chain stages, and interpretation shift to 2..N.
Reflects that users start with the demo data and need the ontology
materialized before any reasoner skill can run.

* Audit-pass corrections + rename Discovery step

5-runbook audit against template scripts and READMEs.

Telco: fix concept list to the 9 the script defines (drop
Contract/BillingEvent/etc. that are not in the script, add
RegionMetric and TemporalEdge); sharpen Steps 2,3,4,6,7,8 prompts
and responses; correct TWR-0009 BRONZE->GOLD delta to +5 Gbps
(BRONZE=3, GOLD=8).

Energy: sharpen Stage 4 graph prompt to ask for WCC + Louvain +
centrality (script computes all three); reword "structurally
constrained bottleneck" to clarify DFW is the binding capacity
bottleneck specifically.

Supply chain: fix concept list to actual 7 (drop StockKeepingUnit
/Inventory/BillOfMaterial that aren't Concepts, rename to SKU);
sharpen Step 2 discovery prompt + response to enumerate the 5
chained reasoning steps; name the 6 SUPPLIER-typed upstream nodes
in Step 3.

Machine maintenance: fix concept list (drop TrainingOption — used
as DataFrame, not Concept; add CertificationExpiry); correct
x_assigned binary count from ~250 to 384 (96 qualified pairs x 4
periods).

Portfolio: drop fictitious StockPair Concept (script uses binary
property Stock.covar(Stock,Stock) instead); add Regime to the
Stage 5 Concept callout; enumerate the 6 actual constraint
families.

Step rename: '### 2. Discovery' -> '### 2. Discover reasoner
questions' across all five runbooks for clearer step labelling.

* Move runbook.md to template root

Runbooks live alongside the template script, README, and data/
directory now (was under references/). Updated relative paths
inside each runbook from ../data/ -> data/ and ../<template>.py ->
<template>.py.

* Add closing rai-ontology-design step to each runbook

Each chain now ends with /rai-ontology-design promoting the
per-stage enrichments into first-class ontology state and adding
new Concepts where a stage produced new entities (SelectedUpgrade,
InvestmentPortfolio, SupplyPlan, MaintenancePlan +
CrossTrainingRecommendation, FrontierPoint). The chain output
persists as queryable ontology rather than stage-local Python
state, which is what enables a downstream analyst to keep working
without re-running the chain.

* Insert 'Examine ontology' step after Build ontology

Step 2 is now /rai-querying showing the concept-relationship diagram
and row counts so a user can confirm the ontology came out the way
they expected before any reasoner skill runs against it. Discovery
shifts to step 3, downstream chain steps shift by 1.

* Portfolio frontier is 6 points per scenario, not 7

End-to-end run shows the script outputs 6 frontier points per
scenario (min-risk anchor + 5 epsilon sweep points). Max-return
is computed as a separate anchor that sets the return-sweep range but
isn't included in the frontier table. Fix runbook accordingly:

- Step 6 prompt: drop 'Show 7 points', describe the actual sweep
- Step 6 response: '6-point frontier per scenario; 7 solves per
  scenario x 6 scenarios = 42 LOCALLY_SOLVED'
- Step 7 prompt: 'six-point Pareto frontier'
- Chain ASCII: '6-point frontier'
- Closing step response: FrontierPoint count = 36, not 42

All other portfolio numbers verified against the actual run.
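For context, the min-risk-anchor-plus-epsilon-sweep shape described above can be sketched as a plain epsilon-constraint mean-variance solve. This is a toy illustration with hypothetical numbers (`mu`, `cov`, and the 3-asset universe are made up, not the template's data), using `scipy` rather than the template's solver stack:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 3-asset universe (hypothetical numbers, not the template's data).
mu = np.array([0.05, 0.08, 0.12])            # expected returns
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])         # covariance matrix

def solve(min_ret=None):
    """Min-variance weights, optionally with a return floor (the epsilon)."""
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]   # fully invested
    if min_ret is not None:
        cons.append({"type": "ineq", "fun": lambda w: mu @ w - min_ret})
    res = minimize(lambda w: w @ cov @ w, x0=np.full(3, 1 / 3),
                   bounds=[(0, 1)] * 3, constraints=cons, method="SLSQP")
    return res.x

w_min = solve()                               # min-risk anchor (frontier point 1)
w_max = np.eye(3)[mu.argmax()]                # max-return anchor: only sets the sweep
lo, hi = mu @ w_min, mu @ w_max               # range; it is NOT a frontier point
sweep = np.linspace(lo, hi, 7)[1:6]           # 5 interior epsilon targets
frontier = [w_min] + [solve(eps) for eps in sweep]
assert len(frontier) == 6                     # 6-point frontier, 7 solves total
```

The 7th solve (the max-return anchor) is what makes the "7 solves per scenario" count while the frontier table still holds 6 rows.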

* Tighten closing rai-ontology-design step to cover the actual gap

The chain already writes per-stage enrichments back to the ontology
via model.define() in each reasoner stage, so promoting them is
redundant. The real gap is the prescriptive aggregates and post-solve
metadata that currently live only in pandas / stdout.

Per-template, the closing step now adds the specific Concepts that
materialize what the chain doesn't:

- telco: RestorePlan (singleton plan summary) + SelectedUpgrade
  (view-concept over the 15 chosen tower-tier rows)
- energy: InvestmentPortfolio(InvestmentLevel) holding per-budget
  totals + marginal_per_m + knee flag (5 rows)
- supply_chain: RoutingScenario (3 rows: Baseline, S004-offline,
  Watch-Avoid) with status, total_cost, cost_delta_pct, blocked
  businesses
- machine_maintenance: MaintenancePlan, TypeConcentration(machine_type)
  per-type concentration analysis, and CrossTrainingRecommendation
  with ranked candidates
- portfolio: FrontierPoint(Scenario, eps_label) — 36 rows holding
  return, risk, vol_base, vol_crisis, vol_gap_pct, is_knee

Also strips inner backticks from the prompt code span (which were
breaking the outer markdown code rendering) and restores the blank
line before the Data section.

* Rename closing step: 'Persist solution concepts into the ontology'

* Naturalize closing-step prompts

Strip the meta-framing ('the chain already writes X, what's still
only in pandas...') from each /rai-ontology-design prompt. A user
wouldn't talk to an agent that way — they'd just say what they
want added to the ontology. The agent (with the ontology-design
skill loaded) figures out the gap.

Also drop 'support temporal GNN message passing downstream' tail
from the telco build prompt — replaced with the user-facing
reason ('we'll want to forecast region-level trends later').

* Naturalize all prompts: drop algorithm/solver/feature mechanics

Strip implementation details a user wouldn't type — those belong
to the agent + loaded skill, not the user's question:

- telco diagnose: drop DAILY_REVENUE_USD column reference
- telco rules: drop 'first derive averages from NetworkPerformance,
  via NetworkEquipment -> EquipmentHealth' join paths
- telco graph: drop 'rank by total PageRank influence' algorithm name
- telco predictive: drop 'GNN', 'TemporalEdge', 'message passing',
  'lag features (prev-day, prev-week, 7-day mean)' feature
  engineering, and 'Mean each region's Dec predictions, convert
  to 1+x multiplier, bind via region' implementation steps
- telco prescriptive: drop 'MIP scoped to options where X.for_tower(Y)
  AND Z.is_critical_restore()' join syntax, decision-variable typing
  ('binary, keyed by tower_id+tier'), and explicit 'sum(...)'
  objective formula
- energy graph: drop 'WCC, Louvain, betweenness/degree/eigenvector'
  algorithm enumeration
- supply chain rules: drop downstream-coupling explanation ('avoid
  hard-blocked downstream', 'watch surcharged') from the rule
  prompt — that's the optimizer's concern
- machine maintenance graph: 'Compute centrality' -> 'Score by how
  central in the qualification network'
- portfolio frontier: drop 'Anchor at min-risk and max-return,
  then sweep 5 epsilon points' agent-implementation; user just
  asks for '6 frontier points per scenario from min-risk through
  high-return'
- portfolio stress: drop 'shrink correlations toward all-ones with
  weight 0.7 on base covariance + 0.3 on outer-product' formula
  — user just says 'pushes correlations 70% of the way toward
  all-ones'

* Prompt-sufficiency audit: minor sharpens for 4 of 5 templates

Subagents audited each prompt against (a) the named skill's
SKILL.md and (b) the template script to verify the agent + skill
+ ontology have enough business signal to land on the script's
behavior, without re-adding mechanics.

- portfolio Step 6: spell out the 3 budgets (500, 1000, 2000) and
  2 regimes (base, crisis), and call out the fully-invested
  constraint so the solver doesn't drop budget equality
- portfolio Step 8: fix a real numerical inversion — alpha=0.7
  means 30% shrinkage toward all-ones, not 70% (the prompt was
  saying the opposite of the script). Also add 're-solve the
  same frontier under crisis covariance' so the agent re-runs
  rather than just re-evaluating risk
- energy Step 7: add 'across all five levels in a single solve'
  + clarify Stage 6 compliance flags are informational, not a
  hard pre-filter (otherwise risked producing a degenerate
  2-DC frontier from filtering to only the compliant pair)
- supply_chain Step 6: ask the rules step to also flag
  HIGH-priority demand as escalated (was in the Response but
  missing from the Prompt)
- machine_maintenance Step 7: replace 'Schedule maintenance for
  all 30 machines' with 'maintained or left exposed' framing,
  and name the 5-jobs-per-period parts/bay cap and per-tech
  hours capacity that the script enforces — a literal 'must
  maintain all 30' read otherwise conflicts with the cap
- telco: no edits; subagent flagged 2 minor risks (TemporalEdge
  comes from inline-derived edges, not bare CSVs; Step 5 only
  names 2 of 4 derived health metrics) but both are
  non-load-bearing for the chain

* Prompt-path audit: minor sharpens to runbook only (script untouched)

5 subagents simulated Prompt -> agent (with skill loaded) -> output
for each step and compared to the canonical script.

Sharpens applied (runbook only):
- machine_maintenance Step 5: drop 'betweenness (24.0 raw, 1.0
  normalized)' from response — algorithm name + raw centrality
  numeric leak agent-level mechanics into a user-facing
  description. Now says 'top centrality (normalized to 1.0)'.
- machine_maintenance Step 7: drop the leaking decision-variable
  counts/typing ('120 x_maintain + 120 x_vulnerable + 384
  x_assigned binaries') and the 'failure cost = x_vulnerable x
  predicted_fp x parts_cost x criticality x (1 + 2 x betweenness)'
  formula from the response — those are implementation
  scaffolding. Kept the 5-constraint-family enumeration and the
  Stage 2 deadline handoff which are business-level facts.
- portfolio Step 5: tighten to 'Cluster stocks where absolute
  return correlation is at least 0.3 ... pick representative by
  highest Sharpe ratio and flag the rest as non-representatives'
  — drops the over-mechanical 'derive per-stock volatility and
  pairwise correlation from the covariance property' framing.
- portfolio Step 8: re-route the 'crisis-regime covariance
  derivation' work from /rai-prescriptive-solver-management to
  /rai-pyrel-coding (it's a derived ontology property, not
  solver lifecycle); /rai-prescriptive-results-interpretation
  still does the comparison.
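For reference, the failure-cost term dropped from the Step 7 response reads, in sketch form, as below. Variable names are taken from the formula as quoted in the bullet above; the weights are the script's as quoted, and the example inputs are hypothetical:

```python
def expected_failure_cost(x_vulnerable, predicted_fp, parts_cost,
                          criticality, betweenness):
    """Failure-cost term as quoted above: parts cost scaled by failure
    probability, criticality, and a centrality uplift (1 + 2 * betweenness)."""
    return x_vulnerable * predicted_fp * parts_cost * criticality * (1 + 2 * betweenness)

# e.g. a vulnerable machine (x=1), 20% failure probability, $5,000 parts,
# criticality 1.5, betweenness 0.25 (all hypothetical inputs):
expected_failure_cost(1, 0.20, 5000, 1.5, 0.25)  # -> 2250.0
```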

No sharpens (prompts already sufficient):
- telco, energy, supply_chain — all Matches across the chain
  stages; left untouched.

Alignment gaps surfaced (script and runbook diverge — NOT fixed
per the no-script-edit rule, flagged for separate decision):

* All 5 templates: closing /rai-ontology-design step claims an
  ontology Concept that the script never materializes
  (RestorePlan, InvestmentPortfolio, RoutingScenario,
  MaintenancePlan/TypeConcentration/CrossTrainingRecommendation,
  FrontierPoint). The closing step is aspirational — what the
  agent would do AFTER the script's chain runs — but the
  Response's 'Ontology now carries X' phrasing implies the
  script did it.

* telco Step 9 (interpret): response narrates a sensitivity
  outcome ('flexing budget to $6M would promote TWR-0009
  BRONZE->GOLD') the script doesn't actually compute (single
  solve only).

* telco Step 1: response lists a TemporalEdge concept the
  prompt doesn't ask for; an agent following only the prompt
  + rai-build-starter-ontology would not produce it (script
  computes it via pandas elsewhere).

* energy Step 4: prompt invokes /rai-predictive-modeling +
  /rai-predictive-training, script does a CSV lookup with
  gnn.load() stub (no actual training).

* energy Step 5: prompt + skill imply single-algorithm
  centrality (skill explicitly forbids composite), script
  computes a composite-rank of betweenness + degree +
  eigenvector.

* supply_chain Step 5: centrality persisted via pandas
  round-trip rather than the canonical
  graph.Node.X = algorithm() shorthand.

* supply_chain Step 6: the 'avoid' tier is computed Python-side
  as a set intersection rather than as a RAI Relationship.

* Materialize closing-step solution Concepts in each script

Each script now binds the post-solve aggregates onto the ontology so
the closing /rai-ontology-design step in the runbook is a real chain
deliverable, not aspirational:

- telco: RestorePlan singleton (total_cost, install_weeks, capacity,
  tier-mix counts, towers_covered, binding_constraint) + SelectedUpgrade
  unary Relationship narrowing TowerUpgradeOption to the chosen 15
- energy: InvestmentPortfolio(InvestmentLevel) — 5 rows (one per
  budget) with dc_count, total_mw, annual_revenue, upgrade_cost,
  net_value, marginal_per_m_to_next_level, is_knee_point
- supply_chain: RoutingScenario — 3 rows (Baseline, S004-offline,
  Watch-Avoid) with status, total_cost, cost_delta_pct,
  active_flow_count, unmet_total, blocked_businesses; also adds a
  Business.is_avoid Relationship that fires when both unreliable
  AND high_delay_risk fire (replaces a Python set intersection)
- machine_maintenance: MaintenancePlan singleton (objective +
  failure/labor/travel cost breakdown + total_jobs);
  TypeConcentration(machine_type) per machine type (qualified_tech_count,
  qualified_tech_locations, is_concentrated, scheduled_jobs_total,
  scheduled_jobs_traveling, travel_pct); CrossTrainingRecommendation
  per concentrated type (ranked candidates with is_best_candidate)
- portfolio: FrontierPoint(scenario_label, eps_label) — 36 rows
  (6 scenarios x 6 frontier points) with return, risk,
  marginal_risk_per_return, is_knee, vol_base, vol_crisis,
  vol_gap, vol_gap_pct

All five re-tested end-to-end against actual RAI; numbers match the
prior stdout output.

Runbook adapts (text only):
- telco Step 1: drop TemporalEdge from response (agent + skill won't
  produce a derived 1-day-lag edge concept from raw CSVs alone — it
  surfaces in Step 7 predictive)
- telco Step 9: soften the sensitivity narrative — agent identifies
  budget as binding and recommends a re-solve at $6M, rather than
  claiming the script computed the BRONZE->GOLD swap delta
- energy Step 4: prompt now allows the agent to use the pre-trained
  forecast in DemandForecast or train fresh
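The machine_maintenance TypeConcentration aggregates above can be sketched in plain pandas. This is a hypothetical illustration (the toy jobs table and its column names are made up); the actual script binds these rows onto the ontology via the RAI model API rather than keeping them in a DataFrame:

```python
import pandas as pd

# Hypothetical scheduled-jobs table produced by the maintenance solve.
jobs = pd.DataFrame({
    "machine_type": ["pump", "pump", "turbine", "turbine", "turbine"],
    "tech_home":    ["DAL",  "DAL",  "DAL",     "DAL",     "HOU"],
    "machine_site": ["DAL",  "AUS",  "DAL",     "HOU",     "HOU"],
})
jobs["traveling"] = jobs["tech_home"] != jobs["machine_site"]

# Per-type concentration rollup (scheduled_jobs_total / _traveling / travel_pct).
concentration = jobs.groupby("machine_type").agg(
    scheduled_jobs_total=("traveling", "size"),
    scheduled_jobs_traveling=("traveling", "sum"),
)
concentration["travel_pct"] = (
    100 * concentration["scheduled_jobs_traveling"]
    / concentration["scheduled_jobs_total"]
)
```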

* Address PR review: switch per-step format to fenced-code blocks

Use **Prompt** / **Response** headings + a fenced ``` code block
for the prompt content so each prompt gets GitHub's copy button.
The inline-bullet form rendered fine but couldn't be copied as a
single agent invocation in one click.

Reviewer feedback on PR 56 (somacdivad).

* Remove three accidentally-committed _sf.py files

These were untracked Snowflake-bound scripts for the predictive
templates that got picked up by 'git add -A' in the previous
commit. They aren't part of the runbook format change.