Add reproducibility runbooks for five multi-reasoner templates #57
Merged
cafzal merged 22 commits into telco-network-recovery-template on May 6, 2026
Conversation
Each runbook is an agent prompt sequence to recreate the template's multi-reasoner pipeline using the bundled CSVs in ../data/, mapping each stage to the template's actual concepts, properties, and outputs:
- telco_network_recovery (5-stage: descriptive -> rules -> graph -> predictive -> prescriptive; mirrors PR #56's existing structure)
- energy_grid_planning (4-stage: predictive -> graph -> rules -> prescriptive with InvestmentLevel scenarios)
- supply_chain_resilience (4-stage: blast-radius -> graph -> rules -> min-cost flow + scenarios)
- machine_maintenance (5-stage: querying -> graph -> rules -> prescriptive maintenance schedule -> resilience cross-training)

Reproducible against the bundled template CSVs; one-line notes on swapping to a Snowflake schema for users wiring to their own data.
Apply the dual-audience plan in dev_temp/pr57_runbook_hybrid_plan.md to all five runbooks. Each runbook now serves both the stakeholder (narrative + ASCII visualizations) and the practitioner (explicit skill + prompt to recreate the stage):
- "How to read this runbook" preface explaining the dual purpose
- Step 0 discovery section using rai-discovery to scope sub-questions to reasoner families before any chain stage runs
- Skill / Prompt boxed callout immediately under every Stage heading (rai-querying, rai-rules-authoring, rai-graph-analysis, rai-prescriptive-*)
- "Adapting this recipe to a new domain" closing section

Adds portfolio_balancing/references/runbook.md (4 stages: rules -> graph clustering -> bi-objective Markowitz frontier -> crisis-regime stress test, all aligned to the template's actual 8-stock dataset and epsilon-rate frontier sweep).

Numbers cross-referenced against each template's README and main script; reflects the templates as shipped (not the larger demos they were sourced from). Prompts use domain-natural language with no Concept.property syntax inside the prompt strings.
The docs preview for this pull request has been deployed to Vercel!
Replace the prior verbose, mechanical prompts with short natural questions a user would actually type, modeled on the reasoner-eval QA catalog. Where an eval QA exists for the same skill+pattern, the runbook prompt mirrors that question directly. Also fix the predictive skill names: rai-predictive-modeling + rai-predictive-training (the public skills are available; the "no public skill yet" placeholder was stale).

Each prompt is now:
- 1-2 sentences asking what the user wants to know
- Domain-natural language
- Aligned to a real eval-style question

Affects all five runbooks: telco_network_recovery, energy_grid_planning, supply_chain_resilience, machine_maintenance, portfolio_balancing.
Replace the two-line `**Skill:** ... · **Prompt:** "..."` callout with a single-line `> /rai-skill "question"` form that mirrors how a user actually invokes a skill in chat. Multi-skill stages stay readable as `> /rai-A + /rai-B "question"`. One change per stage callout, all five runbooks. No content rewrite.
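For example, a stage callout in this single-line form might look like the following (the skill name is one the runbooks use; the question is invented for illustration):

```
> /rai-rules-authoring "Which towers should count as critical restores?"
```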
Each runbook is now ~50 lines: 1-paragraph intro, the TL;DR chain ASCII, a workflow table (skill + prompt + expected output per step), and a brief data footer. All per-stage narrative subsections, "how to read" / "adapting" / "why the chain matters" sections, and duplicate enrichment diagrams are gone. Total: 5 files, ~2000 lines removed, ~260 retained.
Bullet format gives each prompt its own line so users can triple-click to select and copy. Expected output renders as a paragraph under each bullet. No content changes, just structural.
Each step is now `### N. <topic>` followed by two bullets:
- Prompt: <skill> <question> (in code formatting, no quotes — easy to triple-click and copy as a single agent invocation)
- Response: <expected output>

Same content, clearer structure for skim + copy.
- telco: reorder to summit-demo workflow (descriptive -> rules -> graph -> predictive -> prescriptive -> interpret); fix WEST multiplier 0.993x -> 0.9998x; fix other-region growth range to +0.45-0.91%/day; clarify projected_demand_growth is written to all 250 towers via region join, not just 15
- energy: drop fabricated 36-month forecast horizon -> 24-month; rewrite Stage 3 low-carbon prompt to describe the actual rule (per-DC requirement vs zero-emission share) instead of a fabricated 25%/100% threshold
- machine_maintenance: drop unsupported "Turbines need on-site qualified technician" hard constraint -- script penalizes travel cost, doesn't enforce co-location; add parts_cost factor to failure cost formula
- portfolio: tighten crisis vol-gap range to actual 25-30% (peak +29.8% at eps_1, low +25.2% at eps_5)
- supply_chain: no factual changes (verified clean)
Prompts should describe what the user wants, not how the agent should do it. The skill (with the agent) handles solver choice, formula construction, and implementation details.
- telco Stage 5: drop the explicit Σ formula
- portfolio Stage 2: 'force the rest to zero' -> 'only invest in those'
- portfolio Stage 3: drop 'anchor / sweep / forced to zero' agent scaffolding; show 7 frontier points instead
- supply chain Stage 3: rephrase as 'find the minimum-cost shipping plan' (no 'Solve a ... LP'), 'don't ship from avoid suppliers', 'prefer non-bottleneck sites' (a generic sketch of this problem shape follows below)
- machine_maintenance Stage 5: drop 'Solve with HiGHS' (mechanical)
- machine_maintenance Stage 4: simplify cost-formula language
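The min-cost flow framing behind that supply chain rewrite can be sketched generically. This toy scipy version uses invented data and is not the template's actual RAI solve; it only shows the shape of the problem the skill handles on the user's behalf:

```python
# Generic min-cost flow sketch with invented data -- the template's actual
# solve goes through the RAI prescriptive skills, not scipy directly.
# Two suppliers ship to two businesses at minimum total cost.
import numpy as np
from scipy.optimize import linprog

# Route order: S1->B1, S1->B2, S2->B1, S2->B2
cost = np.array([4.0, 6.0, 5.0, 3.0])    # cost per unit shipped
A_ub = [[1, 1, 0, 0],                     # S1 ships at most 80 units
        [0, 0, 1, 1]]                     # S2 ships at most 70 units
b_ub = [80, 70]
A_eq = [[1, 0, 1, 0],                     # B1 must receive exactly 60
        [0, 1, 0, 1]]                     # B2 must receive exactly 50
b_eq = [60, 50]

# linprog defaults to the HiGHS solver family mentioned above.
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.x, res.fun)                     # optimal flows and total cost (390.0)
```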
Step 1 is now /rai-build-starter-ontology against the bundled CSVs. Discovery, the chain stages, and interpretation shift to 2..N. Reflects that users start with the demo data and need the ontology materialized before any reasoner skill can run.
5-runbook audit against template scripts and READMEs.
- Telco: fix concept list to the 9 the script defines (drop Contract/BillingEvent/etc. that are not in the script, add RegionMetric and TemporalEdge); sharpen Steps 2,3,4,6,7,8 prompts and responses; correct TWR-0009 BRONZE->GOLD delta to +5 Gbps (BRONZE=3, GOLD=8).
- Energy: sharpen Stage 4 graph prompt to ask for WCC + Louvain + centrality (script computes all three); reword "structurally constrained bottleneck" to clarify DFW is the binding capacity bottleneck specifically.
- Supply chain: fix concept list to actual 7 (drop StockKeepingUnit/Inventory/BillOfMaterial that aren't Concepts, rename to SKU); sharpen Step 2 discovery prompt + response to enumerate the 5 chained reasoning steps; name the 6 SUPPLIER-typed upstream nodes in Step 3.
- Machine maintenance: fix concept list (drop TrainingOption — used as DataFrame, not Concept; add CertificationExpiry); correct x_assigned binary count from ~250 to 384 (96 qualified pairs x 4 periods).
- Portfolio: drop fictitious StockPair Concept (script uses binary property Stock.covar(Stock,Stock) instead); add Regime to the Stage 5 Concept callout; enumerate the 6 actual constraint families.

Step rename: '### 2. Discovery' -> '### 2. Discover reasoner questions' across all five runbooks for clearer step labelling.
Runbooks live alongside the template script, README, and data/ directory now (was under references/). Updated relative paths inside each runbook from ../data/ -> data/ and ../<template>.py -> <template>.py.
Each chain now ends with /rai-ontology-design promoting the per-stage enrichments into first-class ontology state and adding new Concepts where a stage produced new entities (SelectedUpgrade, InvestmentPortfolio, SupplyPlan, MaintenancePlan + CrossTrainingRecommendation, FrontierPoint). The chain output persists as queryable ontology rather than stage-local Python state, which is what enables a downstream analyst to keep working without re-running the chain.
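A minimal sketch of that downstream payoff, assuming the portfolio template's model and FrontierPoint concept are in scope; `model.select(...)` appears in this PR's test plan, but the exact call signature here is an assumption:

```python
# Sketch, not verified API: query the persisted frontier directly
# instead of re-running the five-stage chain to rebuild it in pandas.
model.select(
    FrontierPoint.scenario_label,
    FrontierPoint.eps_label,
    FrontierPoint.risk,
    FrontierPoint.is_knee,
)
```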
Step 2 is now /rai-querying showing the concept-relationship diagram and row counts so a user can confirm the ontology came out the way they expected before any reasoner skill runs against it. Discovery shifts to step 3, downstream chain steps shift by 1.
End-to-end run shows the script outputs 6 frontier points per scenario (min-risk anchor + 5 epsilon sweep points). Max-return is computed as a separate anchor for setting the rate range but isn't included in the frontier table (sketched below). Fix runbook accordingly:
- Step 6 prompt: drop 'Show 7 points', describe the actual sweep
- Step 6 response: '6-point frontier per scenario; 7 solves per scenario x 6 scenarios = 42 LOCALLY_SOLVED'
- Step 7 prompt: 'six-point Pareto frontier'
- Chain ASCII: '6-point frontier'
- Closing step response: FrontierPoint count = 36, not 42

All other portfolio numbers verified against the actual run.
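The 6-point / 7-solve arithmetic in plain Python, as a toy sketch with invented data (the template's actual solves run through the RAI prescriptive skills, not scipy):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
mu = rng.uniform(0.02, 0.12, 8)             # toy expected returns for 8 stocks
A = rng.normal(size=(8, 8))
cov = A @ A.T / 8                           # toy covariance matrix

fully_invested = {"type": "eq", "fun": lambda w: w.sum() - 1.0}

def min_risk(min_return=None):
    """Minimize portfolio variance, optionally under a return floor."""
    cons = [fully_invested]
    if min_return is not None:
        cons.append({"type": "ineq", "fun": lambda w, r=min_return: w @ mu - r})
    return minimize(lambda w: w @ cov @ w, np.full(8, 0.125),
                    bounds=[(0, 1)] * 8, constraints=cons).x

w_lo = min_risk()                           # solve 1: min-risk anchor (kept in table)
w_hi = minimize(lambda w: -(w @ mu), np.full(8, 0.125),
                bounds=[(0, 1)] * 8, constraints=[fully_invested]).x
r_lo, r_hi = w_lo @ mu, w_hi @ mu           # solve 2: max-return anchor (rate range only)

# Solves 3..7: five epsilon points strictly between the anchors. Max-return
# never enters the table, so each scenario yields a 6-point frontier from
# 7 solves (x 6 scenarios = 42 LOCALLY_SOLVED, but only 36 FrontierPoint rows).
eps_weights = [min_risk(r) for r in np.linspace(r_lo, r_hi, 7)[1:-1]]
frontier = [w_lo] + eps_weights
print(len(frontier))                        # 6
```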
The chain already writes per-stage enrichments back to the ontology via model.define() in each reasoner stage, so promoting them is redundant. The real gap is the prescriptive aggregates and post-solve metadata that currently live only in pandas / stdout. Per-template, the closing step now adds the specific Concepts that materialize what the chain doesn't:
- telco: RestorePlan (singleton plan summary) + SelectedUpgrade (view-concept over the 15 chosen tower-tier rows)
- energy: InvestmentPortfolio(InvestmentLevel) holding per-budget totals + marginal_per_m + knee flag (5 rows)
- supply_chain: RoutingScenario (3 rows: Baseline, S004-offline, Watch-Avoid) with status, total_cost, cost_delta_pct, blocked businesses
- machine_maintenance: MaintenancePlan, TypeConcentration(machine_type) per-type concentration analysis, and CrossTrainingRecommendation with ranked candidates
- portfolio: FrontierPoint(Scenario, eps_label) — 36 rows holding return, risk, vol_base, vol_crisis, vol_gap_pct, is_knee

Also strips inner backticks from the prompt code span (which were breaking the outer markdown code rendering) and restores the blank line before the Data section.
Strip the meta-framing ('the chain already writes X, what's still
only in pandas...') from each /rai-ontology-design prompt. A user
wouldn't talk to an agent that way — they'd just say what they
want added to the ontology. The agent (with the ontology-design
skill loaded) figures out the gap.
Also drop 'support temporal GNN message passing downstream' tail
from the telco build prompt — replaced with the user-facing
reason ('we'll want to forecast region-level trends later').
Strip implementation details a user wouldn't type — those belong
to the agent + loaded skill, not the user's question:
- telco diagnose: drop DAILY_REVENUE_USD column reference
- telco rules: drop 'first derive averages from NetworkPerformance,
via NetworkEquipment -> EquipmentHealth' join paths
- telco graph: drop 'rank by total PageRank influence' algorithm name
- telco predictive: drop 'GNN', 'TemporalEdge', 'message passing',
'lag features (prev-day, prev-week, 7-day mean)' feature
engineering, and 'Mean each region's Dec predictions, convert
to 1+x multiplier, bind via region' implementation steps
- telco prescriptive: drop 'MIP scoped to options where X.for_tower(Y)
AND Z.is_critical_restore()' join syntax, decision-variable typing
('binary, keyed by tower_id+tier'), and explicit 'sum(...)'
objective formula
- energy graph: drop 'WCC, Louvain, betweenness/degree/eigenvector'
algorithm enumeration
- supply chain rules: drop downstream-coupling explanation ('avoid
hard-blocked downstream', 'watch surcharged') from the rule
prompt — that's the optimizer's concern
- machine maintenance graph: 'Compute centrality' -> 'Score by how
central in the qualification network'
- portfolio frontier: drop 'Anchor at min-risk and max-return,
then sweep 5 epsilon points' agent-implementation; user just
asks for '6 frontier points per scenario from min-risk through
high-return'
- portfolio stress: drop 'shrink correlations toward all-ones with
weight 0.7 on base covariance + 0.3 on outer-product' formula
— user just says 'pushes correlations 70% of the way toward
all-ones'
Subagents audited each prompt against (a) the named skill's SKILL.md and (b) the template script to verify the agent + skill + ontology have enough business signal to land on the script's behavior, without re-adding mechanics.
- portfolio Step 6: spell out the 3 budgets (500, 1000, 2000) and 2 regimes (base, crisis), and call out the fully-invested constraint so the solver doesn't drop budget equality
- portfolio Step 8: fix a real numerical inversion — alpha=0.7 means 30% shrinkage toward all-ones, not 70% (the prompt was saying the opposite of the script). Also add 're-solve the same frontier under crisis covariance' so the agent re-runs rather than just re-evaluating risk (numeric sketch below)
- energy Step 7: add 'across all five levels in a single solve' + clarify Stage 6 compliance flags are informational, not a hard pre-filter (otherwise risked producing a degenerate 2-DC frontier from filtering to only the compliant pair)
- supply_chain Step 6: ask the rules step to also flag HIGH-priority demand as escalated (was in the Response but missing from the Prompt)
- machine_maintenance Step 7: replace 'Schedule maintenance for all 30 machines' with 'maintained or left exposed' framing, and name the 5-jobs-per-period parts/bay cap and per-tech hours capacity that the script enforces — a literal 'must maintain all 30' read otherwise conflicts with the cap
- telco: no edits; subagent flagged 2 minor risks (TemporalEdge comes from inline-derived edges, not bare CSVs; Step 5 only names 2 of 4 derived health metrics) but both are non-load-bearing for the chain
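To pin down the alpha inversion, a numeric sketch of the shrinkage described above (alpha and the matrix shapes are assumptions reconstructed from the commit text, not lifted from the template script):

```python
import numpy as np

def crisis_covariance(base_cov: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Blend the base covariance with the all-ones-correlation covariance."""
    vol = np.sqrt(np.diag(base_cov))
    all_ones_cov = np.outer(vol, vol)   # covariance if every correlation were 1
    return alpha * base_cov + (1 - alpha) * all_ones_cov

# alpha = 0.7 keeps 70% of the base structure, so correlations move only 30%
# of the way toward all-ones: the inversion the audit caught.
base = np.array([[0.04, 0.00],
                 [0.00, 0.09]])                 # two uncorrelated toy assets
print(crisis_covariance(base)[0, 1])            # 0.018 -> correlation 0.3, not 0.7
```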
5 subagents simulated Prompt -> agent (with skill loaded) -> output
for each step and compared to the canonical script.
Sharpens applied (runbook only):
- machine_maintenance Step 5: drop 'betweenness (24.0 raw, 1.0
normalized)' from response — algorithm name + raw centrality
numeric leak agent-level mechanics into a user-facing
description. Now says 'top centrality (normalized to 1.0)'.
- machine_maintenance Step 7: drop the leaking decision-variable
counts/typing ('120 x_maintain + 120 x_vulnerable + 384
x_assigned binaries') and the 'failure cost = x_vulnerable x
predicted_fp x parts_cost x criticality x (1 + 2 x betweenness)'
formula from the response — those are implementation
scaffolding. Kept the 5-constraint-family enumeration and the
Stage 2 deadline handoff which are business-level facts.
- portfolio Step 5: tighten to 'Cluster stocks where absolute
return correlation is at least 0.3 ... pick representative by
highest Sharpe ratio and flag the rest as non-representatives'
— drops the over-mechanical 'derive per-stock volatility and
pairwise correlation from the covariance property' framing.
- portfolio Step 8: re-route the 'crisis-regime covariance
derivation' work from /rai-prescriptive-solver-management to
/rai-pyrel-coding (it's a derived ontology property, not
solver lifecycle); /rai-prescriptive-results-interpretation
still does the comparison.
No sharpens (prompts already sufficient):
- telco, energy, supply_chain — all Matches across the chain
stages; left untouched.
Alignment gaps surfaced (script and runbook diverge — NOT fixed
per the no-script-edit rule, flagged for separate decision):
* All 5 templates: closing /rai-ontology-design step claims an
ontology Concept that the script never materializes
(RestorePlan, InvestmentPortfolio, RoutingScenario,
MaintenancePlan/TypeConcentration/CrossTrainingRecommendation,
FrontierPoint). The closing step is aspirational — what the
agent would do AFTER the script's chain runs — but the
Response's 'Ontology now carries X' phrasing implies the
script did it.
* telco Step 9 (interpret): response narrates a sensitivity
outcome ('flexing budget to $6M would promote TWR-0009
BRONZE->GOLD') the script doesn't actually compute (single
solve only).
* telco Step 1: response lists a TemporalEdge concept the
prompt doesn't ask for; an agent following only the prompt
+ rai-build-starter-ontology would not produce it (script
computes it via pandas elsewhere).
* energy Step 4: prompt invokes /rai-predictive-modeling +
/rai-predictive-training, script does a CSV lookup with
gnn.load() stub (no actual training).
* energy Step 5: prompt + skill imply single-algorithm
centrality (skill explicitly forbids composite), script
computes a composite-rank of betweenness + degree +
eigenvector.
* supply_chain Step 5: centrality persisted via pandas
round-trip rather than the canonical
graph.Node.X = algorithm() shorthand.
* supply_chain Step 6: the 'avoid' tier is computed Python-side
as a set intersection rather than as a RAI Relationship.
Each script now binds the post-solve aggregates onto the ontology so the closing /rai-ontology-design step in the runbook is a real chain deliverable, not aspirational:
- telco: RestorePlan singleton (total_cost, install_weeks, capacity, tier-mix counts, towers_covered, binding_constraint) + SelectedUpgrade unary Relationship narrowing TowerUpgradeOption to the chosen 15
- energy: InvestmentPortfolio(InvestmentLevel) — 5 rows (one per budget) with dc_count, total_mw, annual_revenue, upgrade_cost, net_value, marginal_per_m_to_next_level, is_knee_point
- supply_chain: RoutingScenario — 3 rows (Baseline, S004-offline, Watch-Avoid) with status, total_cost, cost_delta_pct, active_flow_count, unmet_total, blocked_businesses; also adds a Business.is_avoid Relationship that fires when both unreliable AND high_delay_risk fire (replaces a Python set intersection)
- machine_maintenance: MaintenancePlan singleton (objective + failure/labor/travel cost breakdown + total_jobs); TypeConcentration(machine_type) per machine type (qualified_tech_count, qualified_tech_locations, is_concentrated, scheduled_jobs_total, scheduled_jobs_traveling, travel_pct); CrossTrainingRecommendation per concentrated type (ranked candidates with is_best_candidate)
- portfolio: FrontierPoint(scenario_label, eps_label) — 36 rows (6 scenarios x 6 frontier points) with return, risk, marginal_risk_per_return, is_knee, vol_base, vol_crisis, vol_gap, vol_gap_pct

All five re-tested end-to-end against actual RAI; numbers match the prior stdout output.

Runbook adapts (text only):
- telco Step 1: drop TemporalEdge from response (agent + skill won't produce a derived 1-day-lag edge concept from raw CSVs alone — it surfaces in Step 7 predictive)
- telco Step 9: soften the sensitivity narrative — agent identifies budget as binding and recommends a re-solve at $6M, rather than claiming the script computed the BRONZE->GOLD swap delta
- energy Step 4: prompt now allows the agent to use the pre-trained forecast in DemandForecast or train fresh
cafzal added a commit that referenced this pull request on May 6, 2026
* Add telco_network_recovery template (multi-reasoner, private)

  Four-stage chained-reasoner template on a shared telco ontology: GNN regression on per-region daily KPIs (Predictive) → derived per-tower averages + is_critical_restore flag (Rules) → PageRank on the call graph + per-tower blast radius (Graph) → tower-upgrade MIP with three-factor objective (Prescriptive). Each stage writes properties consumed by the next. Bundled CSVs are dumped from TELCO_ENRICHMENT.PUBLIC; end-to-end run produces $4,956,843 plan / 122 Gbps restored / 12 GOLD + 2 SILVER + 1 BRONZE across all 15 critical WEST DEGRADED towers. Marked private: true to match peer predictive templates.

* Regenerate v1/README.md index

* Add references/: full ontology + multi-reasoner runbook

  Adds two reference files customers can adapt:
  - telco_full_ontology.py — 18-concept reference ontology with model.Table() bindings (placeholder <YOUR_DB>.<YOUR_SCHEMA>) for users who want to scale the chain to their own Snowflake schema
  - runbook.md — narrative walkthrough of all 5 reasoner stages plus the Stage 6 interpretation, including ASCII visualizations useful for explaining the chain to non-technical stakeholders

  README links both from a new 'Further reading' section.

* Reframe runbook as recipe for (re)building the chain with agent skills

  Drop telco_full_ontology.py — customer treats this as a worked example, not a starter kit, so the broader Snowflake-source ontology added noise without payoff. Rewrite runbook.md from a narrative walkthrough into a build recipe keyed to the RAI agent skills: rai-discovery scopes the question; rai-pyrel-coding lays the focused 7-concept ontology; rai-rules-authoring covers the derived averages + critical_restore flag; rai-graph-analysis covers PageRank + blast radius; rai-prescriptive-problem-formulation covers the MIP and pre-solver gates; rai-prescriptive-results-interpretation covers the post-solve summary. Closes with how to adapt the chain to a different domain.

* Drop runbook from PR 56 (PR 57 owns runbooks for all 4 multi-reasoner templates)

  Plan for PR 57's hybrid build+present runbook authoring lives in dev_temp/pr57_runbook_hybrid_plan.md (untracked, not shipped).

* Add reproducibility runbooks for five multi-reasoner templates (#57), squashing:
  - Add reproducibility runbooks to four multi-reasoner templates
  - Hybrid runbook pass: discovery + per-stage skill/prompt blocks
  - Rewrite runbook prompts in user-question style
  - Inline skill invocation in prompt callouts
  - Trim runbooks to overview + workflow table + data footer
  - Convert workflow tables to bullets for easier copy
  - Reformat workflow steps with descriptive headers
  - Audit-pass fixes against template scripts and READMEs
  - Strip mechanical bits from prompts
  - Add ontology-build step at the front of each runbook
  - Audit-pass corrections + rename Discovery step
  - Move runbook.md to template root
  - Add closing rai-ontology-design step to each runbook
  - Insert 'Examine ontology' step after Build ontology
  - Portfolio frontier is 6 points per scenario, not 7
  - Tighten closing rai-ontology-design step to cover the actual gap
  - Rename closing step: 'Persist solution concepts into the ontology'
  - Naturalize closing-step prompts
  - Naturalize all prompts: drop algorithm/solver/feature mechanics
  - Prompt-sufficiency audit: minor sharpens for 4 of 5 templates
  - Prompt-path audit: minor sharpens to runbook only (script untouched)
  - Materialize closing-step solution Concepts in each script

* Address PR review: switch per-step format to fenced-code blocks

  Use **Prompt** / **Response** headings + a fenced ``` code block for the prompt content so each prompt gets GitHub's copy button. The inline-bullet form rendered fine but couldn't be copied as a single agent invocation in one click. Reviewer feedback on PR 56 (somacdivad).

* Remove three accidentally-committed _sf.py files

  These were untracked Snowflake-bound scripts for the predictive templates that got picked up by 'git add -A' in the previous commit. They aren't part of the runbook format change.
Why
These templates already work end-to-end as scripts, but a user starting from scratch with an RAI agent has to reverse-engineer the chain — which skill to load, which question to ask, what to expect back — to recreate the workflow on their own data. This PR adds a one-screen runbook per template that turns each into a ready-to-paste agent recipe: skill + prompt + expected response per step, in chain order, starting from raw demo data. Each template script also now materializes its post-solve aggregates as queryable ontology Concepts so the closing `/rai-ontology-design` step is a real deliverable, not aspirational.

Templates covered
- energy_grid_planning
- supply_chain_resilience
- machine_maintenance
- portfolio_balancing
- telco_network_recovery (lands with #56)

Per-step format
Each step is a `### N. <topic>` header followed by two bullets:
- Prompt: an inline skill invocation the user can copy and paste — `/rai-skill <natural-language question>`
- Response: what the agent produces (key numbers, properties written back to the ontology)
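For instance, a step might look like the following (illustrative content, not copied from a shipped runbook):

```
### 4. Rank towers by influence

- Prompt: `/rai-graph-analysis Which towers matter most in the call graph, and what is each one's blast radius?`
- Response: Per-tower influence scores and blast radius written back to the ontology; the top towers listed in rank order.
```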
The chain bookends are:
- `/rai-build-starter-ontology` (Step 1) builds against the bundled CSVs in `data/`
- `/rai-querying` (Step 2) examines the result
- `/rai-discovery` (Step 3) scopes sub-questions to reasoner families
- the chain stages do the work
- `/rai-ontology-design` (final step) materializes the prescriptive aggregates the chain doesn't already write back as queryable Concepts

Script edits
Each template's main script now persists its closing-step Concepts so the chain output lives in the ontology, not in pandas:
- telco: `RestorePlan` singleton (total_cost, install_weeks, capacity, tier mix, towers_covered, binding_constraint) + `SelectedUpgrade` view restricted to the 15 chosen tower-tier rows
- energy: `InvestmentPortfolio(InvestmentLevel)` with per-budget aggregates (dc_count, total_mw, annual_revenue, upgrade_cost, net_value, marginal_per_m_to_next_level, is_knee_point)
- supply_chain: `RoutingScenario` with per-scenario totals (status, total_cost, cost_delta_pct, active_flow_count, unmet_total, blocked_businesses); also adds a `Business.is_avoid` Relationship in place of the prior Python set intersection
- machine_maintenance: `MaintenancePlan` (cost breakdown + total_jobs), `TypeConcentration(machine_type)` (per-type qualification spread + travel %), `CrossTrainingRecommendation` (ranked candidates with `is_best_candidate`)
- portfolio: `FrontierPoint(scenario_label, eps_label)` with per-Pareto-point metadata (return, risk, marginal_risk_per_return, is_knee, vol_base, vol_crisis, vol_gap, vol_gap_pct)

All five re-tested end-to-end against the actual RAI product; numbers match the prior stdout output.
Test plan
- Run each template script end-to-end against the bundled `data/` CSVs (telco needs a predictive-enabled venv)
- Verify the new solution Concepts with `model.select(...)` after the chain runs
- Walk each runbook's prompts against `data/` in order, confirm output matches each step's Response