Add reproducibility runbooks for five multi-reasoner templates #57
Open
cafzal wants to merge 21 commits into telco-network-recovery-template from
Conversation
Each runbook is an agent prompt sequence to recreate the template's multi-reasoner pipeline using the bundled CSVs in ../data/, mapping each stage to the template's actual concepts, properties, and outputs:
- telco_network_recovery (5-stage: descriptive -> rules -> graph -> predictive -> prescriptive; mirrors PR #56's existing structure)
- energy_grid_planning (4-stage: predictive -> graph -> rules -> prescriptive with InvestmentLevel scenarios)
- supply_chain_resilience (4-stage: blast-radius -> graph -> rules -> min-cost flow + scenarios)
- machine_maintenance (5-stage: querying -> graph -> rules -> prescriptive maintenance schedule -> resilience cross-training)

Reproducible against the bundled template CSVs; one-line notes on swapping to a Snowflake schema for users wiring to their own data.
Apply the dual-audience plan in dev_temp/pr57_runbook_hybrid_plan.md to all five runbooks. Each runbook now serves both the stakeholder (narrative + ASCII visualizations) and the practitioner (explicit skill + prompt to recreate the stage):
- "How to read this runbook" preface explaining the dual purpose
- Step 0 discovery section using rai-discovery to scope sub-questions to reasoner families before any chain stage runs
- Skill / Prompt boxed callout immediately under every Stage heading (rai-querying, rai-rules-authoring, rai-graph-analysis, rai-prescriptive-*)
- "Adapting this recipe to a new domain" closing section

Adds portfolio_balancing/references/runbook.md (4 stages: rules -> graph clustering -> bi-objective Markowitz frontier -> crisis-regime stress test, all aligned to the template's actual 8-stock dataset and epsilon-rate frontier sweep). Numbers cross-referenced against each template's README and main script; reflects the templates as shipped (not the larger demos they were sourced from). Prompts use domain-natural language with no Concept.property syntax inside the prompt strings.
The docs preview for this pull request has been deployed to Vercel!
Replace the prior verbose, mechanical prompts with short natural questions a user would actually type, modeled on the reasoner-eval QA catalog. Where an eval QA exists for the same skill+pattern, the runbook prompt mirrors that question directly. Also fix the predictive skill names: rai-predictive-modeling + rai-predictive-training (the public skills are available; the "no public skill yet" placeholder was stale). Each prompt is now:
- 1-2 sentences asking what the user wants to know
- Domain-natural language
- Aligned to a real eval-style question

Affects all five runbooks: telco_network_recovery, energy_grid_planning, supply_chain_resilience, machine_maintenance, portfolio_balancing.
Replace the two-line `**Skill:** ... · **Prompt:** "..."` callout with a single-line `> /rai-skill "question"` form that mirrors how a user actually invokes a skill in chat. Multi-skill stages stay readable as `> /rai-A + /rai-B "question"`. One change per stage callout, all five runbooks. No content rewrite.
Each runbook is now ~50 lines: 1-paragraph intro, the TL;DR chain ASCII, a workflow table (skill + prompt + expected output per step), and a brief data footer. All per-stage narrative subsections, "how to read" / "adapting" / "why the chain matters" sections, and duplicate enrichment diagrams are gone. Total: 5 files, ~2000 lines removed, ~260 retained.
Bullet format gives each prompt its own line so users can triple-click to select and copy. Expected output renders as a paragraph under each bullet. No content changes, just structural.
Each step is now `### N. <topic>` followed by two bullets:
- Prompt: <skill> <question> (in code formatting, no quotes — easy to triple-click and copy as a single agent invocation)
- Response: <expected output>

Same content, clearer structure for skim + copy.
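For illustration, a step in this shape might look like the following (a hypothetical telco example using the critical-restore numbers from the test plan; the exact wording shipped in the runbooks may differ):

```markdown
### 4. Flag critical-restore towers

- Prompt: `/rai-rules-authoring Which towers are most critical to restore first, given outage impact and affected subscribers?`
- Response: 15 towers flagged critical_restore, each with a weighted_impact score written back to the ontology.
```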
- telco: reorder to summit-demo workflow (descriptive -> rules -> graph -> predictive -> prescriptive -> interpret); fix WEST multiplier 0.993x -> 0.9998x; fix other-region growth range to +0.45-0.91%/day; clarify projected_demand_growth is written to all 250 towers via region join, not just 15
- energy: drop fabricated 36-month forecast horizon -> 24-month; rewrite Stage 3 low-carbon prompt to describe the actual rule (per-DC requirement vs zero-emission share) instead of a fabricated 25%/100% threshold
- machine_maintenance: drop unsupported "Turbines need on-site qualified technician" hard constraint — script penalizes travel cost, doesn't enforce co-location; add parts_cost factor to failure cost formula
- portfolio: tighten crisis vol-gap range to actual 25-30% (peak +29.8% at eps_1, low +25.2% at eps_5)
- supply_chain: no factual changes (verified clean)
Prompts should describe what the user wants, not how the agent should do it. The skill (with the agent) handles solver choice, formula construction, and implementation details.
- telco Stage 5: drop the explicit Σ formula
- portfolio Stage 2: 'force the rest to zero' -> 'only invest in those'
- portfolio Stage 3: drop 'anchor / sweep / forced to zero' agent scaffolding; show 7 frontier points instead
- supply chain Stage 3: rephrase as 'find the minimum-cost shipping plan' (no 'Solve a ... LP'), 'don't ship from avoid suppliers', 'prefer non-bottleneck sites'
- machine_maintenance Stage 5: drop 'Solve with HiGHS' (mechanical)
- machine_maintenance Stage 4: simplify cost-formula language
Step 1 is now /rai-build-starter-ontology against the bundled CSVs. Discovery, the chain stages, and interpretation shift to 2..N. Reflects that users start with the demo data and need the ontology materialized before any reasoner skill can run.
5-runbook audit against template scripts and READMEs.
- Telco: fix concept list to the 9 the script defines (drop Contract/BillingEvent/etc. that are not in the script, add RegionMetric and TemporalEdge); sharpen Steps 2, 3, 4, 6, 7, 8 prompts and responses; correct TWR-0009 BRONZE->GOLD delta to +5 Gbps (BRONZE=3, GOLD=8).
- Energy: sharpen Stage 4 graph prompt to ask for WCC + Louvain + centrality (script computes all three); reword "structurally constrained bottleneck" to clarify DFW is the binding capacity bottleneck specifically.
- Supply chain: fix concept list to actual 7 (drop StockKeepingUnit/Inventory/BillOfMaterial that aren't Concepts, rename to SKU); sharpen Step 2 discovery prompt + response to enumerate the 5 chained reasoning steps; name the 6 SUPPLIER-typed upstream nodes in Step 3.
- Machine maintenance: fix concept list (drop TrainingOption — used as DataFrame, not Concept; add CertificationExpiry); correct x_assigned binary count from ~250 to 384 (96 qualified pairs x 4 periods).
- Portfolio: drop fictitious StockPair Concept (script uses binary property Stock.covar(Stock,Stock) instead); add Regime to the Stage 5 Concept callout; enumerate the 6 actual constraint families.

Step rename: '### 2. Discovery' -> '### 2. Discover reasoner questions' across all five runbooks for clearer step labelling.
Runbooks live alongside the template script, README, and data/ directory now (was under references/). Updated relative paths inside each runbook from ../data/ -> data/ and ../<template>.py -> <template>.py.
Each chain now ends with /rai-ontology-design promoting the per-stage enrichments into first-class ontology state and adding new Concepts where a stage produced new entities (SelectedUpgrade, InvestmentPortfolio, SupplyPlan, MaintenancePlan + CrossTrainingRecommendation, FrontierPoint). The chain output persists as queryable ontology rather than stage-local Python state, which is what enables a downstream analyst to keep working without re-running the chain.
Step 2 is now /rai-querying showing the concept-relationship diagram and row counts so a user can confirm the ontology came out the way they expected before any reasoner skill runs against it. Discovery shifts to step 3, downstream chain steps shift by 1.
End-to-end run shows the script outputs 6 frontier points per scenario (min-risk anchor + 5 epsilon sweep points). Max-return is computed as a separate anchor for setting the rate range but isn't included in the frontier table. Fix runbook accordingly:
- Step 6 prompt: drop 'Show 7 points', describe the actual sweep
- Step 6 response: '6-point frontier per scenario; 7 solves per scenario x 6 scenarios = 42 LOCALLY_SOLVED'
- Step 7 prompt: 'six-point Pareto frontier'
- Chain ASCII: '6-point frontier'
- Closing step response: FrontierPoint count = 36, not 42

All other portfolio numbers verified against the actual run.
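The corrected counts reduce to a few lines of arithmetic (scenario breakdown taken from the portfolio notes elsewhere in this PR; the variable names are illustrative):

```python
# Illustrative sanity check of the corrected portfolio frontier counts.
scenarios = 3 * 2            # 3 budgets (500, 1000, 2000) x 2 regimes (base, crisis)
frontier_points = 1 + 5      # min-risk anchor + 5 epsilon sweep points
solves_per_scenario = frontier_points + 1  # plus the max-return anchor, which
                                           # sets the rate range but is not
                                           # tabulated in the frontier

assert solves_per_scenario * scenarios == 42  # LOCALLY_SOLVED count
assert frontier_points * scenarios == 36      # FrontierPoint rows
```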
The chain already writes per-stage enrichments back to the ontology via model.define() in each reasoner stage, so promoting them is redundant. The real gap is the prescriptive aggregates and post-solve metadata that currently live only in pandas / stdout. Per-template, the closing step now adds the specific Concepts that materialize what the chain doesn't:
- telco: RestorePlan (singleton plan summary) + SelectedUpgrade (view-concept over the 15 chosen tower-tier rows)
- energy: InvestmentPortfolio(InvestmentLevel) holding per-budget totals + marginal_per_m + knee flag (5 rows)
- supply_chain: RoutingScenario (3 rows: Baseline, S004-offline, Watch-Avoid) with status, total_cost, cost_delta_pct, blocked businesses
- machine_maintenance: MaintenancePlan, TypeConcentration(machine_type) per-type concentration analysis, and CrossTrainingRecommendation with ranked candidates
- portfolio: FrontierPoint(Scenario, eps_label) — 36 rows holding return, risk, vol_base, vol_crisis, vol_gap_pct, is_knee

Also strips inner backticks from the prompt code span (which were breaking the outer markdown code rendering) and restores the blank line before the Data section.
Strip the meta-framing ('the chain already writes X, what's still
only in pandas...') from each /rai-ontology-design prompt. A user
wouldn't talk to an agent that way — they'd just say what they
want added to the ontology. The agent (with the ontology-design
skill loaded) figures out the gap.
Also drop 'support temporal GNN message passing downstream' tail
from the telco build prompt — replaced with the user-facing
reason ('we'll want to forecast region-level trends later').
Strip implementation details a user wouldn't type — those belong
to the agent + loaded skill, not the user's question:
- telco diagnose: drop DAILY_REVENUE_USD column reference
- telco rules: drop 'first derive averages from NetworkPerformance,
via NetworkEquipment -> EquipmentHealth' join paths
- telco graph: drop 'rank by total PageRank influence' algorithm name
- telco predictive: drop 'GNN', 'TemporalEdge', 'message passing',
'lag features (prev-day, prev-week, 7-day mean)' feature
engineering, and 'Mean each region's Dec predictions, convert
to 1+x multiplier, bind via region' implementation steps
- telco prescriptive: drop 'MIP scoped to options where X.for_tower(Y)
AND Z.is_critical_restore()' join syntax, decision-variable typing
('binary, keyed by tower_id+tier'), and explicit 'sum(...)'
objective formula
- energy graph: drop 'WCC, Louvain, betweenness/degree/eigenvector'
algorithm enumeration
- supply chain rules: drop downstream-coupling explanation ('avoid
hard-blocked downstream', 'watch surcharged') from the rule
prompt — that's the optimizer's concern
- machine maintenance graph: 'Compute centrality' -> 'Score by how
central in the qualification network'
- portfolio frontier: drop 'Anchor at min-risk and max-return,
then sweep 5 epsilon points' agent-implementation; user just
asks for '6 frontier points per scenario from min-risk through
high-return'
- portfolio stress: drop 'shrink correlations toward all-ones with
weight 0.7 on base covariance + 0.3 on outer-product' formula
— user just says 'pushes correlations 70% of the way toward
all-ones'
Subagents audited each prompt against (a) the named skill's SKILL.md and (b) the template script to verify the agent + skill + ontology have enough business signal to land on the script's behavior, without re-adding mechanics.
- portfolio Step 6: spell out the 3 budgets (500, 1000, 2000) and 2 regimes (base, crisis), and call out the fully-invested constraint so the solver doesn't drop budget equality
- portfolio Step 8: fix a real numerical inversion — alpha=0.7 means 30% shrinkage toward all-ones, not 70% (the prompt was saying the opposite of the script). Also add 're-solve the same frontier under crisis covariance' so the agent re-runs rather than just re-evaluating risk
- energy Step 7: add 'across all five levels in a single solve' + clarify Stage 6 compliance flags are informational, not a hard pre-filter (otherwise risked producing a degenerate 2-DC frontier from filtering to only the compliant pair)
- supply_chain Step 6: ask the rules step to also flag HIGH-priority demand as escalated (was in the Response but missing from the Prompt)
- machine_maintenance Step 7: replace 'Schedule maintenance for all 30 machines' with 'maintained or left exposed' framing, and name the 5-jobs-per-period parts/bay cap and per-tech hours capacity that the script enforces — a literal 'must maintain all 30' read otherwise conflicts with the cap
- telco: no edits; subagent flagged 2 minor risks (TemporalEdge comes from inline-derived edges, not bare CSVs; Step 5 only names 2 of 4 derived health metrics) but both are non-load-bearing for the chain
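The alpha=0.7 direction fix is worth spelling out: with weight 0.7 on the base covariance and 0.3 on the volatility outer product, correlations move 30% (not 70%) of the way toward all-ones. A minimal sketch, with hypothetical function and variable names not taken from the script:

```python
import numpy as np

# Hypothetical sketch of the crisis-regime covariance shrinkage:
# alpha = 0.7 keeps 70% of the base covariance and moves 30% of the
# way toward the all-ones-correlation outer product sigma @ sigma.T.
def crisis_covariance(base_cov, alpha=0.7):
    sigma = np.sqrt(np.diag(base_cov)).reshape(-1, 1)  # per-stock volatilities
    return alpha * base_cov + (1 - alpha) * (sigma @ sigma.T)

base = np.array([[0.04, 0.0],
                 [0.0, 0.09]])   # two uncorrelated stocks
crisis = crisis_covariance(base)

corr = crisis[0, 1] / np.sqrt(crisis[0, 0] * crisis[1, 1])
# Variances are unchanged (the outer product's diagonal equals the
# variances); the off-diagonal correlation moves from 0.0 to 0.3,
# i.e. 30% of the way toward 1.0.
```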
5 subagents simulated Prompt -> agent (with skill loaded) -> output
for each step and compared to the canonical script.
Sharpens applied (runbook only):
- machine_maintenance Step 5: drop 'betweenness (24.0 raw, 1.0
normalized)' from response — algorithm name + raw centrality
numeric leak agent-level mechanics into a user-facing
description. Now says 'top centrality (normalized to 1.0)'.
- machine_maintenance Step 7: drop the leaking decision-variable
counts/typing ('120 x_maintain + 120 x_vulnerable + 384
x_assigned binaries') and the 'failure cost = x_vulnerable x
predicted_fp x parts_cost x criticality x (1 + 2 x betweenness)'
formula from the response — those are implementation
scaffolding. Kept the 5-constraint-family enumeration and the
Stage 2 deadline handoff which are business-level facts.
- portfolio Step 5: tighten to 'Cluster stocks where absolute
return correlation is at least 0.3 ... pick representative by
highest Sharpe ratio and flag the rest as non-representatives'
— drops the over-mechanical 'derive per-stock volatility and
pairwise correlation from the covariance property' framing.
- portfolio Step 8: re-route the 'crisis-regime covariance
derivation' work from /rai-prescriptive-solver-management to
/rai-pyrel-coding (it's a derived ontology property, not
solver lifecycle); /rai-prescriptive-results-interpretation
still does the comparison.
No sharpens (prompts already sufficient):
- telco, energy, supply_chain — all Matches across the chain
stages; left untouched.
Alignment gaps surfaced (script and runbook diverge — NOT fixed
per the no-script-edit rule, flagged for separate decision):
* All 5 templates: closing /rai-ontology-design step claims an
ontology Concept that the script never materializes
(RestorePlan, InvestmentPortfolio, RoutingScenario,
MaintenancePlan/TypeConcentration/CrossTrainingRecommendation,
FrontierPoint). The closing step is aspirational — what the
agent would do AFTER the script's chain runs — but the
Response's 'Ontology now carries X' phrasing implies the
script did it.
* telco Step 9 (interpret): response narrates a sensitivity
outcome ('flexing budget to $6M would promote TWR-0009
BRONZE->GOLD') the script doesn't actually compute (single
solve only).
* telco Step 1: response lists a TemporalEdge concept the
prompt doesn't ask for; an agent following only the prompt
+ rai-build-starter-ontology would not produce it (script
computes it via pandas elsewhere).
* energy Step 4: prompt invokes /rai-predictive-modeling +
/rai-predictive-training, script does a CSV lookup with
gnn.load() stub (no actual training).
* energy Step 5: prompt + skill imply single-algorithm
centrality (skill explicitly forbids composite), script
computes a composite-rank of betweenness + degree +
eigenvector.
* supply_chain Step 5: centrality persisted via pandas
round-trip rather than the canonical
graph.Node.X = algorithm() shorthand.
* supply_chain Step 6: the 'avoid' tier is computed Python-side
as a set intersection rather than as a RAI Relationship.
Why
These templates already work end-to-end as scripts, but a user starting from scratch with an RAI agent has to reverse-engineer the chain — which skill to load, which question to ask, what to expect back — to recreate the workflow on their own data. This PR adds a one-screen runbook per template that turns each into a ready-to-paste agent recipe: skill + prompt + expected response per step, in chain order, starting from raw demo data. Goal: drop the time-to-first-success for someone starting with an agent + RAI skills against a multi-reasoner workflow.
Templates covered
- energy_grid_planning
- supply_chain_resilience
- machine_maintenance
- portfolio_balancing
- telco_network_recovery (lands with #56)

Per-step format
Each step is a `### N. <topic>` header followed by two bullets:
- Prompt: an inline skill invocation the user can copy and paste — `/rai-skill <natural-language question>`
- Response: what the agent produces (key numbers, properties written back to the ontology)

The chain bookends are:
- `/rai-build-starter-ontology` (Step 1) builds against the bundled CSVs in `data/`
- `/rai-querying` (Step 2) examines the result
- `/rai-discovery` (Step 3) scopes sub-questions to reasoner families
- the chain stages do the work
- `/rai-ontology-design` (final step) promotes per-stage enrichments to first-class ontology state and adds new Concepts where a stage produced new entities (SelectedUpgrade, InvestmentPortfolio, SupplyPlan, MaintenancePlan + CrossTrainingRecommendation, FrontierPoint)

Skills used (all public):
rai-build-starter-ontology, rai-querying, rai-discovery, rai-rules-authoring, rai-graph-analysis, rai-predictive-modeling + rai-predictive-training, rai-prescriptive-problem-formulation, rai-prescriptive-solver-management, rai-prescriptive-results-interpretation, rai-ontology-design.

Source-of-truth alignment
Numbers cross-referenced against each template's README and main script. Prompts describe the user's question (no solver names, no Σ formulas, no agent-implementation scaffolding). Telco workflow ordering and prompts mirror the summit-demo runbook so the two stay in sync as a single chain recipe.

Test plan
- `Concept.property` names referenced in Response bullets all exist in the script
- `data/` CSVs (all 5 templates):
  - energy_grid_planning ✓ — knee $300M / 5 DCs / 1,500 MW / $264.35M, marginal $995K → $400K/$M, 3 Louvain communities
  - machine_maintenance ✓ — OEE 79.8 / 68.2 / 61.4, 1 Critical (M013) / 1 Elevated (M016) / 28 Standard, OPTIMAL $605,240.61, T006 cross-train $3,200 / 5 weeks
  - supply_chain_resilience ✓ — centrality 0.5016 / 0.3895 / 0.3688 (max-normalized to 1.000 / 0.776 / 0.735 in runbook), 2 components, B017 avoid + B003 watch, baseline OPTIMAL $1,865 / 8 flows / 0 unmet, +88.5% S004-offline / +0.0% watch→avoid
  - portfolio_balancing ✓ — 4 holdings + 2 sectors flagged, 5 reps PFE/GOOGL/JPM/PG/XOM, frontier 32.43→40.28 / 1160→1742 with knee at eps_1, crisis vol gap +28.4% min_risk → +29.8% peak (eps_1, eps_2) → +25.2% eps_5. (One correction applied: frontier is 6 points per scenario, not 7 — runbook updated.)
  - telco_network_recovery ✓ (run with a predictive-enabled venv) — WEST multiplier 0.9998× (-0.0002 raw), other regions +0.45% (SOUTH) → +0.91% (NORTH), 15 critical_restore towers, TWR-0014 weighted_impact 0.0502 / 61 subs, OPTIMAL $4,956,843 / 164 install-weeks / 122 Gbps, 12 GOLD / 2 SILVER / 1 BRONZE, all 15 covered (TWR-0009 BRONZE, TWR-0005 + TWR-0006 SILVER, rest GOLD)