Skip to content

feat: add Simulation.from_situation for situation-JSON input#52

Open
vahid-ahmadi wants to merge 1 commit into
mainfrom
vahid/from-situation
Open

feat: add Simulation.from_situation for situation-JSON input#52
vahid-ahmadi wants to merge 1 commit into
mainfrom
vahid/from-situation

Conversation

@vahid-ahmadi
Copy link
Copy Markdown
Contributor

Summary

Addresses the smallest, lowest-risk slice of #51: a single classmethod, Simulation.from_situation(situation, year), that accepts a PolicyEngine web-app situation-JSON dict and returns a fully-built Simulation. Pure Python wrapper change — no Rust modifications.

sim = Simulation.from_situation(
    {
        "people":     {"you": {"age": 30, "employment_income": {"2025": 50_000}}},
        "benunits":   {"yours": {"members": ["you"]}},
        "households": {"yours": {"members": ["you"], "region": "LONDON"}},
    },
    year=2025,
)
result = sim.run()

What's included

  • Simulation.from_situation(situation, year) classmethod
  • _situation_to_dataframes helper that handles:
    • period-keyed values ({"2025": 50000}) and plain scalars
    • members: lists wired up to integer person_ids and ;-separated person_ids / benunit_ids strings
    • implicit benunit when omitted (folds all people into one)
    • implicit is_benunit_head / is_household_head (first member of each entity, unless the situation set it explicitly)
    • region normalisation: accepts both "LONDON" (web-app form) and "London" (wrapper form), normalises to the form src/data/clean.rs::parse_region expects, and propagates is_in_scotland from the household region
    • lowercased gender
  • 22 pytest tests covering period resolution, members wiring, region normalisation, gender, multi-benunit households, validation errors, and parity with the existing single_person/couple constructors
  • CI step that runs the new Python tests (pytest interfaces/python/tests)
  • Changelog fragment under changelog.d/added/

Verified locally

  • cargo test: 135 passing
  • pytest interfaces/python/tests: 22 passing
  • End-to-end smoke tests against the release binary:
    • Single person £50k: from_situationsingle_person produces identical income tax (£7,486), NI (£2,994), net income (£39,520)
    • Couple + child: identical to couple() constructor for net income (£55,194) and child benefit (£1,355)

Scope notes

This PR addresses point 1 of issue #51's proposal (situation-JSON entry point). Points 2–4 (build_from_dataframe shorthand, build_from_url, full PE-canonical-name aliasing for variables that diverge from the wrapper's column schema) remain as follow-ups. Variable names in the situation dict map to the wrapper's existing input columns (those listed in PERSON_DEFAULTS / BENUNIT_DEFAULTS / HOUSEHOLD_DEFAULTS), with explicit normalisation only for region and gender.

Test plan

  • CI passes (cargo test + pytest interfaces/python/tests)
  • Local end-to-end run: from_situationSimulation.run_microdata() produces sensible numbers for hypothetical households
  • Parity check: from_situation matches single_person and couple() output to the penny

🤖 Generated with Claude Code

Adds a classmethod to the Python wrapper that accepts the PolicyEngine
web-app situation-JSON format (people / benunits / households with
`members` lists and period-keyed values) and converts it into the three
input DataFrames the Rust engine consumes.

Closes #51 in part — the small, low-risk piece. Datasets-from-URL and a
direct dataframe entry point can follow in subsequent PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@nikhilwoodruff nikhilwoodruff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am lean no on this. PE-UK originally has had users and AI agents confused by multiple possible input schemas, and I think a single, well documented and enforced point of entry would avoid mistakes.

vahid-ahmadi added a commit that referenced this pull request May 29, 2026
Adds `scripts/parity.py`, which runs a fixed set of synthetic households
through both the Python `policyengine-uk` package and the Rust
`policyengine_uk_compiled` wrapper, diffs key tax / benefit / net-income
outputs cell-for-cell, and prints a summary. Skips Python comparison
gracefully when the Python package isn't installed.

Wired into CI as a non-failing smoke step so it surfaces drift on every PR
without breaking on the divergences that already exist (currently up to
£3,276 on couple-with-children scenarios). Tolerance can be tightened
once those gaps close.

Stacked on top of #52 (Simulation.from_situation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vahid-ahmadi added a commit that referenced this pull request May 29, 2026
Adds `policyengine_uk_compiled.yaml_tests` — a runner that mirrors the
format used by `policyengine_uk/tests/policy/` so cases can be ported one
at a time.

The runner accepts either single-person flat input
(`input: { employment_income: 50000 }`) or full-situation input
(`input: { people: ..., benunits: ..., households: ... }`), supports
absolute and relative error margins, and writes outputs against the Rust
microdata column names (`baseline_income_tax`, `baseline_universal_credit`,
`baseline_net_income`, etc.).

This PR ships:
- The runner module with CLI: `python -m policyengine_uk_compiled.yaml_tests tests/policy`
- 11 hand-written YAML cases under `tests/policy/` covering income tax,
  employee NI, and Child Benefit (single + multi-person)
- A pytest module that auto-discovers and parametrizes the YAML cases
- 21 unit tests for the runner itself (input mapping, tolerance, parsing)
- pyyaml added to the package's runtime dependencies

Stacked on #53 (parity harness) which is itself stacked on #52
(Simulation.from_situation). Future PRs port more of the 196 Python YAML
tests that already exist in `policyengine_uk/tests/policy/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vahid-ahmadi added a commit that referenced this pull request May 29, 2026
Adds `policyengine_uk_compiled.yaml_tests` — a runner that mirrors the
format used by `policyengine_uk/tests/policy/` so cases can be ported one
at a time.

The runner accepts either single-person flat input
(`input: { employment_income: 50000 }`) or full-situation input
(`input: { people: ..., benunits: ..., households: ... }`), supports
absolute and relative error margins, and writes outputs against the Rust
microdata column names (`baseline_income_tax`, `baseline_universal_credit`,
`baseline_net_income`, etc.).

This PR ships:
- The runner module with CLI: `python -m policyengine_uk_compiled.yaml_tests tests/policy`
- 11 hand-written YAML cases under `tests/policy/` covering income tax,
  employee NI, and Child Benefit (single + multi-person)
- A pytest module that auto-discovers and parametrizes the YAML cases
- 21 unit tests for the runner itself (input mapping, tolerance, parsing)
- pyyaml added to the package's runtime dependencies

Stacked on #53 (parity harness) which is itself stacked on #52
(Simulation.from_situation). Future PRs port more of the 196 Python YAML
tests that already exist in `policyengine_uk/tests/policy/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi
Copy link
Copy Markdown
Contributor Author

Rebased onto main (mergeable). Holding on @nikhilwoodruff's design objection — needs a maintainer decision before merge.

The code itself is clean and well-tested (22 hermetic tests; correct entity/id wiring, head-flag logic, region/gender normalisation). The blocker is the design point Nikhil raised: this adds a second public input schema alongside single_person/couple/raw-DataFrame, and the preference is a single, enforced entry point.

Worth noting that the original consumers no longer need this: the parity harness (#53) was reworked to compare FRS microdata directly, and the YAML harness (#54) now inlines the situation→DataFrame conversion as a private helper. So nothing currently depends on the public from_situation classmethod.

Given that, I'd suggest one of:

  • Close this PR (the conversion logic now lives privately in feat: add YAML policy-test harness #54), or
  • Repurpose it into a single documented/enforced entry point, routing the existing constructors through it rather than adding alongside them.

Happy to take whichever direction you prefer — flagging for your call rather than merging as-is.

@vahid-ahmadi
Copy link
Copy Markdown
Contributor Author

Re-verified on latest main (rebased, mergeable):

  • Tests: 22/22 pytest pass.
  • Correctness: from_situation matches single_person to the penny — £7,486 income tax / £2,994 employee NI / £39,520 net income for the £50k single-adult case.
  • Runtime: the situation→DataFrame conversion is ~1.1 ms/call warm (it's pandas DataFrame construction, not the mapping logic); run() itself is subprocess-dominated. Negligible for the single-situation use this targets.

Perf and correctness aren't the open question, though — @nikhilwoodruff's design point is, and it's the blocker. Confirming the dependency picture so the call is clean: nothing depends on the public from_situation anymore#53's parity harness compares FRS microdata directly, and #54 inlines the situation→DataFrame conversion as a private helper (verified across both branches). So neither resolution loses functionality:

  • (a) Close — the conversion already lives privately where it's used (feat: add YAML policy-test harness #54); or
  • (b) Repurpose into the single enforced entry point, routing single_person / couple through it so there's one documented schema rather than two.

@nikhilwoodruff — which would you prefer? Happy to do the (b) refactor or close for (a); flagging for your call rather than merging as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants