feat: add Simulation.from_situation for situation-JSON input #52
vahid-ahmadi wants to merge 1 commit into
Adds a classmethod to the Python wrapper that accepts the PolicyEngine web-app situation-JSON format (people / benunits / households with `members` lists and period-keyed values) and converts it into the three input DataFrames the Rust engine consumes. Closes #51 in part (the small, low-risk piece). Datasets-from-URL and a direct DataFrame entry point can follow in subsequent PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
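The situation-JSON shape described above can be illustrated with a small stdlib-only sketch. It assumes period-keyed values are dicts keyed by year strings (as in `{"2025": 50000}`) and that group entities name their members; `resolve_value` and `situation_to_rows` are hypothetical helpers, not the PR's `_situation_to_dataframes`, and the wiring of member names to integer ids is left out.

```python
from typing import Any

# Hypothetical example situation in the web-app shape: entities keyed by name,
# group entities list their members, and variables may be period-keyed.
SITUATION: dict[str, Any] = {
    "people": {
        "adult": {"age": {"2025": 40}, "employment_income": {"2025": 50000}},
        "child": {"age": 8},
    },
    "benunits": {"benunit": {"members": ["adult", "child"]}},
    "households": {"household": {"members": ["adult", "child"], "region": "LONDON"}},
}


def resolve_value(value: Any, year: int) -> Any:
    """Return the value for `year` if period-keyed (None if that year is absent), else the scalar."""
    if isinstance(value, dict):
        return value.get(str(year))
    return value


def situation_to_rows(situation: dict[str, Any], year: int) -> dict[str, list[dict[str, Any]]]:
    """Flatten each entity group into a list of row dicts, one row per entity."""
    rows: dict[str, list[dict[str, Any]]] = {}
    for group in ("people", "benunits", "households"):
        group_rows = []
        for name, variables in situation.get(group, {}).items():
            row: dict[str, Any] = {"name": name}
            for variable, value in variables.items():
                if variable == "members":
                    # Kept as names here; mapping to integer ids is a separate step.
                    row["members"] = list(value)
                else:
                    row[variable] = resolve_value(value, year)
            group_rows.append(row)
        rows[group] = group_rows
    return rows
```

Each list of row dicts could then be passed to `pandas.DataFrame(...)` to get one frame per entity.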
nikhilwoodruff
left a comment
I'm leaning no on this. PE-UK has historically had users and AI agents confused by multiple possible input schemas, and I think a single, well-documented and enforced point of entry would avoid mistakes.
Adds `scripts/parity.py`, which runs a fixed set of synthetic households through both the Python `policyengine-uk` package and the Rust `policyengine_uk_compiled` wrapper, diffs key tax / benefit / net-income outputs cell-for-cell, and prints a summary. Skips Python comparison gracefully when the Python package isn't installed. Wired into CI as a non-failing smoke step so it surfaces drift on every PR without breaking on the divergences that already exist (currently up to £3,276 on couple-with-children scenarios). Tolerance can be tightened once those gaps close. Stacked on top of #52 (Simulation.from_situation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
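A minimal sketch of the cell-for-cell comparison and summary step, assuming both engines' outputs have already been collected into dicts keyed by `(household, variable)`. `Mismatch`, `diff_outputs`, `summarise` and the £1 default tolerance are hypothetical, not taken from `scripts/parity.py`.

```python
from dataclasses import dataclass

# (household scenario, output variable) -> value in £/year
Outputs = dict[tuple[str, str], float]


@dataclass
class Mismatch:
    household: str
    variable: str
    python_value: float
    rust_value: float

    @property
    def gap(self) -> float:
        """Absolute difference between the two engines, in £."""
        return abs(self.python_value - self.rust_value)


def diff_outputs(python_out: Outputs, rust_out: Outputs, tolerance: float = 1.0) -> list[Mismatch]:
    """Compare cells present in both outputs (others are skipped); keep those outside `tolerance`."""
    mismatches = []
    for household, variable in sorted(python_out.keys() & rust_out.keys()):
        key = (household, variable)
        mismatch = Mismatch(household, variable, python_out[key], rust_out[key])
        if mismatch.gap > tolerance:
            mismatches.append(mismatch)
    return mismatches


def summarise(mismatches: list[Mismatch]) -> str:
    """One-line summary reporting the mismatch count and the largest gap."""
    if not mismatches:
        return "all cells within tolerance"
    worst = max(mismatches, key=lambda m: m.gap)
    return (
        f"{len(mismatches)} mismatched cell(s); "
        f"max gap £{worst.gap:,.0f} ({worst.household}/{worst.variable})"
    )
```

Keeping the CI step non-failing then amounts to printing `summarise(...)` and exiting with status 0 whatever it reports; tightening later could mean lowering `tolerance` or failing on a non-empty result.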
Adds `policyengine_uk_compiled.yaml_tests`, a runner that mirrors the
format used by `policyengine_uk/tests/policy/` so cases can be ported one
at a time.
The runner accepts either single-person flat input
(`input: { employment_income: 50000 }`) or full-situation input
(`input: { people: ..., benunits: ..., households: ... }`), supports
absolute and relative error margins, and expects outputs to be written against
the Rust microdata column names (`baseline_income_tax`,
`baseline_universal_credit`, `baseline_net_income`, etc.).
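The two accepted input shapes and the two error-margin modes can be sketched as follows. This is an illustration under assumptions, not the module's code: `to_situation`, `within_margin` and the placeholder entity names are hypothetical, and the margin semantics (exact match when no margin is given, either margin sufficing when both are) are assumed.

```python
from typing import Any, Optional

ENTITY_GROUPS = ("people", "benunits", "households")


def to_situation(case_input: dict[str, Any]) -> dict[str, Any]:
    """Pass full-situation input through; wrap flat input as a one-person situation."""
    if any(group in case_input for group in ENTITY_GROUPS):
        return case_input
    # Flat input: every variable is assumed to belong to one person, who is the
    # sole member of a single benefit unit and household.
    return {
        "people": {"person": dict(case_input)},
        "benunits": {"benunit": {"members": ["person"]}},
        "households": {"household": {"members": ["person"]}},
    }


def within_margin(
    actual: float,
    expected: float,
    absolute: Optional[float] = None,
    relative: Optional[float] = None,
) -> bool:
    """Check `actual` against `expected` under the given error margin(s).

    Assumptions: with no margin the values must match exactly; with both
    margins, satisfying either one is enough.
    """
    gap = abs(actual - expected)
    if absolute is None and relative is None:
        return gap == 0
    if absolute is not None and gap <= absolute:
        return True
    return relative is not None and gap <= relative * abs(expected)
```

Each expected value in a case's output block, keyed by a Rust column name such as `baseline_income_tax`, would then be checked with `within_margin`.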
This PR ships:
- The runner module with CLI: `python -m policyengine_uk_compiled.yaml_tests tests/policy`
- 11 hand-written YAML cases under `tests/policy/` covering income tax,
employee NI, and Child Benefit (single + multi-person)
- A pytest module that auto-discovers and parametrizes the YAML cases
- 21 unit tests for the runner itself (input mapping, tolerance, parsing)
- pyyaml added to the package's runtime dependencies
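Auto-discovery of the YAML cases can be sketched with `pathlib` alone. `discover_case_files` is a hypothetical name, and the sketch assumes cases live in `.yaml`/`.yml` files anywhere under the test directory; relative paths double as readable pytest ids.

```python
from pathlib import Path
from typing import Union


def discover_case_files(root: Union[str, Path]) -> list[tuple[str, Path]]:
    """Return (test_id, path) pairs for every YAML file under `root`, sorted by path."""
    root = Path(root)
    files = sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )
    # The path relative to `root` gives a stable, readable id for each case file.
    return [(path.relative_to(root).as_posix(), path) for path in files]


# Hypothetical wiring into a pytest module:
# CASES = discover_case_files("tests/policy")
# @pytest.mark.parametrize("path", [p for _, p in CASES], ids=[i for i, _ in CASES])
# def test_yaml_case(path): ...
```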
Stacked on #53 (parity harness), which is itself stacked on #52
(Simulation.from_situation). Future PRs will port more of the 196 Python YAML
tests that already exist in `policyengine_uk/tests/policy/`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rebased onto
The code itself is clean and well-tested (22 hermetic tests; correct entity/id wiring, head-flag logic, region/gender normalisation). The blocker is the design point Nikhil raised: this adds a second public input schema alongside
Worth noting that the original consumers no longer need this: the parity harness (#53) was reworked to compare FRS microdata directly, and the YAML harness (#54) now inlines the situation→DataFrame conversion as a private helper. So nothing currently depends on the public
Given that, I'd suggest one of:
Happy to take whichever direction you prefer; flagging for your call rather than merging as-is.
Re-verified on latest main (rebased, mergeable):
Perf and correctness aren't the open question, though; @nikhilwoodruff's design point is, and it's the blocker. Confirming the dependency picture so the call is clean: nothing depends on the public
@nikhilwoodruff, which would you prefer? Happy to do the (b) refactor or close for (a); flagging for your call rather than merging as-is.
Summary
Addresses the smallest, lowest-risk slice of #51: a single classmethod,
`Simulation.from_situation(situation, year)`, that accepts a PolicyEngine web-app situation-JSON dict and returns a fully built `Simulation`. Pure Python wrapper change; no Rust modifications.
What's included
- `Simulation.from_situation(situation, year)` classmethod
- `_situation_to_dataframes` helper that handles:
  - period-keyed values (`{"2025": 50000}`) and plain scalars
  - `members:` lists, wired up to integer `person_id`s and `;`-separated `person_ids`/`benunit_ids` strings
  - `is_benunit_head`/`is_household_head` (first member of each entity, unless the situation set it explicitly)
  - `region`: accepts both `"LONDON"` (web-app form) and `"London"` (wrapper form), normalises to the form `src/data/clean.rs::parse_region` expects, and propagates `is_in_scotland` from the household region
  - `gender` normalisation
- Tests, including checks against the `single_person`/`couple` constructors (`pytest interfaces/python/tests`)
- Changelog fragment under `changelog.d/added/`
Verified locally
- `cargo test`: 135 passing
- `pytest interfaces/python/tests`: 22 passing
- `from_situation` → `single_person` produces identical income tax (£7,486), NI (£2,994) and net income (£39,520)
- `from_situation` matches the `couple()` constructor for net income (£55,194) and child benefit (£1,355)
Scope notes
This PR addresses point 1 of issue #51's proposal (situation-JSON entry point). Points 2–4 (`build_from_dataframe` shorthand, `build_from_url`, full PE-canonical-name aliasing for variables that diverge from the wrapper's column schema) remain as follow-ups. Variable names in the situation dict map to the wrapper's existing input columns (those listed in `PERSON_DEFAULTS`/`BENUNIT_DEFAULTS`/`HOUSEHOLD_DEFAULTS`), with explicit normalisation only for `region` and `gender`.
Test plan
- Tests pass (`cargo test` + `pytest interfaces/python/tests`)
- `from_situation` → `Simulation.run_microdata()` produces sensible numbers for hypothetical households
- `from_situation` matches `single_person` and `couple()` output to the penny
🤖 Generated with Claude Code