
fix: parity harness compares against hbai_household_net_income #61

Open
vahid-ahmadi wants to merge 9 commits into vahid/state-pension-recorded from vahid/parity-harness-hbai


@vahid-ahmadi (Contributor)

Summary

Quick fix for the parity harness from PR #53. Rust's baseline_net_income follows the HBAI net-income definition (gross income minus direct taxes plus benefits, excluding council tax, TV licence and transaction taxes). The harness was comparing it against Python's broader household_net_income, which also subtracts those items.

Net effect: every scenario carried a £159 offset that came entirely from the TV licence (£174.50 × ~0.911 take-up). That noise masked the real, smaller divergences.
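The size of that offset follows directly from the two figures above. A minimal check, assuming the £174.50 fee and the ~0.911 take-up rate are the only inputs (the constant and function names are illustrative, not taken from either codebase):

```python
# Illustrative constants: the two figures quoted above.
TV_LICENCE_FEE = 174.50      # annual fee, £
TV_LICENCE_TAKE_UP = 0.911   # approximate take-up rate


def expected_tv_licence_offset(fee: float = TV_LICENCE_FEE,
                               take_up: float = TV_LICENCE_TAKE_UP) -> float:
    """Expected per-household TV-licence cost that household_net_income
    subtracts and the HBAI measure does not."""
    return fee * take_up


offset = expected_tv_licence_offset()
print(f"£{offset:.2f}")  # £158.97, i.e. the ~£159 diff in the BEFORE column
```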

Before / after

                                        BEFORE       AFTER
single_£0 / £12k / … / £150k     diff   £159         £1.20
couple_no_kids_40k_25k           diff   £159         £2.40
couple_2kids_30k_15k             diff   £3,276       £2.40   ← biggest "fix"
lone_parent_2kids_18k            diff   £2,722       £554
pensioner_couple                 diff   £905         £200
scotland_single_45k              diff   £159         £1.20

The headline "couple_2kids £3,276 UC gap" claim from the original parity-harness PR was an artefact of this measurement bug — that scenario now shows £2.40, well within tolerance.

Real gaps remaining

After this and #60 (state-pension recorded amount), the meaningful parity divergences are:

scenario                  diff     likely cause
lone_parent_2kids_18k     £554     Real UC entitlement gap (Python > Rust)
pensioner_couple          £200     Winter Fuel Allowance (Python includes, Rust doesn't yet)
everyone else             £1–2     employer-NI rounding (£-1)

Both are concrete next-PR candidates.

What's included

  • One-line variable swap in scripts/parity.py: household_net_income → hbai_household_net_income
  • Updated test assertion in interfaces/python/tests/test_parity_harness.py
  • Changelog fragment under changelog.d/fixed/
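A sketch of what the swap amounts to, assuming the harness keeps a Rust-column to Python-variable mapping and a £-denominated tolerance. The mapping layout, the `diff_outputs` helper, the £5 tolerance and the example figures are hypothetical; only the net-income pair comes from this PR, and the Python names on the other rows are assumptions.

```python
# Hypothetical layout of the comparison table in scripts/parity.py.
RUST_TO_PYTHON = {
    "baseline_income_tax": "income_tax",
    "baseline_universal_credit": "universal_credit",
    # Was "household_net_income", which also nets off council tax, the TV
    # licence and transaction taxes; the HBAI measure matches Rust's definition.
    "baseline_net_income": "hbai_household_net_income",
}


def diff_outputs(rust: dict[str, float], python: dict[str, float],
                 tolerance: float = 5.0) -> dict[str, float]:
    """Return Rust-minus-Python differences whose magnitude exceeds `tolerance` (£)."""
    gaps = {}
    for rust_col, py_var in RUST_TO_PYTHON.items():
        diff = rust[rust_col] - python[py_var]
        if abs(diff) > tolerance:
            gaps[rust_col] = diff
    return gaps


# Made-up example values for a single scenario.
rust = {"baseline_income_tax": 5_000.0, "baseline_universal_credit": 0.0,
        "baseline_net_income": 36_000.0}
python = {"income_tax": 5_000.0, "universal_credit": 0.0,
          "hbai_household_net_income": 36_001.20}
print(diff_outputs(rust, python))  # prints {}; the £1.20 net-income diff is within tolerance
```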

Verified locally

  • cargo test: 165 passing (unchanged)
  • pytest interfaces/python/tests: 87 passing
  • python -m policyengine_uk_compiled.yaml_tests tests/policy: 29/29
  • Parity harness: 5 scenarios within tolerance (vs 0 before), 3 real gaps (was 11)

Stacking

vahid/parity-harness-hbai ← vahid/state-pension-recorded (#60) ← vahid/council-tax-spd (#58) ← vahid/dla-aa-from-flags (#57) ← vahid/pip-from-flags (#56) ← vahid/lbtt-ltt (#55) ← vahid/yaml-test-harness (#54) ← vahid/parity-harness (#53) ← vahid/from-situation (#52). Nine-deep stack.

🤖 Generated with Claude Code

vahid-ahmadi and others added 9 commits April 30, 2026 13:24
Adds a classmethod to the Python wrapper that accepts the PolicyEngine
web-app situation-JSON format (people / benunits / households with
`members` lists and period-keyed values) and converts it into the three
input DataFrames the Rust engine consumes.

Closes #51 in part — the small, low-risk piece. Datasets-from-URL and a
direct dataframe entry point can follow in subsequent PRs.
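The conversion can be sketched in plain Python. This is a minimal sketch, not the wrapper's implementation: it assumes period keys are year strings such as `"2025"`, emits lists of row dicts (each of which `pandas.DataFrame` accepts), and the `situation_to_tables` name and the `*_id` column names are hypothetical.

```python
from typing import Any


def _values_for(entity: dict[str, Any], period: str) -> dict[str, Any]:
    """Pick each variable's value for `period`, skipping the `members` list."""
    return {
        var: by_period[period]
        for var, by_period in entity.items()
        if var != "members" and isinstance(by_period, dict) and period in by_period
    }


def situation_to_tables(situation: dict[str, Any], period: str):
    """Flatten a situation JSON into (person, benunit, household) row lists."""
    household_of: dict[str, int] = {}
    households = []
    for hh_id, hh in enumerate(situation["households"].values()):
        for name in hh["members"]:
            household_of[name] = hh_id
        households.append({"household_id": hh_id, **_values_for(hh, period)})

    benunit_of: dict[str, int] = {}
    benunits = []
    for bu_id, bu in enumerate(situation["benunits"].values()):
        for name in bu["members"]:
            benunit_of[name] = bu_id
        # A benefit unit sits inside one household; take its first member's.
        benunits.append({"benunit_id": bu_id,
                         "household_id": household_of[bu["members"][0]],
                         **_values_for(bu, period)})

    people = [
        {"person_id": pid, "benunit_id": benunit_of[name],
         "household_id": household_of[name], **_values_for(person, period)}
        for pid, (name, person) in enumerate(situation["people"].items())
    ]
    return people, benunits, households


situation = {
    "people": {"parent": {"age": {"2025": 35}, "employment_income": {"2025": 18_000}},
               "child": {"age": {"2025": 6}}},
    "benunits": {"family": {"members": ["parent", "child"]}},
    "households": {"home": {"members": ["parent", "child"],
                            "region": {"2025": "NORTH_EAST"}}},
}
people, benunits, households = situation_to_tables(situation, "2025")
```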

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `scripts/parity.py`, which runs a fixed set of synthetic households
through both the Python `policyengine-uk` package and the Rust
`policyengine_uk_compiled` wrapper, diffs key tax / benefit / net-income
outputs cell-for-cell, and prints a summary. Skips Python comparison
gracefully when the Python package isn't installed.

Wired into CI as a non-failing smoke step so it surfaces drift on every PR
without breaking on the divergences that already exist (currently up to
£3,276 on couple-with-children scenarios). Tolerance can be tightened
once those gaps close.
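The "skip gracefully" and "non-failing" behaviours can be sketched with the standard library's `importlib.util.find_spec`, which reports whether a package is importable without importing it. The `main` signature, the £5 default tolerance and the pre-computed `diffs` argument are assumptions for illustration, not the script's actual interface.

```python
import importlib.util


def python_engine_available(package: str = "policyengine_uk") -> bool:
    """True if `package` is importable; find_spec does not import it."""
    return importlib.util.find_spec(package) is not None


def main(diffs: dict[str, float], tolerance: float = 5.0,
         package: str = "policyengine_uk") -> int:
    """Print a parity summary and return a CI exit code that never fails."""
    if not python_engine_available(package):
        print(f"{package} not installed; skipping Python comparison")
        return 0
    for cell, diff in sorted(diffs.items()):
        marker = "  <- over tolerance" if abs(diff) > tolerance else ""
        print(f"{cell:<45} diff £{diff:,.2f}{marker}")
    breaches = sum(abs(d) > tolerance for d in diffs.values())
    print(f"{breaches} of {len(diffs)} cells outside ±£{tolerance:g}")
    return 0  # smoke step: surface drift, don't break the build
```

Returning 0 unconditionally is what keeps the step informational; tightening it later would mean returning non-zero when `breaches` is positive.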

Stacked on top of #52 (Simulation.from_situation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `policyengine_uk_compiled.yaml_tests` — a runner that mirrors the
format used by `policyengine_uk/tests/policy/` so cases can be ported one
at a time.

The runner accepts either single-person flat input
(`input: { employment_income: 50000 }`) or full-situation input
(`input: { people: ..., benunits: ..., households: ... }`), supports
absolute and relative error margins, and expresses expected outputs using the Rust
microdata column names (`baseline_income_tax`, `baseline_universal_credit`,
`baseline_net_income`, etc.).
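The two input shapes and the margin check can be sketched as follows. This is a sketch under assumptions: the one-person wrapper's entity names, the `"2025"` period key, the exact-match default and the rule that passing either margin counts as a pass are illustrative choices; the runner's real defaults may differ.

```python
from typing import Any, Optional


def expand_flat_input(inp: dict[str, Any], period: str = "2025") -> dict[str, Any]:
    """Wrap flat single-person input in a one-person situation; pass full situations through."""
    if "people" in inp:
        return inp
    person = {var: {period: value} for var, value in inp.items()}
    return {
        "people": {"person": person},
        "benunits": {"benunit": {"members": ["person"]}},
        "households": {"household": {"members": ["person"]}},
    }


def within_margin(actual: float, expected: float,
                  absolute_error_margin: Optional[float] = None,
                  relative_error_margin: Optional[float] = None) -> bool:
    """Pass if the result is inside either margin given (exact match if neither)."""
    error = abs(actual - expected)
    if absolute_error_margin is None and relative_error_margin is None:
        return error == 0
    if absolute_error_margin is not None and error <= absolute_error_margin:
        return True
    if relative_error_margin is not None and error <= relative_error_margin * abs(expected):
        return True
    return False
```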

This PR ships:
- The runner module with CLI: `python -m policyengine_uk_compiled.yaml_tests tests/policy`
- 11 hand-written YAML cases under `tests/policy/` covering income tax,
  employee NI, and Child Benefit (single + multi-person)
- A pytest module that auto-discovers and parametrizes the YAML cases
- 21 unit tests for the runner itself (input mapping, tolerance, parsing)
- pyyaml added to the package's runtime dependencies

Stacked on #53 (parity harness) which is itself stacked on #52
(Simulation.from_situation). Future PRs port more of the 196 Python YAML
tests that already exist in `policyengine_uk/tests/policy/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Property-transaction tax now dispatches by region:
- Scotland → LBTT (Land and Buildings Transaction Tax (Scotland) Act 2013)
- Wales    → LTT  (Land Transaction Tax and Anti-avoidance of Devolved Taxes (Wales) Act 2017)
- elsewhere → SDLT (Finance Act 2003 s.55, unchanged)

2025/26 residential bands per:
- SSI 2015/126 (Scotland)
- WSI 2018/128 (Wales)

Adds:
- `lbtt` and `ltt` parameter blocks in `parameters/2025_26.yaml`
- `Parameters.lbtt`/`Parameters.ltt` Rust fields and Python wrapper exposure
- `calculate_property_transaction_tax` dispatch function in
  `src/variables/wealth_taxes.rs`
- New `baseline_property_transaction_tax` and `reform_property_transaction_tax`
  per-household microdata columns
- Six Rust unit tests covering LBTT/LTT/SDLT dispatch and nil-band edges
- Six YAML policy-test cases (`tests/policy/property_transaction_tax.yaml`)
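The region dispatch can be sketched in Python (the engine implements it in Rust). A sketch, not a port: the region strings are assumptions, the bands below are the 2025/26 main residential rates as we understand them and should be checked against `parameters/2025_26.yaml`, and surcharges and reliefs (additional-dwelling supplements, first-time-buyer relief) are ignored.

```python
Bands = list[tuple[float, float]]  # (lower threshold in £, marginal rate)

# Illustrative 2025/26 main residential bands; parameters/2025_26.yaml is authoritative.
LBTT_BANDS: Bands = [(0, 0.0), (145_000, 0.02), (250_000, 0.05), (325_000, 0.10), (750_000, 0.12)]
LTT_BANDS: Bands = [(0, 0.0), (225_000, 0.06), (400_000, 0.075), (750_000, 0.10), (1_500_000, 0.12)]
SDLT_BANDS: Bands = [(0, 0.0), (125_000, 0.02), (250_000, 0.05), (925_000, 0.10), (1_500_000, 0.12)]


def banded_tax(price: float, bands: Bands) -> float:
    """Slice-based tax: each rate applies only to the part of the price inside its band."""
    tax = 0.0
    for i, (lower, rate) in enumerate(bands):
        if price <= lower:
            break
        upper = bands[i + 1][0] if i + 1 < len(bands) else float("inf")
        tax += (min(price, upper) - lower) * rate
    return tax


def property_transaction_tax(price: float, region: str) -> float:
    """Dispatch by region: LBTT in Scotland, LTT in Wales, SDLT elsewhere."""
    if region == "SCOTLAND":
        return banded_tax(price, LBTT_BANDS)
    if region == "WALES":
        return banded_tax(price, LTT_BANDS)
    return banded_tax(price, SDLT_BANDS)
```

Keeping one `banded_tax` routine and swapping only the schedule mirrors the "dispatch by region" design: the three taxes differ in their thresholds and rates, not in how slices are taxed.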

Stacked on #54 (YAML test harness).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now, PIP amount fields (`pip_daily_living`, `pip_mobility`) were
only populated from FRS recorded values; setting an eligibility flag on
a synthetic household built via `from_situation` produced £0 PIP, and
PIP-rate reforms had no effect even on FRS data when the recorded amount
sat outside the modelled rate.

This change adds:
- `PipParams` Rust struct (and Python wrapper class) with the four PIP
  weekly rates: daily-living standard/enhanced and mobility standard/enhanced
- 2025/26 rates per gov.uk/pip/what-youll-get sourced under Welfare
  Reform Act 2012 s.79 / SI 2013/377
- `pip_daily_living_amount` and `pip_mobility_amount` helpers in
  `src/variables/benefits.rs` that:
  - Pass through any FRS-recorded amount unchanged (preserves existing
    calibration behaviour)
  - Otherwise compute from the eligibility flag × the rate parameter
  - Return 0 when neither holds or `params.pip` is unset
- `passthrough_benefits` now uses these helpers, so PIP from flags flows
  into total_benefits and downstream household net income
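The helpers' precedence rules can be sketched in Python. Assumptions, all hypothetical: the flag and field names, annualising weekly rates over a 52-week year, the enhanced flag taking priority over the standard one, and recorded amounts passing through even when no PIP parameters are set. The default rates are the 2025/26 weekly figures published on gov.uk.

```python
from dataclasses import dataclass
from typing import Optional

WEEKS_PER_YEAR = 52  # assumption: annual amount = weekly rate × 52


@dataclass
class PipParams:
    daily_living_standard: float = 73.90
    daily_living_enhanced: float = 110.40
    mobility_standard: float = 29.20
    mobility_enhanced: float = 77.05


def _flagged_amount(recorded: float, standard: bool, enhanced: bool,
                    std_weekly: Optional[float], enh_weekly: Optional[float]) -> float:
    if recorded > 0:                      # FRS-recorded amount wins, unchanged
        return recorded
    if enhanced and enh_weekly is not None:
        return enh_weekly * WEEKS_PER_YEAR
    if standard and std_weekly is not None:
        return std_weekly * WEEKS_PER_YEAR
    return 0.0                            # no record, no flag, or no parameters


def pip_daily_living_amount(recorded: float, standard: bool, enhanced: bool,
                            params: Optional[PipParams]) -> float:
    std = params.daily_living_standard if params else None
    enh = params.daily_living_enhanced if params else None
    return _flagged_amount(recorded, standard, enhanced, std, enh)


def pip_mobility_amount(recorded: float, standard: bool, enhanced: bool,
                        params: Optional[PipParams]) -> float:
    std = params.mobility_standard if params else None
    enh = params.mobility_enhanced if params else None
    return _flagged_amount(recorded, standard, enhanced, std, enh)
```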

Tests:
- 8 Rust unit tests covering the std/enh/recorded-override/no-flag/no-
  params/reform-scaling paths
- 4 YAML policy-test cases covering the same paths end-to-end

Stacked on #55 (LBTT/LTT).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the pattern from #56 (PIP) to:
- DLA care component (low/mid/high) — SSCBA 1992 Sch.2 para.2
- DLA mobility component (low/high) — SSCBA 1992 Sch.2 para.3
- Attendance Allowance (low/high) — SSCBA 1992 s.64

Synthetic households that set `dla_care_*` / `dla_mob_*` / `aa_*`
eligibility flags now produce non-zero amounts via the new
`DlaParams` and `AaParams` structs (with 2025/26 weekly rates from
gov.uk). FRS-recorded amounts continue to pass through unchanged.

Adds:
- 2025/26 rates in `parameters/2025_26.yaml`:
  DLA care low/mid/high £29.20/£73.90/£110.40 weekly,
  DLA mob low/high £29.20/£77.05 weekly,
  AA low/high £73.90/£110.40 weekly
- Helpers `dla_care_amount`, `dla_mobility_amount`,
  `attendance_allowance_amount` in `src/variables/benefits.rs`
- 10 Rust unit tests (recorded-override / no-flag / per-band-rate /
  passthrough flow)
- 4 YAML policy-test cases under `tests/policy/dla_aa.yaml`
- Python wrapper exposure (`DlaParams`, `AaParams`, `Parameters.dla`,
  `Parameters.aa`)
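The same precedence extends to DLA and AA through a rate table keyed by component and band. A sketch assuming the weekly rates listed above, a 52-week year and "highest flagged band wins"; the table layout, function name and flag names are hypothetical.

```python
# Weekly 2025/26 rates as listed above, keyed by (component, band).
WEEKLY_RATES = {
    ("dla_care", "low"): 29.20, ("dla_care", "mid"): 73.90, ("dla_care", "high"): 110.40,
    ("dla_mobility", "low"): 29.20, ("dla_mobility", "high"): 77.05,
    ("aa", "low"): 73.90, ("aa", "high"): 110.40,
}
BANDS_HIGH_TO_LOW = ("high", "mid", "low")


def disability_benefit_amount(component: str, recorded: float,
                              flags: dict[str, bool]) -> float:
    """Recorded amount wins; else the highest flagged band's rate × 52; else 0."""
    if recorded > 0:
        return recorded
    for band in BANDS_HIGH_TO_LOW:
        rate = WEEKLY_RATES.get((component, band))
        if flags.get(band) and rate is not None:
            return rate * 52
    return 0.0
```

A flag for a band the component doesn't have (for example a "mid" DLA mobility flag) simply finds no rate and yields nothing.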

Stacked on #56.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Households with exactly one adult (18+) now receive a 25% discount on
the calculated council tax — Local Government Finance Act 1992 s.11(1)(a).

Adds:
- `single_person_discount_rate` field on `CouncilTaxParams` (default 0.25)
- Updates `calculate_council_tax(hh, params, is_single_adult)` to apply
  the discount
- Counts adults via `Person::is_adult()` (age >= 18) in `simulation.rs`
- New `baseline_council_tax_calculated` / `reform_council_tax_calculated`
  per-household microdata columns
- First-time exposure of `CouncilTaxParams` in the Python wrapper
- 3 new Rust unit tests (band D + band A discount, zero-discount-rate edge)
- 4 new YAML policy-test cases (`tests/policy/council_tax.yaml`)

The baseline run still uses the FRS-recorded `hh.council_tax` for net
income; the calculated value is used for reform modelling, so reforms
to either the band-D rate or the discount fraction now take effect.
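The discount logic can be sketched in Python. Assumptions: England's statutory ratios to band D (A = 6/9 up to H = 18/9; Wales and Scotland use different ratios), a per-household band-D charge, and function and argument names that loosely mirror the Rust signature but are hypothetical.

```python
# England's statutory ratios to band D (LGFA 1992 s.5); Wales and Scotland differ.
BAND_RATIOS = {"A": 6 / 9, "B": 7 / 9, "C": 8 / 9, "D": 9 / 9,
               "E": 11 / 9, "F": 13 / 9, "G": 15 / 9, "H": 18 / 9}


def count_adults(ages: list[int]) -> int:
    """Adults are people aged 18 or over, as with Person::is_adult()."""
    return sum(age >= 18 for age in ages)


def calculate_council_tax(band: str, band_d_charge: float, is_single_adult: bool,
                          single_person_discount_rate: float = 0.25) -> float:
    """Band-D charge scaled to the band, less the discount for one-adult households."""
    gross = band_d_charge * BAND_RATIOS[band]
    return gross * (1 - single_person_discount_rate) if is_single_adult else gross
```

Usage: `calculate_council_tax("D", 2_000, count_adults([40, 8]) == 1)` applies the 25% discount because the child doesn't count as an adult.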

Stacked on #57.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing old-SP scaling pattern for the new-SP cohort:
- If `person.state_pension > 0`: pass through, scaled by
  `(new_state_pension_weekly / baseline_new_sp_weekly)` for reform
  correctness
- Else: fall back to `new_state_pension_weekly × 52`
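The two branches can be sketched in Python. The £230.25 full new State Pension weekly rate for 2025/26 is used only as example input; the function name and argument order are hypothetical, and the guard against a zero baseline rate is an added assumption.

```python
def new_state_pension(recorded_annual: float, new_sp_weekly: float,
                      baseline_new_sp_weekly: float, weeks: int = 52) -> float:
    """Recorded amount scaled by the reform/baseline rate ratio; else the full rate × 52."""
    if recorded_annual > 0:
        if baseline_new_sp_weekly <= 0:   # assumption: no baseline rate, no scaling
            return recorded_annual
        return recorded_annual * new_sp_weekly / baseline_new_sp_weekly
    return new_sp_weekly * weeks
```

Under the baseline the ratio is 1, so a partial-year claimant keeps the recorded amount rather than being topped up to a full year; a reform that raises the weekly rate scales that recorded amount proportionally.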

Previously the new-SP branch always returned the full parameter rate
× 52, ignoring any recorded amount. This over-stated SP for partial-
year claimants and broke parity for the pensioner_couple synthetic
scenario in PR #53's parity harness (£946 diff).

Implementation:
- Plumb `baseline_new_sp_weekly` through `Simulation`,
  `calculate_benunit`, `calculate_state_pension`, and
  `person_state_pension`, parallel to the existing
  `baseline_old_sp_weekly` field
- 3 new Rust unit tests (recorded-amount preserved, fallback to param
  when no record, recorded amount scales under reform)

Parity-harness impact (synthetic pensioner_couple scenario):
  state_pension     rust=23,000 py=23,000 diff=£0       (was £946)
  household_net_income           diff=£-41              (was £905)

Stacked on #58. Closes #59 (filed today as a follow-up to PR #53).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust's `baseline_net_income` is the HBAI net-income definition (gross
minus direct taxes plus benefits, excluding council tax / TV licence /
transaction taxes). The parity harness was comparing it against
Python's broader `household_net_income`, which subtracts council_tax,
TV licence, expected_sdlt/lbtt/ltt, etc., on top.

Net effect: every scenario carried a £159 offset that came entirely
from the TV licence (£174.50 × ~0.911 take-up). That offset masked
the real, smaller divergences and made the harness's output look
worse than it was.

Switching to `hbai_household_net_income` reveals:
- single/couple scenarios: £1.20 / £2.40 diffs (just employer-NI rounding)
- lone_parent_2kids: £554 (real UC entitlement gap)
- pensioner_couple: £200 (Winter Fuel Allowance — Python includes, Rust doesn't yet)
- scotland_single_45k: £1.20

The headline "couple_2kids £3,276 UC gap" from the original PR #53
description was an artefact of this measurement bug — that scenario
now shows £2.40, well within tolerance.

Stacked on #60.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi force-pushed the vahid/state-pension-recorded branch from 89b4655 to 10c89d3 on May 29, 2026 09:17
@vahid-ahmadi (Contributor, Author)

Superseded — recommend closing.

The fix here (comparing against hbai_household_net_income rather than household_net_income, to exclude the indirect/transaction taxes the Rust baseline omits) has been folded directly into the reworked #53. As part of decoupling the stack, #53 was rebased onto main and rewritten to compare FRS microdata outputs from both engines using hbai_household_net_income, so this one-line follow-up is no longer needed as a separate PR.

This PR also can't stand alone on main, since it only patches scripts/parity.py, which is introduced by #53. Suggest closing in favour of #53.
