Add CPS block geography exports#129

Merged
MaxGhenis merged 2 commits into main from codex/cps-geoid-export-gaps-20260601 on Jun 1, 2026


@MaxGhenis
Contributor

Summary

  • assign CPS-spine households to Census blocks using Microplex's existing block probability crosswalk, partitioning by county when the CPS county is available and falling back to state otherwise
  • derive and preserve the eCPS-contract geography exports: block_geoid, tract_geoid, congressional_district_geoid, normalized county_fips, and the existing state_fips
  • allow the household geography variables through the PolicyEngine export map and add regression coverage for them
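A minimal sketch of the block assignment and geography derivation described in the summary. It is not the Microplex implementation: it assumes a crosswalk with one row per block and hypothetical columns state_fips, county_fips (five-character string), block_geoid (15-character string) and prob, and all function names are illustrative. The slicing relies on the standard Census block GEOID layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block). congressional_district_geoid cannot be sliced out of a block GEOID, so it would have to come from a district column on the crosswalk.

```python
import numpy as np
import pandas as pd


def _draw_blocks(pool: pd.DataFrame, n: int, rng: np.random.Generator) -> np.ndarray:
    # Sample n block GEOIDs from the pool, renormalizing prob within the pool.
    p = pool["prob"].to_numpy(dtype=float)
    return rng.choice(pool["block_geoid"].to_numpy(), size=n, p=p / p.sum())


def assign_blocks(households: pd.DataFrame, crosswalk: pd.DataFrame, seed: int = 0) -> pd.Series:
    """Draw one block per household: by county when known, else by state (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    households = households.reset_index(drop=True)  # unique labels for the write-back
    blocks = pd.Series(pd.NA, index=households.index, dtype=object)
    by_county = dict(tuple(crosswalk.groupby("county_fips")))
    by_state = dict(tuple(crosswalk.groupby("state_fips")))
    # Use the county partition only when the household's county exists in the crosswalk.
    has_county = households["county_fips"].isin(list(by_county)).to_numpy()
    for county, group in households[has_county].groupby("county_fips"):
        blocks.loc[group.index] = _draw_blocks(by_county[county], len(group), rng)
    # Fall back to the whole state's blocks; rows with no state stay NA.
    for state, group in households[~has_county].groupby("state_fips"):
        blocks.loc[group.index] = _draw_blocks(by_state[state], len(group), rng)
    return blocks


def derive_geographies(block_geoid: pd.Series) -> pd.DataFrame:
    # Census block GEOID: SS (state) + CCC (county) + TTTTTT (tract) + BBBB (block).
    s = block_geoid.astype(str)
    return pd.DataFrame({
        "block_geoid": s,
        "tract_geoid": s.str[:11],
        "county_fips": s.str[:5],
        "state_fips": s.str[:2].astype(int),
    })
```

Sampling within each partition keeps the draws proportional to the crosswalk probabilities inside the household's county, while the state fallback covers households whose CPS county is missing or does not match any crosswalk county.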

Coordination

Tests

  • uv run --extra dev python -m pytest tests/pipelines/test_us.py::TestUSMicroplexPipeline::test_attach_household_census_geographies_from_state_county tests/pipelines/test_us.py::TestUSMicroplexPipeline::test_prepare_seed_data tests/policyengine/test_us.py::TestPolicyEngineUSProjection::test_build_policyengine_us_export_variable_maps_includes_contract_inputs -q
  • uv run --extra dev ruff check src/microplex_us/pipelines/us.py src/microplex_us/policyengine/us.py tests/pipelines/test_us.py tests/policyengine/test_us.py

…attach

Carry-overs from reconciling PR #129 against the closed PR #130:

- _congressional_district_geoid_from_cd_id: also accept the raw Census at-large
  forms (AL/ZZ tokens and district 0/98) and normalize them to district 01,
  matching eCPS's policyengine-us-data db/create_initial_strata.py. Verified the
  encoder reproduces the eCPS 436-CD calibration universe exactly on the real
  block crosswalk (AK=201, WY=5601, DC=1101).
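A sketch of the at-large normalization, assuming the encoded district is the integer state_fips * 100 + district (which is what AK=201, WY=5601 and DC=1101 imply). The function cd_geoid and its two-argument signature are hypothetical stand-ins for _congressional_district_geoid_from_cd_id, whose real signature is not shown here, and the validity bounds are assumptions.

```python
import numbers

# Raw at-large forms that normalize to district 01 (per the commit message).
AT_LARGE_TOKENS = {"AL", "ZZ"}
AT_LARGE_CODES = {0, 98}


def _is_int(value) -> bool:
    # Accept Python and NumPy integers, but not bools.
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)


def cd_geoid(state_fips, district) -> int:
    """Encode a congressional district as the integer SSDD (hypothetical helper)."""
    # Assumed sanity bound on state FIPS codes.
    if not _is_int(state_fips) or not 1 <= state_fips <= 56:
        raise ValueError(f"invalid state FIPS: {state_fips!r}")
    if isinstance(district, str):
        token = district.strip().upper()
        if token in AT_LARGE_TOKENS:
            return int(state_fips) * 100 + 1
        if not token.isdigit():
            raise ValueError(f"unrecognized district token: {district!r}")
        district = int(token)
    if not _is_int(district):
        raise ValueError(f"district must be an integer or token, got {district!r}")
    district = int(district)
    if district in AT_LARGE_CODES:
        district = 1
    # Assumed bound: anything outside 01-97 after normalization is invalid.
    if not 1 <= district <= 97:
        raise ValueError(f"invalid district: {district!r}")
    return int(state_fips) * 100 + district
```

Packing the value as state_fips * 100 + district keeps the district in the last two digits, so f"{cd_geoid(2, 'AL'):04d}" renders as "0201".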

- _attach_household_census_geographies: collapse to a fresh RangeIndex up front.
  The block write-back via .loc[row_index] previously raised
  "ValueError: cannot reindex on an axis with duplicate labels" on a non-unique
  household-frame index. The caller consumes the result via merge on the
  household_id column, not the index, so this is safe.
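A sketch of the "reset the index, then merge on household_id" pattern. attach_block_geoids and its draw_block callable are hypothetical; the point is that after reset_index(drop=True) every label in row_index addresses exactly one row, so the .loc write-back cannot hit duplicate labels.

```python
import pandas as pd


def attach_block_geoids(households: pd.DataFrame, draw_block) -> pd.DataFrame:
    """Return household_id -> block_geoid; draw_block is a hypothetical state -> GEOID callable."""
    # Collapse any incoming index (possibly with duplicate labels) to a fresh RangeIndex.
    frame = households.reset_index(drop=True)
    frame["block_geoid"] = pd.NA
    row_index = frame.index[frame["state_fips"].notna().to_numpy()]
    frame.loc[row_index, "block_geoid"] = [draw_block(s) for s in frame.loc[row_index, "state_fips"]]
    # Callers join on household_id, so the original index is not needed.
    return frame[["household_id", "block_geoid"]]


# Usage: a household frame whose index has duplicate labels (7, 7).
households = pd.DataFrame(
    {"household_id": [10, 11, 12], "state_fips": [1, None, 2]}, index=[7, 7, 8]
)
geo = attach_block_geoids(households, lambda s: f"block-{int(s)}")
merged = households.merge(geo, on="household_id", how="left")
```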

- Add tests/pipelines/test_geoid_cd_encoding.py: encoder contract (multi-district
  SSDD, at-large=01, raw-form hardening, invalid inputs), the duplicate-index
  regression, and a live 436-CD universe parity check (skips without the
  crosswalk parquet, e.g. in CI).
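A sketch of how a live parity check can skip cleanly when the crosswalk parquet is absent. The path and the congressional_district_geoid column on the crosswalk are hypothetical. The expected count of 436 is the eCPS calibration universe cited above (435 voting districts plus DC).

```python
from pathlib import Path

import pandas as pd
import pytest

# Hypothetical location of the block crosswalk parquet.
CROSSWALK_PATH = Path("data/block_crosswalk.parquet")
# 435 voting districts plus DC, matching the eCPS calibration universe.
EXPECTED_CD_COUNT = 436


@pytest.mark.skipif(
    not CROSSWALK_PATH.exists(),
    reason="block crosswalk parquet not available (e.g. in CI)",
)
def test_cd_universe_matches_ecps():
    crosswalk = pd.read_parquet(CROSSWALK_PATH)
    cds = set(crosswalk["congressional_district_geoid"].dropna().astype(int))
    assert len(cds) == EXPECTED_CD_COUNT
    # At-large and DC spot checks quoted in the commit message.
    assert {201, 5601, 1101} <= cds
```

Evaluating the skip condition at import time means CI reports the test as skipped rather than failed when the parquet is not checked in.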

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MaxGhenis MaxGhenis marked this pull request as ready for review June 1, 2026 10:08
@MaxGhenis MaxGhenis merged commit 6e7d23f into main Jun 1, 2026
5 checks passed