Add PUF support clone for eCPS replacement#141

Draft
MaxGhenis wants to merge 3 commits into main from codex/ecps-puf-support-clone-20260601

@MaxGhenis
Contributor

Summary

  • add an eCPS-style PUF support clone stage that runs before later donor imputations
  • enable the stage by default for the PolicyEngine US data rebuild path, with a CLI override
  • add calibration activation diagnostics and source-weight diagnostics for zero-initial-weight PUF clone households
  • cover generated entity IDs, no-calibration runs, donor ordering, and configured clone flags in regression tests
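
To make the activation diagnostics concrete, here is a minimal sketch of what such a summary might compute. It is not the code in `artifacts.py`: the `Household` record, its field names, and `clone_activation_diagnostics` are all hypothetical and only illustrate the idea of counting zero-initial-weight clone households that calibration ends up using.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # All field names here are hypothetical, chosen for illustration only.
    household_id: int
    is_puf_clone: bool      # flag marking a PUF support clone
    initial_weight: float   # weight entering calibration
    final_weight: float     # weight leaving calibration


def clone_activation_diagnostics(households, tol=1e-9):
    """Summarise how many zero-initial-weight clones received positive weight."""
    clones = [
        h for h in households
        if h.is_puf_clone and abs(h.initial_weight) <= tol
    ]
    activated = [h for h in clones if h.final_weight > tol]
    total_final = sum(h.final_weight for h in households)
    clone_final = sum(h.final_weight for h in activated)
    return {
        "zero_weight_clones": len(clones),
        "activated_clones": len(activated),
        "activation_rate": len(activated) / len(clones) if clones else 0.0,
        "clone_weight_share": clone_final / total_final if total_final else 0.0,
    }


sample = [
    Household(1, False, 100.0, 90.0),
    Household(2, True, 0.0, 10.0),   # clone activated by calibration
    Household(3, True, 0.0, 0.0),    # clone left at zero weight
]
diagnostics = clone_activation_diagnostics(sample)
```

A summary like this would show whether the clones are actually contributing support after calibration or remain at zero weight.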

Closes #140.

Validation

  • uv run --python 3.13 ruff check src/microplex_us/pipelines/us.py src/microplex_us/pipelines/artifacts.py src/microplex_us/pipelines/pe_us_data_rebuild.py src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py tests/pipelines/test_us.py tests/pipelines/test_artifacts.py tests/pipelines/test_pe_us_data_rebuild.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py
  • uv run --python 3.13 ruff format --check src/microplex_us/pipelines/us.py src/microplex_us/pipelines/artifacts.py src/microplex_us/pipelines/pe_us_data_rebuild.py src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py tests/pipelines/test_us.py tests/pipelines/test_artifacts.py tests/pipelines/test_pe_us_data_rebuild.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py
  • uv run --python 3.13 --extra dev python -m pytest tests/pipelines/test_artifacts.py::TestSaveUSMicroplexArtifacts::test_source_weight_diagnostics_respects_configured_puf_flag_column -q
  • uv run --python 3.13 --extra dev --extra policyengine python -m pytest tests/pipelines/test_us.py -k 'puf_support_clone or integrate_donor_sources' -q
  • uv run --python 3.13 --extra dev --extra policyengine python -m pytest tests/pipelines/test_artifacts.py tests/pipelines/test_pe_us_data_rebuild.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py -q

Review

  • /cycle completed; final independent reviewer reported no actionable findings.

@MaxGhenis
Contributor Author

PUF aggregate-record follow-up added after comparing to eCPS PR #627.

What changed:

  • MP now disaggregates PUF aggregate rows (MARS=0, RECID 999996-999999) before PUF source construction instead of dropping them.
  • Local raw PUF smoke: 207,696 raw rows with 4 aggregate rows -> 207,814 loaded rows, 122 synthetic aggregate-derived rows, and no remaining MARS=0 rows.
  • This mirrors the relevant eCPS upstream behavior from policyengine-us-data#627 ("Disaggregate PUF aggregate records and fix QRF high-income training") while keeping the MP support-clone architecture separate.
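
As a rough illustration of the row bookkeeping above, the sketch below selects aggregate rows by the stated signature (MARS=0, RECID 999996-999999) and replaces each with synthetic rows. It is not the logic in `puf.py` or in policyengine-us-data#627: the equal-share split, the placeholder filing status, the `wages` column, and the function names are all assumptions. The reported counts (4 aggregate rows becoming 122 synthetic rows) imply the real split is not uniform per row.

```python
AGGREGATE_RECIDS = range(999996, 1000000)  # RECID 999996-999999


def is_aggregate(row):
    """True for rows matching the aggregate-record signature."""
    return row["MARS"] == 0 and row["RECID"] in AGGREGATE_RECIDS


def disaggregate(rows, n_splits, amount_columns, placeholder_mars=1):
    """Replace each aggregate row with n_splits equal-share synthetic rows.

    Equal shares and a fixed filing status are illustrative assumptions only.
    """
    next_recid = max(row["RECID"] for row in rows) + 1
    out = []
    for row in rows:
        if not is_aggregate(row):
            out.append(dict(row))
            continue
        for _ in range(n_splits):
            synthetic = dict(row)
            synthetic["RECID"] = next_recid          # fresh, non-colliding ID
            synthetic["MARS"] = placeholder_mars     # no MARS=0 rows remain
            for col in amount_columns:
                synthetic[col] = row[col] / n_splits  # preserve column totals
            out.append(synthetic)
            next_recid += 1
    return out


# "wages" is a hypothetical amount column used only for this example.
raw = [
    {"RECID": 1, "MARS": 2, "wages": 50_000.0},
    {"RECID": 999999, "MARS": 0, "wages": 9_000_000.0},
]
loaded = disaggregate(raw, n_splits=3, amount_columns=["wages"])

# Row-count identity from the smoke run: raw - aggregate + synthetic = loaded.
smoke_loaded_rows = 207_696 - 4 + 122
```

The last line restates the smoke-run arithmetic: dropping the 4 aggregate rows and adding 122 synthetic rows to 207,696 raw rows gives the 207,814 loaded rows reported above.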

Additional validation:

  • uv run --python 3.13 ruff check src/microplex_us/data_sources/puf.py src/microplex_us/pipelines/artifacts.py tests/test_puf_source_provider.py
  • uv run --python 3.13 ruff format --check src/microplex_us/data_sources/puf.py src/microplex_us/pipelines/artifacts.py tests/test_puf_source_provider.py
  • uv run --python 3.13 --extra dev python -m pytest tests/test_puf_source_provider.py::test_load_puf_raw_disaggregates_aggregate_records tests/pipelines/test_artifacts.py::TestSaveUSMicroplexArtifacts::test_source_weight_diagnostics_respects_configured_puf_flag_column -q

A Gate-1 artifact rebuild with disaggregation is running locally now; the earlier support-clone comparison, run without disaggregation, is diagnostic only.

Development

Successfully merging this pull request may close these issues.

Add PUF support clone for MP vs eCPS replacement
