Vintage profiles: single source of truth for source years (kills stale-default footgun) #189
Draft
MaxGhenis wants to merge 2 commits into
… years

Source release years were declared as literal defaults in three places (the provider signature, the checkpoint signature, and the CLI), so they could drift from the real build: cps_source_year defaulted to 2023 (income year 2022) while every production build overrode it to 2025 via a shell flag. The stale literal sat in three signatures and failed open -- nothing errored.

Introduce microplex_us.vintages. A DatasetProfile declares, in ONE place, the model year a dataset represents and each source's release plus how its dollars reach that year (native, or aged with a component-specific factor family). A coherence check asserts every source reaches model_year or declares an explicit gap_reason.

MP_2024 is the current 2024 base dataset: CPS ASEC 2025 (income year 2024) native spine, PUF 2015->2024 via SOI factors, ACS 2024, SIPP 2023->2024, SCF 2022->2024.

Thread the year defaults through MP_2024 so the value is defined once and the safe path is the only path; the stale CPS default becomes the profile's 2025. A regression guard asserts the provider defaults derive from the profile.

Foundation for codex: follow-ups are to drop the per-call --*-year args in favor of `--profile`, and add a build-time gate that checks a produced artifact against the active profile (freshness vs latest release + basis coherence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independent review found MP_2024.acs declared release 2024 while the ACS donor loader is pinned to ACS_2022 (manifest default_year=2022) and is excluded from TARGET_YEAR_UPRATED_SURVEYS (never aged). The real ACS vintage is 2022; the provider default had silently drifted to 2024 vs what every build script (--acs-year 2022), the manifest, and the gate1 build log actually load. The profile enshrining 2024 defeated its own purpose.

- Correct MP_2024.acs to release 2022 with a declared gap_reason. The acs_year default now resolves to 2022 (matching the loader/scripts), so the build no longer needs to override it; the gap_reason flags that an ACS-2024 move is a loader migration, not silently assumed done.
- Add a manifest-tie test asserting each donor release equals the pe_source_impute manifest default_year -- catches exactly this profile-vs-loader drift.
- Extend the default-derivation regression guard to the checkpoint signature, not just the provider.
- Update the coherence test for the declared ACS gap; harden Release (reject an empty gap_reason) and get_profile (chain-free KeyError).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
Source release years lived as literal defaults in three places (provider signature, checkpoint signature, CLI `--*-year`). They drifted from reality and failed open: `cps_source_year` defaulted to 2023 (income year 2022) while every production build overrode it to 2025 via a shell flag, and `acs_year` had drifted to 2024 while the loader, manifest, and build scripts all use 2022. Stale literals sat in the signatures indefinitely because nothing errored.

What
`microplex_us.vintages`: define each dataset's vintage once:

- `Release`: one source release + how its dollars reach the model year (native, or `age_to` with a component-specific `factors` family). Validates `age_to` ⇔ `factors`, rejects backward aging and empty `gap_reason`.
- `DatasetProfile`: model year + the five sources; `__post_init__` enforces coherence (every source reaches `model_year` or declares an explicit `gap_reason`).
- `MP_2024`: the current 2024 base dataset: CPS ASEC 2025 (income 2024) native spine · PUF 2015→2024 (SOI) · ACS 2022 (declared gap, see below) · SIPP 2023→2024 · SCF 2022→2024.

The provider / checkpoint / CLI year defaults now derive from `MP_2024`, so each value is defined once and the safe path is the only path.

Behavior change: two stale defaults corrected to match what builds load
`/cycle` review caught that `acs_year` had drifted to 2024 while the ACS loader is pinned to `ACS_2022` (manifest `default_year=2022`), ACS is excluded from `TARGET_YEAR_UPRATED_SURVEYS` (never aged), and every build script passes `--acs-year 2022`. `MP_2024.acs` now declares release 2022 with a `gap_reason` flagging that an ACS-2024 move is a loader migration, not silently assumed done. If/when ACS moves to a 2024 release, the profile flips in one place.

Every other year default is unchanged in value, just single-sourced.
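
To make the `Release` / `DatasetProfile` design and the declared ACS gap concrete, here is a minimal sketch. It is not the code in `microplex_us.vintages`: the `income_year` field, the `basis_year` and `reaches` properties, the helper return types, and the SIPP/SCF factor labels are assumptions made for illustration. The release years, the `age_to` / `factors` / `gap_reason` names, and the validation rules follow the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One source release and how its dollars reach the model year."""

    year: int                       # release year of the source
    income_year: int | None = None  # assumed field: year the dollars describe
    age_to: int | None = None       # target year when aged; None means native
    factors: str | None = None      # factor family used for aging
    gap_reason: str | None = None   # explicit reason for stopping short

    def __post_init__(self) -> None:
        if (self.age_to is None) != (self.factors is None):
            raise ValueError("age_to and factors must be set together")
        if self.age_to is not None and self.age_to < self.basis_year:
            raise ValueError("backward aging is not allowed")
        if self.gap_reason is not None and not self.gap_reason.strip():
            raise ValueError("gap_reason must not be empty")

    @property
    def basis_year(self) -> int:
        return self.year if self.income_year is None else self.income_year

    @property
    def reaches(self) -> int:
        return self.basis_year if self.age_to is None else self.age_to


@dataclass(frozen=True)
class DatasetProfile:
    """Model year plus the five sources, checked for coherence on creation."""

    model_year: int
    cps: Release
    puf: Release
    acs: Release
    sipp: Release
    scf: Release

    def sources(self) -> dict[str, Release]:
        return {n: getattr(self, n) for n in ("cps", "puf", "acs", "sipp", "scf")}

    def incoherent_sources(self) -> list[str]:
        # Sources that neither reach model_year nor declare a gap.
        return [n for n, r in self.sources().items()
                if r.reaches != self.model_year and r.gap_reason is None]

    def declared_gaps(self) -> dict[str, str]:
        return {n: r.gap_reason for n, r in self.sources().items()
                if r.gap_reason is not None}

    def __post_init__(self) -> None:
        bad = self.incoherent_sources()
        if bad:
            raise ValueError(f"sources miss model_year without gap_reason: {bad}")


MP_2024 = DatasetProfile(
    model_year=2024,
    cps=Release(2025, income_year=2024),             # native spine
    puf=Release(2015, age_to=2024, factors="soi"),
    acs=Release(2022, gap_reason="loader pinned to ACS_2022; never aged"),
    sipp=Release(2023, age_to=2024, factors="sipp-placeholder"),  # label assumed
    scf=Release(2022, age_to=2024, factors="scf-placeholder"),    # label assumed
)
```

In this sketch, removing the `gap_reason` from the ACS entry makes construction fail, which is the fail-closed behavior the profile is meant to provide.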
Tests
- `test_vintages.py`: coherence, the declared ACS gap, `Release` validation, and a manifest-tie guard asserting each donor release equals the `pe_source_impute` manifest `default_year` (this catches exactly the ACS profile-vs-loader drift the review found).
- Regression guard: the provider and checkpoint signature defaults derive from `MP_2024`, so a stale literal can't creep back.
- `ruff` clean; all vintage/provider/checkpoint tests pass. Independently reviewed via `/cycle` (two rounds, clean).

Follow-ups (for codex to fold into the vintage work)
- Drop the per-call `--*-year` args in favor of `--profile mp_2024`; have providers read `profile.release(...)`. (This PR keeps the args for a contained first step.)
- Add a build-time gate that checks a produced artifact against the active profile (freshness vs latest release + basis coherence); the `incoherent_sources()` / `declared_gaps()` helpers are the seed.
- Migrate the ACS loader (`ACS_2022`→2024, native or aged) and flip `MP_2024.acs`, or confirm 2022 is intended and drop the gap.
- … `cps_asec_source_year` in `USMicroplexBuildConfig` and the `source_stage_parity` diagnostic defaults; bind `factors` labels to the Age SIPP and SCF donors to target year #185 aging implementation.

🤖 Generated with Claude Code
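
The default-derivation guard listed under Tests can be sketched with `inspect.signature`. This is an illustration, not the PR's test code: `PROFILE_YEARS`, `load_sources`, `stale_loader`, and `stale_defaults` are hypothetical stand-ins, and only the parameter names `cps_source_year` / `acs_year` and the years come from the description above.

```python
import inspect

# Hypothetical stand-in for the year defaults that MP_2024 resolves to.
PROFILE_YEARS = {"cps_source_year": 2025, "acs_year": 2022}


def load_sources(
    cps_source_year: int = PROFILE_YEARS["cps_source_year"],
    acs_year: int = PROFILE_YEARS["acs_year"],
) -> dict:
    """Hypothetical provider whose defaults are derived from the profile."""
    return {"cps": cps_source_year, "acs": acs_year}


def stale_loader(cps_source_year: int = 2023, acs_year: int = 2022) -> dict:
    """Hypothetical provider with a hard-coded (stale) CPS literal."""
    return {"cps": cps_source_year, "acs": acs_year}


def stale_defaults(func, expected: dict) -> dict:
    """Map each drifted parameter to (signature default, profile value)."""
    params = inspect.signature(func).parameters
    return {
        name: (params[name].default, year)
        for name, year in expected.items()
        if params[name].default != year
    }


# The guard passes for the derived signature and flags the stale one.
assert stale_defaults(load_sources, PROFILE_YEARS) == {}
assert stale_defaults(stale_loader, PROFILE_YEARS) == {"cps_source_year": (2023, 2025)}
```

A guard of this shape compares values rather than identity, so it would still pass if someone re-typed the correct literal; it only catches drift once the profile and the signature disagree.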