Vintage profiles: single source of truth for source years (kills stale-default footgun) #189

Draft
MaxGhenis wants to merge 2 commits into main from claude/vintage-profile-20260602

MaxGhenis commented Jun 2, 2026

Why

Source release years lived as literal defaults in three places (provider signature, checkpoint signature, CLI --*-year). They drifted from reality and failed open: cps_source_year defaulted to 2023 (income year 2022) while every production build overrode it to 2025 via a shell flag, and acs_year had drifted to 2024 while the loader, manifest, and build scripts all use 2022. Stale literals sat in the signatures indefinitely because nothing errored.

What

microplex_us.vintages — define each dataset's vintage once:

  • Release — one source release + how its dollars reach the model year (native, or age_to with a component-specific factors family). Validates that age_to comes with factors; rejects backward aging and an empty gap_reason.
  • DatasetProfile — model year + the five sources; __post_init__ enforces coherence (every source reaches model_year or declares an explicit gap_reason).
  • MP_2024 — the current 2024 base dataset: CPS ASEC 2025 (income 2024) native spine · PUF 2015→2024 (SOI) · ACS 2022 (declared gap — see below) · SIPP 2023→2024 · SCF 2022→2024.
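For readers who have not opened the diff, the shape described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in microplex_us.vintages: the field names (`year`, `dollar_year`, `reaches`), the return types of `declared_gaps()` and `incoherent_sources()`, the `"placeholder"` factor labels for SIPP and SCF, and the wording of the ACS `gap_reason` are all hypothetical. Only the release years and the validation rules come from this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Release:
    """One source release and how its dollars reach the model year."""

    year: int                         # release year of the source file
    dollar_year: int                  # year its dollars are native to (assumed field)
    age_to: Optional[int] = None      # target year when aged; None means native
    factors: Optional[str] = None     # factor family label used for aging
    gap_reason: Optional[str] = None  # explicit reason the source stops short

    def __post_init__(self) -> None:
        if self.age_to is not None and not self.factors:
            raise ValueError("age_to requires a factors family")
        if self.age_to is not None and self.age_to < self.dollar_year:
            raise ValueError("backward aging is rejected")
        if self.gap_reason is not None and not self.gap_reason.strip():
            raise ValueError("gap_reason must not be empty")

    @property
    def reaches(self) -> int:
        """Year the dollars end up in after any aging."""
        return self.dollar_year if self.age_to is None else self.age_to


@dataclass(frozen=True)
class DatasetProfile:
    model_year: int
    cps: Release
    puf: Release
    acs: Release
    sipp: Release
    scf: Release

    def sources(self) -> Dict[str, Release]:
        return {n: getattr(self, n) for n in ("cps", "puf", "acs", "sipp", "scf")}

    def declared_gaps(self) -> Dict[str, str]:
        return {n: r.gap_reason for n, r in self.sources().items() if r.gap_reason}

    def incoherent_sources(self) -> List[str]:
        # A source is incoherent if it misses model_year without saying why.
        return [
            n for n, r in self.sources().items()
            if r.reaches != self.model_year and not r.gap_reason
        ]

    def __post_init__(self) -> None:
        bad = self.incoherent_sources()
        if bad:
            raise ValueError(
                f"sources neither reach {self.model_year} nor declare a gap: {bad}"
            )


# Release years as listed above; factor labels for SIPP/SCF are placeholders.
MP_2024 = DatasetProfile(
    model_year=2024,
    cps=Release(year=2025, dollar_year=2024),
    puf=Release(year=2015, dollar_year=2015, age_to=2024, factors="soi"),
    acs=Release(year=2022, dollar_year=2022,
                gap_reason="ACS-2024 needs a loader migration"),
    sipp=Release(year=2023, dollar_year=2023, age_to=2024, factors="placeholder"),
    scf=Release(year=2022, dollar_year=2022, age_to=2024, factors="placeholder"),
)
```

Putting the coherence check in `__post_init__` means an incoherent profile cannot be constructed at all, so the failure surfaces at import time rather than in a build log.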

The provider / checkpoint / CLI year defaults now derive from MP_2024, so each value is defined once and the safe path is the only path.
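A minimal sketch of what "derive from MP_2024" looks like at a call site, assuming a stand-in profile object. The `SimpleNamespace` stand-in, the `load_sources` function, and the `--cps-source-year` flag spelling are hypothetical; `--acs-year` and the two year values are the ones named in this description.

```python
import argparse
from types import SimpleNamespace

# Hypothetical stand-in for microplex_us.vintages.MP_2024.
MP_2024 = SimpleNamespace(cps_year=2025, acs_year=2022)


def load_sources(
    cps_source_year: int = MP_2024.cps_year,  # no literal year in the signature
    acs_year: int = MP_2024.acs_year,
) -> dict:
    """Illustrative provider entry point; returns the years it would load."""
    return {"cps_source_year": cps_source_year, "acs_year": acs_year}


def build_parser() -> argparse.ArgumentParser:
    """CLI defaults read the same profile, so the three places cannot drift."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cps-source-year", type=int, default=MP_2024.cps_year)
    parser.add_argument("--acs-year", type=int, default=MP_2024.acs_year)
    return parser
```

An explicit flag still overrides the default; the difference is that omitting it now yields the profile value instead of whatever literal was last typed into the signature.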

Behavior change — two stale defaults corrected to match what builds load

  • CPS default 2023 → 2025 (the profile value; what builds already override to).
  • ACS default 2024 → 2022. The /cycle review caught that acs_year had drifted to 2024 while the ACS loader is pinned to ACS_2022 (manifest default_year=2022), ACS is excluded from TARGET_YEAR_UPRATED_SURVEYS (never aged), and every build script passes --acs-year 2022. MP_2024.acs now declares release 2022 with a gap_reason flagging that an ACS-2024 move is a loader migration, not silently assumed done. If/when ACS moves to a 2024 release, the profile flips in one place.

Every other year default is unchanged in value, just single-sourced.

Tests

  • test_vintages.py — coherence, the declared ACS gap, Release validation, and a manifest-tie guard asserting each donor release equals the pe_source_impute manifest default_year (this catches exactly the ACS profile-vs-loader drift the review found).
  • Regression guards assert the provider and checkpoint signature defaults derive from MP_2024, so a stale literal can't creep back.
  • ruff clean; all vintage/provider/checkpoint tests pass. Independently reviewed via /cycle (two rounds, clean).
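The two guards can be pictured along these lines. This is a sketch, not the contents of test_vintages.py: the `provider` function, the stand-in `MP_2024`, and the `MANIFEST_DEFAULT_YEAR` mapping are hypothetical substitutes for the real provider signature, profile, and pe_source_impute manifest.

```python
import inspect
from types import SimpleNamespace

# Hypothetical stand-ins for the profile and the manifest's default_year values.
MP_2024 = SimpleNamespace(cps_year=2025, acs_year=2022)
MANIFEST_DEFAULT_YEAR = {"acs": 2022}


def provider(cps_source_year=MP_2024.cps_year, acs_year=MP_2024.acs_year):
    """Illustrative provider whose defaults come from the profile."""
    return cps_source_year, acs_year


def signature_defaults(func) -> dict:
    """Map each parameter that has a default to that default value."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }


def test_provider_defaults_derive_from_profile():
    # Fails as soon as someone reintroduces a literal that differs from the profile.
    assert signature_defaults(provider) == {
        "cps_source_year": MP_2024.cps_year,
        "acs_year": MP_2024.acs_year,
    }


def test_donor_release_matches_manifest():
    # Catches the profile saying one year while the loader is pinned to another.
    assert MP_2024.acs_year == MANIFEST_DEFAULT_YEAR["acs"]
```

Note that a value comparison cannot tell a literal apart from a profile reference while the two happen to agree; it fires at the moment they diverge, which is the drift this PR is about.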

Follow-ups (for codex to fold into the vintage work)

  1. Drop the per-call --*-year args in favor of --profile mp_2024; have providers read profile.release(...). (This PR keeps the args for a contained first step.)
  2. Build-time gate: check a produced artifact against the active profile — freshness (each release vs latest-available, with discovery → PR) and basis coherence. The incoherent_sources() / declared_gaps() helpers are the seed.
  3. Resolve the ACS gap — either complete the ACS-2024 loader migration (ACS_2022→2024, native or aged) and flip MP_2024.acs, or confirm 2022 is intended and drop the gap.
  4. Minor / out-of-scope here: unit-test the CLI default (needs a parser-extraction refactor); thread the dead cps_asec_source_year in USMicroplexBuildConfig and the source_stage_parity diagnostic defaults; bind factors labels to the aging implementation from #185 ("Age SIPP and SCF donors to target year").
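As a starting point for follow-ups 1 and 2, one possible shape is sketched below. Everything here is an assumption for discussion: the dict-based `PROFILES` registry, the `release_mismatches` helper, and the idea that an artifact records its source release years as a flat mapping. Only the `--profile mp_2024` flag, the release years, and the chain-free `KeyError` from `get_profile` are taken from this PR.

```python
import argparse
from typing import Dict, List

# Hypothetical registry: profile name -> release year per source.
PROFILES: Dict[str, Dict[str, int]] = {
    "mp_2024": {"cps": 2025, "puf": 2015, "acs": 2022, "sipp": 2023, "scf": 2022},
}


def get_profile(name: str) -> Dict[str, int]:
    """Look up a profile; unknown names raise a KeyError with no chained cause."""
    try:
        return PROFILES[name]
    except KeyError:
        raise KeyError(
            f"unknown profile {name!r}; known: {sorted(PROFILES)}"
        ) from None


def release_mismatches(
    artifact_releases: Dict[str, int], profile: Dict[str, int]
) -> List[str]:
    """List every source whose recorded release differs from the profile."""
    return [
        f"{source}: artifact has {artifact_releases.get(source)}, "
        f"profile declares {year}"
        for source, year in profile.items()
        if artifact_releases.get(source) != year
    ]


def build_parser() -> argparse.ArgumentParser:
    """A single --profile flag in place of the per-source --*-year args."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--profile", default="mp_2024")
    return parser
```

A build gate could then fail whenever `release_mismatches(...)` is non-empty; the freshness check against latest-available releases would need a discovery step that is not sketched here.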

🤖 Generated with Claude Code

MaxGhenis and others added 2 commits June 2, 2026 17:12
… years

Source release years were declared as literal defaults in three places (the
provider signature, the checkpoint signature, and the CLI), so they could drift
from the real build: cps_source_year defaulted to 2023 (income year 2022) while
every production build overrode it to 2025 via a shell flag. The stale literal
sat in three signatures and failed open -- nothing errored.

Introduce microplex_us.vintages. A DatasetProfile declares, in ONE place, the
model year a dataset represents and each source's release plus how its dollars
reach that year (native, or aged with a component-specific factor family). A
coherence check asserts every source reaches model_year or declares an explicit
gap_reason. MP_2024 is the current 2024 base dataset: CPS ASEC 2025 (income year
2024) native spine, PUF 2015->2024 via SOI factors, ACS 2024, SIPP 2023->2024,
SCF 2022->2024.

Thread the year defaults through MP_2024 so the value is defined once and the
safe path is the only path; the stale CPS default becomes the profile's 2025. A
regression guard asserts the provider defaults derive from the profile.

Foundation for codex: follow-ups are to drop the per-call --*-year args in favor
of `--profile`, and add a build-time gate that checks a produced artifact against
the active profile (freshness vs latest release + basis coherence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Independent review found MP_2024.acs declared release 2024 while the ACS donor
loader is pinned to ACS_2022 (manifest default_year=2022) and is excluded from
TARGET_YEAR_UPRATED_SURVEYS (never aged). The real ACS vintage is 2022; the
provider default had silently drifted to 2024 vs what every build script
(--acs-year 2022), the manifest, and the gate1 build log actually load. The
profile enshrining 2024 defeated its own purpose.

- Correct MP_2024.acs to release 2022 with a declared gap_reason. The acs_year
  default now resolves to 2022 (matching the loader/scripts), so the build no
  longer needs to override it; the gap_reason flags that an ACS-2024 move is a
  loader migration, not silently assumed done.
- Add a manifest-tie test asserting each donor release equals the pe_source_
  impute manifest default_year -- catches exactly this profile-vs-loader drift.
- Extend the default-derivation regression guard to the checkpoint signature,
  not just the provider.
- Update the coherence test for the declared ACS gap; harden Release (reject an
  empty gap_reason) and get_profile (chain-free KeyError).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>