Context
After PR #79, Microplex tax-unit construction is close to the eCPS structural reference, but the sound eCPS replacement comparison still shows large losses. The next structural mismatch is SPM/family fragmentation in the PolicyEngine export.
Fresh matched-N comparison artifact after PR #79:
/Users/maxghenis/CosilicoAI/microplex-us/artifacts/small_asec_acs100k_household_coherent_20260529/sound_ecps_replacement_comparison/sound_ecps_replacement_comparison.json
Key result:
- candidate refit loss: 3.7243506116660963
- eCPS refit loss: 0.1726525197190867
- candidate holdout loss: 0.5319674877579285
- eCPS holdout loss: 0.02754433292858976
The tax-unit structure is now reasonable:
- candidate matched tax units: 54,034 on 41,314 households (1.308/HH)
- eCPS tax units: 55,264 on 41,314 households (1.338/HH)
But SPM/family structures remain overfragmented:
- candidate matched SPM units: 65,905 on 41,314 households (1.595/HH)
- eCPS SPM units: 43,134 on 41,314 households (1.044/HH)
- candidate matched families: 65,905 (1.595/HH)
- eCPS families: 46,222 (1.119/HH)
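The per-household ratios above follow directly from the reported counts. A short check, using only the numbers in this issue (the dictionary keys are just labels for illustration):

```python
# Counts reported in the matched-N comparison after PR #79.
HOUSEHOLDS = 41_314
counts = {
    "candidate_tax_units": 54_034,
    "ecps_tax_units": 55_264,
    "candidate_spm_units": 65_905,
    "ecps_spm_units": 43_134,
    "candidate_families": 65_905,
    "ecps_families": 46_222,
}

# Units per household, rounded as in the issue text.
ratios = {name: round(n / HOUSEHOLDS, 3) for name, n in counts.items()}

# Extra candidate units relative to eCPS on the same households.
excess_spm = counts["candidate_spm_units"] - counts["ecps_spm_units"]
excess_families = counts["candidate_families"] - counts["ecps_families"]

for name, ratio in ratios.items():
    print(f"{name}: {ratio}/HH")
print("excess SPM units:", excess_spm)
print("excess families:", excess_families)
```

The candidate SPM and family counts are identical, which is consistent with both being produced by the same split.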
Current root cause:
`_assign_family_and_spm_units` uses the same fallback split for family and SPM.
- It puts relationship {0, 1, 2} in one primary family/SPM unit and assigns every other person to their own family/SPM unit.
- The calibrated source parquet already has `family_id` and `spm_unit_id` with the same inflated count (159,331 on 100,000 households), so simply preserving those columns is not enough.
Current mitigation
PR #80 changes only the SPM fallback to one SPM unit per household while keeping the current family split unchanged.
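A minimal sketch of the fallback split described under the root cause, and of the PR #80 variant, to make the difference concrete. It assumes person records as dicts with hypothetical keys `person_id`, `household_id`, and `relationship`. The function name `fallback_units` and the `spm_per_household` flag are illustrative only and are not the actual `_assign_family_and_spm_units` signature.

```python
PRIMARY_RELATIONSHIPS = {0, 1, 2}  # relationship values grouped into the primary unit


def fallback_units(persons, spm_per_household=False):
    """Return {person_id: (family_id, spm_unit_id)} under the fallback split.

    spm_per_household=False mimics the shared family/SPM split;
    spm_per_household=True mimics the PR #80 idea (one SPM unit per household,
    family split unchanged). Both are illustrative assumptions.
    """
    next_id = 0
    primary = {}  # household_id -> id reserved for that household's primary unit
    assigned = {}
    for person in persons:
        hh = person["household_id"]
        if hh not in primary:
            primary[hh] = next_id
            next_id += 1
        if person["relationship"] in PRIMARY_RELATIONSHIPS:
            family_id = primary[hh]
        else:
            # Every other person becomes a singleton unit.
            family_id = next_id
            next_id += 1
        spm_unit_id = primary[hh] if spm_per_household else family_id
        assigned[person["person_id"]] = (family_id, spm_unit_id)
    return assigned


def count_units(assigned):
    """Count distinct family and SPM unit ids."""
    families = {fam for fam, _ in assigned.values()}
    spm_units = {spm for _, spm in assigned.values()}
    return len(families), len(spm_units)


# Toy data: household 1 has two non-primary members, household 2 has one.
toy = [
    {"person_id": 1, "household_id": 1, "relationship": 0},
    {"person_id": 2, "household_id": 1, "relationship": 1},
    {"person_id": 3, "household_id": 1, "relationship": 2},
    {"person_id": 4, "household_id": 1, "relationship": 3},
    {"person_id": 5, "household_id": 1, "relationship": 4},
    {"person_id": 6, "household_id": 2, "relationship": 0},
    {"person_id": 7, "household_id": 2, "relationship": 5},
]
shared_split = count_units(fallback_units(toy))
household_spm = count_units(fallback_units(toy, spm_per_household=True))
```

On the toy data the shared split yields as many SPM units as families, while the household-level variant yields one SPM unit per household and leaves the family count untouched.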
Lightweight structural probe with PR #80 logic:
/Users/maxghenis/CosilicoAI/microplex-us/artifacts/small_asec_acs100k_household_spm_20260529/structural_probe.json
- households: 100,000
- persons: 245,714
- tax units: 130,980 (1.3098/HH)
- SPM units: 100,000 (1.0/HH)
- families: 159,331 (1.59331/HH), intentionally unchanged
Desired direction
Do not hard-code eCPS structure as the target. Use Microplex architecture to construct coherent relational units from source relationships and donor structure:
- SPM units should be household-coherent unless richer SPM relationship detail supports a split.
- Family units need a separate, data-driven rule; the current rule (one primary family plus singletons) is likely too fragmented.
- Preserve diagnostics in sidecars/gates, not in the model H5.
- Add structural gates for SPM/family units per household and singleton-unit shares, calibrated against source distributions and policy semantics rather than only eCPS.
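One possible shape for the structural gates, as a sketch only. The metric names, the `check_structural_gates` helper, and especially the bounds are placeholders for illustration; real bounds would need to come from source distributions and policy semantics as described above, not from this example.

```python
def check_structural_gates(stats, gates):
    """Return a list of human-readable failures; an empty list means all gates pass.

    stats: {metric_name: value}
    gates: {metric_name: (low, high)} inclusive bounds (placeholder values here).
    """
    failures = []
    for metric, (low, high) in gates.items():
        value = stats.get(metric)
        if value is None:
            failures.append(f"{metric}: missing")
        elif not (low <= value <= high):
            failures.append(f"{metric}: {value:.4f} outside [{low}, {high}]")
    return failures


# Ratios from the PR #80 structural probe in this issue.
probe_stats = {
    "spm_units_per_household": 1.0,
    "families_per_household": 1.59331,
}

# Placeholder bounds, NOT calibrated; chosen only to show a pass and a failure.
placeholder_gates = {
    "spm_units_per_household": (1.0, 1.2),
    "families_per_household": (1.0, 1.3),
}

gate_failures = check_structural_gates(probe_stats, placeholder_gates)
for failure in gate_failures:
    print("GATE FAILED:", failure)
```

Returning a list of failures rather than raising keeps the gate usable both as a hard check in CI and as a diagnostic written to a sidecar.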
Acceptance criteria
- Small ASEC+ACS100k PE export has plausible SPM/family unit counts and no impossible entity memberships.
- A sidecar reports household/person/tax-unit/SPM/family/marital counts, per-household ratios, singleton shares, and cross-household ID violations.
- The sound matched-N symmetric-refit eCPS comparison is rerun after SPM/family changes.
- The comparison report breaks out SNAP, Census/SPM poverty, IRS filing-status-sensitive cells, and protected target families so we can tell whether the structural change improved the intended surfaces.
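A sketch of what the sidecar computation could look like, assuming one dict per person that carries `household_id` plus one id per entity type. `family_id` and `spm_unit_id` come from this issue; `tax_unit_id`, the `structural_sidecar` function, and the report layout are assumptions for illustration. A marital unit column could be added to `unit_columns` in the same way.

```python
import json
from collections import defaultdict


def structural_sidecar(persons, unit_columns=("tax_unit_id", "spm_unit_id", "family_id")):
    """Summarize entity structure: counts, per-household ratios, singleton shares,
    and units whose members span more than one household."""
    if not persons:
        raise ValueError("persons must be non-empty")
    households = {p["household_id"] for p in persons}
    report = {"households": len(households), "persons": len(persons), "units": {}}
    for col in unit_columns:
        members = defaultdict(int)          # unit id -> number of members
        unit_households = defaultdict(set)  # unit id -> households it appears in
        for p in persons:
            members[p[col]] += 1
            unit_households[p[col]].add(p["household_id"])
        n_units = len(members)
        report["units"][col] = {
            "count": n_units,
            "per_household": n_units / len(households),
            "singleton_share": sum(1 for n in members.values() if n == 1) / n_units,
            "cross_household_violations": sum(
                1 for hhs in unit_households.values() if len(hhs) > 1
            ),
        }
    return report


# Toy data: family 11 deliberately spans households 1 and 2 (a violation).
toy_persons = [
    {"household_id": 1, "tax_unit_id": 1000, "spm_unit_id": 100, "family_id": 10},
    {"household_id": 1, "tax_unit_id": 1000, "spm_unit_id": 100, "family_id": 10},
    {"household_id": 1, "tax_unit_id": 1001, "spm_unit_id": 100, "family_id": 11},
    {"household_id": 2, "tax_unit_id": 1002, "spm_unit_id": 101, "family_id": 12},
    {"household_id": 2, "tax_unit_id": 1003, "spm_unit_id": 101, "family_id": 11},
]
sidecar = structural_sidecar(toy_persons)
sidecar_json = json.dumps(sidecar, indent=2, sort_keys=True)
```

Writing `sidecar_json` next to the export, rather than into the model H5, matches the preference above for keeping diagnostics in sidecars.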