Summary
The tip income construction in sipp.py sums all columns whose names contain TXAMT, which inadvertently includes the Census allocation flags (AJB*_TXAMT) alongside the actual tip dollar amounts (TJB*_TXAMT). The allocation flags are small integers (0, 1, 2) that indicate whether Census imputed the value; they are not dollar amounts.
Current code
policyengine_us_data/datasets/sipp/sipp.py, lines ~69-72:
df["tip_income"] = (
    df[df.columns[df.columns.str.contains("TXAMT")]].fillna(0).sum(axis=1)
    * 12
)
Fix
Filter to only the actual tip amount columns:
df["tip_income"] = (
    df[df.columns[df.columns.str.match(r"TJB\d_TXAMT")]].fillna(0).sum(axis=1)
    * 12
)
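To see which columns each filter selects, here is a minimal sketch. The column names are hypothetical examples that follow the TJB*/AJB* naming described in the summary; they are not read from the actual SIPP file.

```python
import pandas as pd

# Hypothetical column names for illustration only
columns = pd.Index(
    ["TJB1_TXAMT", "TJB2_TXAMT", "AJB1_TXAMT", "AJB2_TXAMT", "OTHER_COL"]
)

# Current behavior: substring match picks up amounts and allocation flags
broad = columns[columns.str.contains("TXAMT")]

# Proposed behavior: str.match anchors at the start of the name,
# so only the TJB<digit>_TXAMT amount columns are kept
narrow = columns[columns.str.match(r"TJB\d_TXAMT")]

print(list(broad))
print(list(narrow))
```

Note that `\d` matches a single digit, so this assumes job indices stay below 10.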
Impact
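Likely minor: each allocation flag is a small integer, so a single flag column adds at most 2 × 12 = $24 per year to a record's tip income, far less than typical tip amounts. The result is still incorrect and should be fixed.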
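A rough way to gauge the size of the error is to compute tip income both ways on a toy frame. The values below are made up for illustration and the helper `annual_tips` is hypothetical; whether records with nonzero flags and no tips occur in the real data is an assumption.

```python
import pandas as pd

# Made-up monthly values: two amount columns and two allocation-flag columns
df = pd.DataFrame(
    {
        "TJB1_TXAMT": [100.0, None, 0.0],
        "TJB2_TXAMT": [50.0, None, None],
        "AJB1_TXAMT": [1, 2, 0],
        "AJB2_TXAMT": [0, 1, 0],
    }
)


def annual_tips(frame, mask):
    """Sum the selected monthly columns per row and annualize (x12)."""
    return frame[frame.columns[mask]].fillna(0).sum(axis=1) * 12


old = annual_tips(df, df.columns.str.contains("TXAMT"))
new = annual_tips(df, df.columns.str.match(r"TJB\d_TXAMT"))

# Dollars per year attributed to tips solely because flags were summed
overstatement = old - new
print(overstatement.tolist())
```

In this toy data the second row has no tips at all, yet the broad filter still assigns it a small positive tip income, which is the kind of record where the bug matters most.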
Context
This was identified while comparing PolicyEngine's tip income deduction revenue estimate ($4.7B) against JCT's score ($10.0B for FY2026). See related issues for other improvements to close this gap.