feat: impute below-threshold student loan borrowers #282

Open
nwoodruff-co wants to merge 8 commits into main from feat/impute-below-threshold-borrowers

@nwoodruff-co
Collaborator

Summary

  • Add SLC "liable to repay" targets for Plan 2 and Plan 5 (alongside existing above-threshold targets)
  • Probabilistically impute student loan plans to tertiary-educated people without reported repayments
  • Add compute_student_loan_plan_liable() for calibration against total borrower counts

Context

The FRS captures only borrowers making PAYE repayments, i.e. those earning above the repayment threshold. SLC data shows that ~55% of Plan 2 holders earn below that threshold. For Plan 5 (started 2023+), almost no borrowers appear in the FRS because graduates are still too young to be earning above the threshold.

This imputation fills the gap by assigning plans to tertiary-educated people in England based on:

  • Age band constraints (Plan 2: 21-33, Plan 5: 21-24)
  • Cohort logic (university start year)
  • Target counts from SLC "liable to repay" minus "above threshold"
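
As a rough illustration of the approach described above (not the code in this PR), the sketch below gives every eligible person the same assignment probability, chosen so that the expected weighted count matches "liable to repay" minus "above threshold". The `Person` class, the `impute_below_threshold` name, its parameters, and the numbers in the usage example are all hypothetical, and the cohort (university start year) check is omitted for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    tertiary: bool        # holds a tertiary-level qualification
    has_repayment: bool   # reports a student loan repayment in the survey
    weight: float         # survey weight (number of people represented)


def impute_below_threshold(people, liable, above_threshold,
                           age_min, age_max, seed=0):
    """Return one bool per person: True if imputed as a below-threshold borrower.

    Hypothetical helper: the real implementation may differ in every detail.
    """
    # Number of below-threshold borrowers the imputation should add.
    target = max(liable - above_threshold, 0.0)
    # Eligible: tertiary-educated, no reported repayment, inside the age band.
    eligible = [
        p.tertiary and not p.has_repayment and age_min <= p.age <= age_max
        for p in people
    ]
    # Weighted size of the eligible pool.
    pool = sum(p.weight for p, ok in zip(people, eligible) if ok)
    # Common probability so the expected weighted count equals the target,
    # capped at 1 when the pool is smaller than the target.
    prob = min(target / pool, 1.0) if pool > 0 else 0.0
    rng = random.Random(seed)
    return [ok and rng.random() < prob for ok in eligible]


# Made-up inputs: only the first person is eligible for the 21-33 band.
people = [
    Person(age=23, tertiary=True, has_repayment=False, weight=1000.0),
    Person(age=23, tertiary=True, has_repayment=True, weight=1000.0),
    Person(age=50, tertiary=True, has_repayment=False, weight=1000.0),
    Person(age=22, tertiary=False, has_repayment=False, weight=1000.0),
]
flags = impute_below_threshold(people, liable=5000.0, above_threshold=2000.0,
                               age_min=21, age_max=33)
```

Because assignment is random, the imputed weighted count only approaches the target in expectation. That is consistent with the test plan item about verifying that counts approach SLC targets, and with leaving the final adjustment to calibration.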

Test plan

  • All student loan target tests pass
  • Existing student loan plan tests pass
  • Build a test dataset and verify Plan 2/5 counts approach SLC targets

Closes #281

The FRS only captures borrowers making PAYE repayments (above threshold).
SLC data shows ~55% of Plan 2 holders earn below threshold. This adds
probabilistic imputation of below-threshold borrowers for Plan 2 and
Plan 5 based on SLC "liable to repay" counts.

Changes:
- Add slc/plan_*_borrowers_liable targets alongside existing above_threshold
- Probabilistically assign plans to tertiary-educated people without
  repayments, constrained by age band and cohort
- Add compute_student_loan_plan_liable() for calibration

Closes #281

The strict cohort constraint (ages 21-31) missed many Plan 2 borrowers
who started university late or did postgrad studies. This change:

1. Expands Plan 2 age mask from 21+ to 21-45
2. Uses age mask (not cohort) for Plan 2 below-threshold assignment
3. Assigns repayers to Plan 2 if age 21-45 and not Plan 1 cohort

The below-threshold imputation now covers the full 4.95M target.
The remaining gap (FRS shows 1.4M repayers vs SLC's 4M) is a data
collection issue that calibration targets will help address.
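
A minimal sketch of the revised Plan 2 rules listed above, assuming the age mask is inclusive at both ends. The function names and the `in_plan_1_cohort` flag are hypothetical; how the Plan 1 cohort is determined is not shown here.

```python
# Age band taken from the commit message above; inclusive bounds are an assumption.
PLAN_2_AGE_MIN = 21
PLAN_2_AGE_MAX = 45


def plan_2_age_mask(age):
    """True if the person falls inside the widened Plan 2 age band."""
    return PLAN_2_AGE_MIN <= age <= PLAN_2_AGE_MAX


def repayer_is_plan_2(age, in_plan_1_cohort):
    """A reported repayer goes to Plan 2 if in the age band and not Plan 1 cohort.

    `in_plan_1_cohort` is a hypothetical precomputed flag.
    """
    return plan_2_age_mask(age) and not in_plan_1_cohort
```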

Development

Successfully merging this pull request may close these issues.

Impute loan-holder-but-not-repaying status to FRS base dataset