Add SepsisPredictionEICU and SepsisPredictionMIMIC4 tasks — pre-ICU window, Sepsis-3 proxy labeling #911

@SHA888

Summary

PyHealth currently has no sepsis prediction task. This issue proposes adding two
new task definitions:

  • SepsisPredictionEICU — eICU Collaborative Research Database
  • SepsisPredictionMIMIC4 — MIMIC-IV

Both tasks target pre-ICU and early-ICU detection windows. This fills a gap
identified in a PROSPERO-registered systematic review (CRD420251164609) covering
14 studies (53,795 participants): nearly all existing sepsis ML models are trained
on in-ICU data, and no standardized, reproducible benchmark exists.

Clinical Motivation

Sepsis is defined by the Sepsis-3 consensus (Singer et al., JAMA 2016) as
life-threatening organ dysfunction caused by a dysregulated host response to infection.
Early detection, before ICU admission or within the first hours of the stay, is where
clinical impact is highest and where published ML models show the widest performance
variance (AUROC 0.72–0.94 across studies), largely due to inconsistent cohort
definitions and labeling strategies.

A reproducible PyHealth task would enforce consistent definitions of the:

  • Observation window (configurable; default 6h from first available data)
  • Prediction gap (configurable; 0–12h before onset)
  • Sepsis label derivation (Sepsis-3 proxy: ICD codes plus SOFA ≥ 2, or
    apacheadmissiondx for eICU)
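
The windowing logic could look roughly like the sketch below. This is not PyHealth API: the helper name `window_bounds`, its parameters, and the rule that the observation window must close at least `gap_hours` before onset are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def window_bounds(
    first_data_time: datetime,
    onset_time: Optional[datetime],
    observation_hours: float = 6.0,
    gap_hours: float = 0.0,
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the observation window, or None if unusable.

    Hypothetical helper: the window opens at the first available data point
    and lasts `observation_hours`. For septic stays it must close at least
    `gap_hours` before onset; otherwise the stay is dropped to avoid leakage.
    """
    if not 0.0 <= gap_hours <= 12.0:
        raise ValueError("gap_hours must be within 0-12")
    start = first_data_time
    end = start + timedelta(hours=observation_hours)
    # Non-septic stays (onset_time is None) always keep the full window.
    if onset_time is not None and end > onset_time - timedelta(hours=gap_hours):
        return None
    return start, end
```

Dropping stays whose window overlaps the gap is one possible policy; truncating the window instead is an equally plausible choice and would need to be decided when the approach is confirmed.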

Proposed Implementation

SepsisPredictionEICU

input_schema = {
    "vitals": "time_series",     # HR, RR, MAP, SpO2, Temp from vitalPeriodic
    "labs": "sequence",          # WBC, lactate, creatinine from lab table
    "conditions": "sequence",    # apacheadmissiondx / pasthistory
    "demographics": "static",    # age, gender, admissionweight
}
output_schema = {
    "sepsis_label": "binary",
}

Label derivation from eICU:

  • Positive: apacheadmissiondx containing sepsis/septic shock keywords or
    ICD codes in diagnosis table (ICD-9 995.91, 995.92; ICD-10 A41.x)
  • Negative: non-infectious admission diagnoses
  • Excluded: ambiguous/missing diagnosis records
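
A minimal sketch of the three-way labeling rule above, under stated assumptions: the function name, the regular expressions, and the caller-supplied `is_non_infectious` predicate are hypothetical (the issue does not enumerate which admission diagnoses count as non-infectious), and codes are assumed to arrive in dotted form, possibly several per comma-separated string.

```python
import re
from typing import Callable, Iterable, Optional

# Assumed patterns for illustration; a real task would need clinical review.
SEPSIS_KEYWORDS = re.compile(r"sepsis|septic", re.IGNORECASE)
SEPSIS_ICD = re.compile(r"995\.9[12]|A41(\.\w+)?", re.IGNORECASE)


def eicu_sepsis_label(
    apache_dx: Optional[str],
    icd_entries: Iterable[str],
    is_non_infectious: Callable[[str], bool],
) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (excluded)."""
    # Split comma-separated entries into individual codes.
    codes = [
        code.strip()
        for entry in icd_entries
        if entry
        for code in entry.split(",")
        if code.strip()
    ]
    if apache_dx and SEPSIS_KEYWORDS.search(apache_dx):
        return 1
    if any(SEPSIS_ICD.fullmatch(code) for code in codes):
        return 1
    if apache_dx and apache_dx.strip() and is_non_infectious(apache_dx):
        return 0
    # Missing diagnosis, or one that is neither septic nor clearly
    # non-infectious, is treated as ambiguous and excluded.
    return None
```

Returning `None` for excluded stays lets the calling task skip the sample rather than assign it a noisy negative label.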

SepsisPredictionMIMIC4

input_schema = {
    "vitals": "time_series",     # chartevents: HR, RR, MAP, SpO2, GCS
    "labs": "sequence",          # labevents: WBC, creatinine, bilirubin, lactate
    "conditions": "sequence",    # diagnoses_icd (ICD-9 and ICD-10, per icd_version)
    "demographics": "static",    # age, gender, admission_type
}
output_schema = {
    "sepsis_label": "binary",
}

Label derivation from MIMIC-IV:

  • Use sepsis3 flag from mimiciv_derived.sepsis3 (already in MIMIC-IV derived
    tables) where available
  • Fallback: ICD-10 codes A40.x, A41.x in diagnoses_icd
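
The flag-then-fallback rule could be sketched as follows. The function name and argument shapes are hypothetical; the sketch assumes the caller passes `None` when no `mimiciv_derived.sepsis3` row is available for the stay, and that diagnoses arrive as `(icd_code, icd_version)` pairs.

```python
from typing import Iterable, Optional, Tuple

# ICD-10 prefixes named in the fallback rule, stored without dots.
SEPSIS_ICD10_PREFIXES = ("A40", "A41")


def mimic4_sepsis_label(
    sepsis3_flag: Optional[bool],
    diagnoses: Iterable[Tuple[str, int]],
) -> int:
    """Return 1 for sepsis, 0 otherwise."""
    # Prefer the derived Sepsis-3 flag whenever it is available.
    if sepsis3_flag is not None:
        return int(sepsis3_flag)
    # Fallback: ICD-10 A40.x / A41.x among the coded diagnoses.
    for code, version in diagnoses:
        normalized = code.replace(".", "").strip().upper()
        if version == 10 and normalized.startswith(SEPSIS_ICD10_PREFIXES):
            return 1
    return 0
```

Note that the fallback only inspects ICD-10 rows, as written in the proposal; stays coded in ICD-9 would be labeled negative unless the rule is extended.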

Deliverables

  • pyhealth/tasks/sepsis_prediction_eicu.py
  • pyhealth/tasks/sepsis_prediction_mimic4.py
  • Export in pyhealth/tasks/__init__.py
  • Unit tests in tests/tasks/
  • Example notebook in examples/sepsis_prediction/
  • Docstring with clinical references

References

  • Singer M et al. The Third International Consensus Definitions for Sepsis and
    Septic Shock (Sepsis-3). JAMA. 2016;315(8):801–810.
  • Johnson AEW et al. MIMIC-IV, a freely accessible electronic health record
    dataset. Sci Data. 2023.
  • Pollard TJ et al. The eICU Collaborative Research Database. Sci Data. 2018.
  • Sucandra et al. Time Advantage and Diagnostic Accuracy of Biomarker-Enhanced
    ML/DL for Sepsis Detection in Pre-ICU Settings. PROSPERO CRD420251164609.

Notes

Happy to implement this. I will open a PR once the approach is confirmed.
