Description
Summary
PyHealth currently has no sepsis prediction task. This issue proposes adding two
new task definitions:
- SepsisPredictionEICU: eICU Collaborative Research Database
- SepsisPredictionMIMIC4: MIMIC-IV
Both tasks target pre-ICU / early ICU detection windows, filling a confirmed
gap identified in a PROSPERO-registered systematic review (CRD420251164609) covering
14 studies (53,795 participants): nearly all existing sepsis ML models are trained
on in-ICU data with no standardized reproducible benchmark.
Clinical Motivation
Sepsis is defined by the Sepsis-3 consensus (Singer et al., JAMA 2016) as
life-threatening organ dysfunction caused by dysregulated host response to infection.
Early detection — before ICU admission or within the first hours — is where clinical
impact is highest and where published ML models show the widest performance variance
(AUROC 0.72–0.94 across studies), largely due to inconsistent cohort definitions
and labeling strategies.
A reproducible PyHealth task enforces a consistent:
- Observation window (configurable: default 6h from first available data)
- Prediction gap (configurable: 0–12h before onset)
- Sepsis label derivation (Sepsis-3 proxy: ICD codes + SOFA ≥ 2, or
apacheadmissiondx for eICU)
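The window and gap settings above could be checked per stay roughly as follows. This is a minimal sketch, not the proposed implementation: the function name `observation_window` and its parameters are hypothetical, and it assumes the prediction gap means the observation window must end at least `gap_hours` before sepsis onset.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def observation_window(
    first_data_time: datetime,
    onset_time: Optional[datetime],
    window_hours: float = 6.0,  # default 6h from first available data
    gap_hours: float = 0.0,     # configurable prediction gap (0-12h)
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the observation window.

    Returns None when the window plus the prediction gap would run past
    sepsis onset, i.e. the stay cannot supply a leakage-free sample.
    Stays without onset (negatives) always get a window.
    """
    start = first_data_time
    end = start + timedelta(hours=window_hours)
    if onset_time is not None and end + timedelta(hours=gap_hours) > onset_time:
        return None
    return start, end
```

Returning None rather than truncating the window keeps every sample the same length; whether short windows should instead be kept is a design choice left open here.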
Proposed Implementation
SepsisPredictionEICU
input_schema = {
"vitals": "time_series", # HR, RR, MAP, SpO2, Temp from vitalPeriodic
"labs": "sequence", # WBC, lactate, creatinine from lab table
"conditions": "sequence", # apacheadmissiondx / pasthistory
"demographics": "static", # age, gender, admissionweight
}
output_schema = {
"sepsis_label": "binary",
}
Label derivation from eICU:
- Positive: apacheadmissiondx containing sepsis/septic shock keywords, or ICD codes in the diagnosis table (995.91, 995.92, A41.x)
- Negative: non-infectious admission diagnoses
- Excluded: ambiguous/missing diagnosis records
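The eICU labeling rules above might look like the sketch below. The helper name `eicu_sepsis_label`, the keyword tuple, and the input shapes are assumptions for illustration. It also simplifies the negative class: any recorded admission diagnosis without a sepsis match is treated as negative, whereas the proposal restricts negatives to non-infectious diagnoses, which would need an additional list of infectious diagnoses to exclude.

```python
from typing import Iterable, Optional

# Assumed keyword and code lists; the final task may use different ones.
SEPSIS_KEYWORDS = ("sepsis", "septic shock")
SEPSIS_ICD9_CODES = {"995.91", "995.92"}
SEPSIS_ICD10_PREFIXES = ("A41",)


def eicu_sepsis_label(
    apache_dx: Optional[str], icd_codes: Iterable[str]
) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (excluded)."""
    dx = (apache_dx or "").strip().lower()
    codes = [c.strip().upper() for c in icd_codes if c and c.strip()]

    # Positive: keyword match in the admission diagnosis text.
    if any(keyword in dx for keyword in SEPSIS_KEYWORDS):
        return 1
    # Positive: sepsis ICD-9 code or A41.x ICD-10 code.
    if any(
        c in SEPSIS_ICD9_CODES or c.startswith(SEPSIS_ICD10_PREFIXES)
        for c in codes
    ):
        return 1
    # Excluded: no admission diagnosis recorded.
    if not dx:
        return None
    # Simplification: everything else counts as negative.
    return 0
```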
SepsisPredictionMIMIC4
input_schema = {
"vitals": "time_series", # chartevents: HR, RR, MAP, SpO2, GCS
"labs": "sequence", # labevents: WBC, creatinine, bilirubin, lactate
"conditions": "sequence", # diagnoses_icd (ICD-10)
"demographics": "static", # age, gender, admission_type
}
output_schema = {
"sepsis_label": "binary",
}
Label derivation from MIMIC-IV:
- Use the sepsis3 flag from mimiciv_derived.sepsis3 (already in the MIMIC-IV derived tables) where available
- Fallback: ICD-10 codes A40.x, A41.x in diagnoses_icd
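A sketch of the flag-then-fallback logic above, under stated assumptions: `mimic4_sepsis_label` is a hypothetical helper, `sepsis3_flag` is None when the derived table is unavailable for the stay, and codes are normalized so that both dotted and undotted ICD-10 strings match.

```python
from typing import Iterable, Optional

ICD10_SEPSIS_PREFIXES = ("A40", "A41")


def mimic4_sepsis_label(
    sepsis3_flag: Optional[bool], icd10_codes: Iterable[str]
) -> int:
    """Return a binary sepsis label for one stay."""
    # Preferred source: the derived Sepsis-3 flag, when present.
    if sepsis3_flag is not None:
        return int(bool(sepsis3_flag))
    # Fallback: any A40.x / A41.x diagnosis code.
    normalized = (c.replace(".", "").strip().upper() for c in icd10_codes)
    return int(any(c.startswith(ICD10_SEPSIS_PREFIXES) for c in normalized))
```

Note that an explicit False flag takes precedence over ICD codes in this sketch; whether a coded diagnosis should override a negative Sepsis-3 flag is worth settling before the PR.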
Deliverables
- pyhealth/tasks/sepsis_prediction_eicu.py
- pyhealth/tasks/sepsis_prediction_mimic4.py
- Export in pyhealth/tasks/__init__.py
- Unit tests in tests/tasks/
- Example notebook in examples/sepsis_prediction/
- Docstring with clinical references
References
- Singer M et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801–810.
- Johnson AEW et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023.
- Pollard TJ et al. The eICU Collaborative Research Database. Sci Data. 2018.
- Sucandra et al. Time Advantage and Diagnostic Accuracy of Biomarker-Enhanced ML/DL for Sepsis Detection in Pre-ICU Settings. PROSPERO CRD420251164609.
Notes
Happy to implement this. I will open a PR once the approach is confirmed.