Learning Data-Driven Patient Risk Stratification Models for Clostridium difficile
- Jenna Wiens
- Wayne N. Campbell
- Ella S. Franklin
- John V. Guttag
- Eric Horvitz
Oxford Journals
Background
Though many risk factors are well known, Clostridium difficile infection (CDI) continues to be a significant problem throughout the world. The purpose of this study was to develop and validate a data-driven hospital-specific risk stratification procedure for estimating the probability that an inpatient will test positive for Clostridium difficile.
Methods and Findings
We considered electronic medical record (EMR) data from patients admitted for ≥24 hours to Washington Hospital Center between April 2011 and April 2013. Predictive models were constructed using L2-regularized logistic regression and data from the first year. The number of observational variables considered ranged from a small set of well-known risk factors readily available to a physician to over 10,000 variables automatically extracted from the EMR. Each model was evaluated on holdout admission data from the following year. A total of 34,846 admissions with 372 cases of CDI were used to train the model. Applied to the separate validation set of 34,722 admissions with 355 cases of CDI, the model that made use of the additional EMR data yielded an area under the receiver operating characteristic curve (AUROC) of 0.81 (95% CI 0.79-0.83) and significantly outperformed the model that considered only the small set of known clinical risk factors, which yielded an AUROC of 0.71 (95% CI 0.69-0.75).
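The train-then-holdout procedure described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the two features stand in for EMR variables, and the function names (`train_l2_logistic`, `auroc`, `make_admissions`), the regularization strength, and the optimizer settings are all assumptions chosen for illustration. The sketch fits an L2-regularized logistic regression on one set of admissions and computes the AUROC on a separate set, using the rank interpretation of AUROC (the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logistic(X, y, lam=0.01, lr=0.5, epochs=300):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias term is left unpenalized. lam, lr, and epochs are
    illustrative values, not settings reported by the study.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Data gradient is averaged; the L2 penalty adds lam * w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    # Estimated probability of a positive test for one admission.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_admissions(n, rng):
    # Synthetic stand-in for admissions: feature 0 is informative, feature 1 is noise.
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(2.0 * x[0] - 2.0)
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_admissions(400, rng)  # plays the role of the first year
X_valid, y_valid = make_admissions(400, rng)  # plays the role of the following year

w, b = train_l2_logistic(X_train, y_train)
valid_scores = [predict_risk(w, b, x) for x in X_valid]
valid_auroc = auroc(valid_scores, y_valid)
print(f"holdout AUROC: {valid_auroc:.2f}")
```

The pairwise AUROC computation is quadratic in the number of admissions, which is fine for a small illustration; at the scale of tens of thousands of admissions a sort-based rank computation would be the more practical choice.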
Conclusions
Automated risk stratification of patients based on the contents of their EMRs can be used to accurately identify a high-risk population of patients. The proposed method holds promise for enabling the selective allocation of interventions aimed at reducing the rate of CDI.