Learning (predictive) risk scores in the presence of censoring due to interventions

Abstract

A large and diverse set of measurements is regularly collected during a patient’s hospital stay to monitor their health status. Tools that integrate these measurements into severity scores that accurately track changes in illness severity can improve clinicians’ ability to provide timely interventions. Existing approaches for creating such scores either (1) rely on experts to fully specify the severity score, (2) infer a score using detailed models of disease progression, or (3) train a predictive score, using supervised learning, by regressing against a surrogate marker of severity such as the presence of downstream adverse events. The first approach does not extend to diseases where an accurate score cannot be elicited from experts. The second assumes that the progression of disease can be accurately modeled, limiting its application to populations with simple, well-understood disease dynamics. The third approach, which is also the most commonly used, often produces scores that suffer from bias due to treatment-related censoring (Paxton et al. in AMIA annual symposium proceedings, American Medical Informatics Association, p 1109, 2013). Specifically, since the downstream outcomes used for their training are observed only noisily and are influenced by treatment administration patterns, these scores do not generalize well when treatment administration patterns change. We propose a novel ranking-based framework for disease severity score learning (DSSL). DSSL exploits the following key observation: while it is challenging for experts to quantify the disease severity at any given time, it is often easy to compare the disease severity at two different times. Extending existing ranking algorithms, DSSL learns a function that maps a vector of a patient’s measurements to a scalar severity score subject to two constraints. First, the resulting score should be consistent with the expert’s ranking of the disease severity state. Second, changes in score between consecutive periods should be smooth. We apply DSSL to the problem of learning a sepsis severity score using a large, real-world electronic health record dataset. The learned scores significantly outperform state-of-the-art clinical scores in ranking patient states by severity and in early detection of downstream adverse events. We also show that the learned disease severity trajectories are consistent with clinical expectations of disease evolution. Further, we simulate datasets containing different treatment administration patterns and show that DSSL generalizes better to changes in treatment patterns than the approaches above.
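The two constraints described above (consistency with expert-ordered pairs, and smoothness between consecutive periods) can be illustrated with a minimal sketch. It is not the formulation used in this work: the linear score, the squared hinge ranking loss, the squared-difference smoothness penalty, the L2 regularizer, plain gradient descent, and all function and parameter names (`dssl_loss_and_grad`, `fit_severity_score`, `lam_smooth`, `lam_l2`) are assumptions made for illustration only.

```python
import numpy as np


def dssl_loss_and_grad(w, X, rank_pairs, smooth_pairs, lam_smooth=1.0, lam_l2=0.01):
    """Loss and gradient for an assumed linear severity score s(x) = w @ x.

    X            : (n_states, n_features) measurement vectors
    rank_pairs   : (m, 2) int array; row (i, j) means state i was judged
                   more severe than state j
    smooth_pairs : (k, 2) int array; row (a, b) marks consecutive time
                   periods of the same patient
    """
    scores = X @ w
    more, less = rank_pairs[:, 0], rank_pairs[:, 1]
    a, b = smooth_pairs[:, 0], smooth_pairs[:, 1]

    # Ranking term (assumed squared hinge): penalize ordered pairs whose
    # score gap falls below a margin of 1.
    slack = np.maximum(0.0, 1.0 - (scores[more] - scores[less]))
    rank_loss = np.mean(slack ** 2)
    rank_grad = (-2.0 * slack[:, None] * (X[more] - X[less])).mean(axis=0)

    # Smoothness term (assumed squared difference): penalize large score
    # changes between consecutive periods.
    diff = scores[a] - scores[b]
    smooth_loss = np.mean(diff ** 2)
    smooth_grad = (2.0 * diff[:, None] * (X[a] - X[b])).mean(axis=0)

    loss = rank_loss + lam_smooth * smooth_loss + lam_l2 * (w @ w)
    grad = rank_grad + lam_smooth * smooth_grad + 2.0 * lam_l2 * w
    return loss, grad


def fit_severity_score(X, rank_pairs, smooth_pairs, steps=500, lr=0.05, **kwargs):
    """Fit the weight vector by plain gradient descent, starting from zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = dssl_loss_and_grad(w, X, rank_pairs, smooth_pairs, **kwargs)
        w -= lr * grad
    return w
```

In this sketch, `lam_smooth` trades off agreement with the ordered pairs against smoothness of the score trajectory: larger values flatten the trajectory at the cost of smaller gaps between ranked states.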