Missing care: A framework to address the issue of frequent missing values; The case of a clinical decision support system for Parkinson's disease

Authors:

Highlights:

• Implementation of electronic health record (EHR) systems has been evolving.

• In this study, the data quality issue of very high degrees of missing values in EHR data is discussed.

• A new framework called ‘Missing Care’ is introduced to address this issue.

• A clinical decision support system (CDSS) for Parkinson's disease is developed.

• Advanced imbalanced data learning and ensemble methods are employed to develop the CDSS.

Abstract

In recent decades, the implementation of electronic health record (EHR) systems has been evolving worldwide, creating immense volumes of healthcare data. Moreover, there has been a call for research studies to enhance personalized medicine and develop clinical decision support systems (CDSS) by analyzing the available EHR data. EHR data usually contain millions of patient records with hundreds of features collected over a long period of time. This enormity poses significant challenges, one of which is dealing with many variables that have very high degrees of missing values. In this study, the data quality issue of incompleteness in EHR data is discussed, and a framework called ‘Missing Care’ is introduced to address it. Using Missing Care, researchers will be able to select the most important variables with an acceptable degree of missing values to develop predictive models with high predictive power. Moreover, Missing Care is applied to a unique, large EHR dataset to develop a CDSS for detecting Parkinson's disease. Parkinson's disease is complex, and even a specialist's diagnosis is not without error. In addition, access to specialists is limited in more remote areas, and as a result, about half of the patients with Parkinson's disease in the US remain undiagnosed. The developed CDSS can be integrated into EHR systems or used as an independent tool by healthcare practitioners who are not necessarily specialists, thereby making up for the limited access to specialized care in remote areas.

Keywords: Electronic health records, Data missing values, Clinical decision support systems, Predictive healthcare analytics, Imbalanced data learning, Parkinson's disease

Review history: Received 7 November 2019, Revised 1 June 2020, Accepted 1 June 2020, Available online 12 June 2020, Version of Record 28 July 2020.

Official paper URL: https://doi.org/10.1016/j.dss.2020.113339