Efficient treatment of outliers and class imbalance for diabetes prediction

Authors:

Highlights:

• A two-step method that mitigates training data imperfections for diabetes prediction.

• The method detects outliers using the interquartile range algorithm.

• The Synthetic Minority Oversampling Technique is used to generate artificial data.

• Controlling outliers and class imbalance reduces learning bias.
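The two steps named in the highlights can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The function names (`iqr_outlier_mask`, `smote_like`, `preprocess`), the 1.5 × IQR fence multiplier, the choice to drop flagged rows, the use of 5 nearest neighbours, and the restriction to binary labels are all assumptions made for the example; the source only states that outliers are detected with the interquartile range and that SMOTE generates artificial minority samples.

```python
import numpy as np

def iqr_outlier_mask(X, k=1.5):
    """Flag rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR].
    k=1.5 is the conventional Tukey fence (an assumption, not from the paper)."""
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return ((X < lower) | (X > upper)).any(axis=1)

def smote_like(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

def preprocess(X, y, k_fence=1.5, k_neighbours=5, seed=0):
    """Step 1: drop IQR outliers (assumed handling). Step 2: oversample the
    minority class until both classes have equal size (binary labels assumed)."""
    keep = ~iqr_outlier_mask(X, k_fence)
    X, y = X[keep], y[keep]
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = int(counts.max() - counts.min())
    if n_new > 0:
        X_new = smote_like(X[y == minority], n_new, k_neighbours, seed)
        X = np.vstack([X, X_new])
        y = np.concatenate([y, np.full(n_new, minority)])
    return X, y
```

In this sketch, outliers are removed before oversampling so that synthetic points are not interpolated from extreme values. Only the training split should be passed through `preprocess`; applying it to test data would leak synthetic samples into the evaluation.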

Keywords: Outlier detection, Imbalanced data, Machine learning, Data preprocessing, Oversampling, SMOTE

Review history: Received 3 December 2018, Revised 31 January 2020, Accepted 4 February 2020, Available online 10 February 2020, Version of Record 24 February 2020.

Official URL: https://doi.org/10.1016/j.artmed.2020.101815