Model selection to improve multiple imputation for handling high rate missingness in a water quality dataset

Authors:

Highlights:

• Handling nonlinearity using Machine Learning methods.

• Handling high missing rate by combining these methods with multiple imputation.

• Generalization through hyperparameter optimization.

• Best performance in terms of error and run time with Support Vector Regression.
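
The idea in the highlights above, pairing a nonlinear regressor with multiple imputation, can be illustrated with a minimal sketch. This is not the paper's procedure: it assumes scikit-learn's `IterativeImputer` as a stand-in chained-equations imputer, uses a bootstrap of the rows to obtain several distinct imputations (an assumption made here because SVR is deterministic), and uses placeholder hyperparameters (`C`, `epsilon`) that would in practice be tuned. The helper name `multiple_impute_svr` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.svm import SVR


def multiple_impute_svr(X, n_imputations=5, seed=0):
    """Return an array of shape (n_imputations, n_rows, n_cols) of completed datasets.

    Missing entries in X are marked with NaN. Each imputation fits an
    SVR-based iterative imputer on a bootstrap resample of the rows
    (an illustrative choice, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    completed = []
    for m in range(n_imputations):
        # Resample rows with replacement so each imputation model differs.
        rows = rng.integers(0, n_rows, size=n_rows)
        imputer = IterativeImputer(
            estimator=SVR(C=1.0, epsilon=0.1),  # placeholder hyperparameters
            max_iter=5,
            random_state=m,
        )
        imputer.fit(X[rows])
        # Fill the missing entries of the original data; observed values are kept.
        completed.append(imputer.transform(X))
    return np.stack(completed)


# Synthetic data with nonlinear relations between columns (hypothetical).
rng = np.random.default_rng(42)
x1 = rng.uniform(-3, 3, size=200)
X_full = np.column_stack([x1, np.sin(x1), x1 ** 2])

# Remove roughly 30% of the entries at random.
mask = rng.random(X_full.shape) < 0.3
X_missing = X_full.copy()
X_missing[mask] = np.nan

imputations = multiple_impute_svr(X_missing)
X_pooled = imputations.mean(axis=0)  # simple pooling by averaging the imputed values
```

Averaging the completed datasets is a simplification; a fuller multiple-imputation workflow would run the downstream analysis on each completed dataset and pool the results. Hyperparameters such as `C` and `epsilon` could be selected with a cross-validated search rather than fixed as above.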

Keywords: Multiple imputation, High missingness, Model selection, Machine learning, Data preprocessing, Water quality

Review history: Received 20 August 2018, Revised 4 March 2019, Accepted 19 April 2019, Available online 20 April 2019, Version of Record 6 May 2019.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.04.049