EvoImputer: An evolutionary approach for Missing Data Imputation and feature selection in the context of supervised learning

Authors:

Highlights:

Abstract

Missing data is a considerable problem in knowledge extraction, where the completeness and quality of the data play a major role in data analysis. In many applications, ignoring records with missing values may adversely affect the prediction process and create significant bias in the resulting data. Missing Data Imputation (MDI) has therefore become necessary to counter the negative consequences of missing data. However, features respond differently to imputation: imputing some features can enhance the learning process, while imputing others may lead to worse results, depending on the feature properties. This paper proposes a new approach that handles missing values and performs feature selection simultaneously. It uses evolutionary algorithms to evaluate how useful the imputation of each feature is to the performance of the prediction model, and selects the subset of incomplete features that, once imputed, enhances the learning process and maximizes the predictive power of the model while reducing the negative consequences of imputation. The proposed method was evaluated on 10 benchmark datasets under 10-fold cross-validation. The results were compared with five classical imputation methods (mean, median, multiple imputation, expectation maximization, and K-nearest neighbours). The proposed methodology significantly outperformed these methods in terms of accuracy, sensitivity, specificity, geometric mean, and area under the curve. It was also compared against three recent evolutionary-based imputation methods, which it outperformed in terms of accuracy on 75% of the datasets.
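
The idea of evolving a subset of incomplete features to impute can be illustrated with a small sketch. The code below is not the EvoImputer implementation; it is a minimal genetic algorithm written under several assumptions that the abstract does not confirm: each incomplete feature gets one bit (1 = mean-impute and keep, 0 = drop), fitness is the cross-validated accuracy of a nearest-centroid classifier, and all names (`impute_and_select`, `fitness`, `evolve`, `make_toy`) and the toy data are hypothetical.

```python
import random
from statistics import mean

def impute_and_select(rows, mask, incomplete):
    """Keep complete features; keep incomplete feature j (mean-imputed) only if its bit is 1."""
    bit = dict(zip(incomplete, mask))
    keep = [j for j in range(len(rows[0])) if bit.get(j, 1) == 1]
    col_mean = {}
    for j in keep:
        observed = [r[j] for r in rows if r[j] is not None]
        # Simplification: means use all rows, so test folds leak into the imputation.
        col_mean[j] = mean(observed) if observed else 0.0
    return [[r[j] if r[j] is not None else col_mean[j] for j in keep] for r in rows]

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Stand-in classifier (assumption): assign each test row to the closest class centroid."""
    centroids = {}
    for label in sorted(set(y_tr)):
        members = [x for x, y in zip(X_tr, y_tr) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return mean(1.0 if predict(x) == y else 0.0 for x, y in zip(X_te, y_te))

def fitness(mask, rows, labels, incomplete, k=5):
    """Mean k-fold accuracy of the model trained on the features selected by `mask`."""
    X = impute_and_select(rows, mask, incomplete)
    scores = []
    for fold in range(k):
        test = list(range(fold, len(X), k))
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        scores.append(nearest_centroid_accuracy(
            [X[i] for i in train], [labels[i] for i in train],
            [X[i] for i in test], [labels[i] for i in test]))
    return mean(scores)

def evolve(rows, labels, incomplete, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Tiny GA: tournament selection, uniform crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = len(incomplete)
    cache = {}
    def score(m):
        if m not in cache:
            cache[m] = fitness(m, rows, labels, incomplete)
        return cache[m]
    # Seed with "impute everything" and "impute nothing" so the result never
    # scores below either baseline.
    pop = [(1,) * n, (0,) * n]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 2)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            child = tuple(rng.choice(pair) for pair in zip(p1, p2))
            child = tuple(b ^ 1 if rng.random() < p_mut else b for b in child)
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)

def make_toy(n=60, seed=1):
    """Synthetic data (assumption): f0 complete, f1 informative, f2 noise; f1 and f2 have gaps."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        f1 = 2 * y + rng.gauss(0, 0.5) if rng.random() > 0.2 else None
        f2 = rng.gauss(0, 5) if rng.random() > 0.2 else None
        rows.append([y + rng.gauss(0, 0.8), f1, f2])
        labels.append(y)
    return rows, labels

rows, labels = make_toy()
mask, acc = evolve(rows, labels, incomplete=[1, 2])
print("selected mask over incomplete features:", mask, "cv accuracy:", acc)
```

Because the population is seeded with both baseline masks and the best individual is always carried forward, the returned mask scores at least as well as imputing every incomplete feature or dropping all of them, under this sketch's own fitness function. A fuller treatment would compute imputation statistics inside each training fold and could swap in any of the imputers named in the abstract.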

Keywords: Missing data, Data imputation, Evolutionary algorithms

Review history: Received 15 January 2021, Revised 8 November 2021, Accepted 9 November 2021, Available online 30 November 2021, Version of Record 13 December 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107734