An effective and efficient approach to classification with incomplete data
Authors:
Abstract
Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data must be handled carefully, because inadequate treatment of missing values causes large classification errors. A common approach is to use imputation to transform incomplete data into complete data. However, simple imputation methods are often inaccurate, and powerful imputation methods are usually computationally intensive. A recent approach to handling incomplete data constructs an ensemble of classifiers, each tailored to a known pattern of missing data. The main advantage of this approach is that it can classify new incomplete instances without requiring any imputation. This paper proposes an improvement on the ensemble approach by integrating imputation and genetic-based feature selection. The imputation creates higher-quality training data. The feature selection reduces the number of missing patterns, which speeds up classification and greatly increases the fraction of new instances that the ensemble can classify. Experimental results show that the proposed method is both more accurate and faster than previous common methods for classification with incomplete data.
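To make the ensemble idea concrete, the following is a minimal sketch of a missing-pattern ensemble, not the method proposed in the paper. It assumes that missing values are encoded as `None`, uses a nearest-centroid classifier as a stand-in for whatever base learner is actually used, and omits the imputation and genetic-based feature selection steps entirely. All function names (`observed_features`, `train_centroids`, `build_ensemble`, `predict`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # assumption: missing values are encoded as None


def observed_features(row):
    """Return the indices of the features that are present in a row."""
    return frozenset(i for i, v in enumerate(row) if v is not MISSING)


def train_centroids(X, y, features):
    """Fit a nearest-centroid model on the rows that have all `features`."""
    cols = sorted(features)
    sums = defaultdict(lambda: [0.0] * len(cols))
    counts = Counter()
    for row, label in zip(X, y):
        if all(row[i] is not MISSING for i in cols):
            counts[label] += 1
            for k, i in enumerate(cols):
                sums[label][k] += row[i]
    return {label: [s / counts[label] for s in sums[label]] for label in counts}


def build_ensemble(X, y):
    """Train one model per distinct, non-empty pattern of observed features."""
    ensemble = {}
    for pattern in {observed_features(row) for row in X}:
        if pattern:
            model = train_centroids(X, y, pattern)
            if model:
                ensemble[pattern] = model
    return ensemble


def predict(ensemble, row):
    """Majority vote among the models whose features are all present in `row`.

    Returns None when no model in the ensemble is applicable.
    """
    present = observed_features(row)
    votes = Counter()
    for features, centroids in ensemble.items():
        if features <= present:  # model needs only features the row has
            cols = sorted(features)

            def sq_dist(centroid):
                return sum((row[i] - centroid[k]) ** 2 for k, i in enumerate(cols))

            votes[min(centroids, key=lambda lab: sq_dist(centroids[lab]))] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy data: three features, two classes, three distinct missing patterns.
X = [
    [1.0, 1.0, None],
    [1.2, None, 0.9],
    [5.0, 5.1, None],
    [4.8, None, 5.2],
    [1.1, 0.9, 1.0],
    [5.1, 4.9, 5.0],
]
y = ["a", "a", "b", "b", "a", "b"]

ensemble = build_ensemble(X, y)
print(predict(ensemble, [1.0, None, 1.1]))   # classified without imputation
print(predict(ensemble, [None, 1.0, None]))  # no applicable model
```

In this toy setup, an instance that contains only feature 1 matches none of the trained patterns and therefore receives no prediction. That is the coverage limitation the abstract refers to: with fewer features in play, there are fewer distinct missing patterns, and more new instances fall under some model in the ensemble.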
Keywords: Incomplete data, Missing data, Classification, Imputation, Feature selection, Ensemble learning
Article history: Received 17 January 2018, Revised 8 May 2018, Accepted 10 May 2018, Available online 26 May 2018, Version of Record 26 May 2018.
Official paper URL: https://doi.org/10.1016/j.knosys.2018.05.013