On the choice of the best imputation methods for missing values considering three groups of classification methods

Authors: Julián Luengo, Salvador García, Francisco Herrera

Abstract

In real-life data, information is frequently lost because of missing values in attributes, which hampers data mining. Several schemes have been studied to overcome the drawbacks that missing values produce in data mining tasks; one of the best known is based on preprocessing, formally known as imputation. In this work, we focus on a classification task, presenting and analyzing twenty-three classification methods and fourteen different imputation approaches to missing values treatment. The analysis follows a group-based approach, in which we distinguish between three different categories of classification methods. Each category behaves differently, and the evidence obtained shows that using specific missing values imputation methods could improve the accuracy obtained for these methods. This study establishes the convenience of using imputation methods for preprocessing data sets with missing values. The analysis suggests that particular imputation methods, chosen according to the group of classification methods, are required.

Keywords: Approximate models, Classification, Imputation, Rule induction learning, Lazy learning, Missing values, Single imputation

Paper URL: https://doi.org/10.1007/s10115-011-0424-2