An effective method for classification with missing values

作者:Sabit Anwar Zahin, Chowdhury Farhan Ahmed, Tahira Alam

摘要

Classification is one of the most important tasks in machine learning with a huge number of real-life applications. In many practical classification problems, the available information for making object classification is partial or incomplete because some attribute values can be missing due to various reasons. These missing values can significantly affect the efficacy of the classification model. So it is crucial to develop effective techniques to impute these missing values. A number of methods have been introduced for solving classification problem with missing values. However they have various problems. So, we introduce an effective method for imputing missing values using the correlation among the attributes. Other methods which consider correlation for imputing missing values works better either for categorical or numeric data, or designed for a particular application only. Moreover they will not work if all the records have at least one missing attribute. Our method, Model based Missing value Imputation using Correlation (MMIC), can effectively impute both categorical and numeric data. It uses an effective model based technique for filling the missing values attribute wise and reusing then effectively using the model. Extensive performance analyzes show that our proposed approach achieves high performance in imputing missing values and thus increases the efficacy of the classifier. The experimental results also show that our method outperforms various existing methods for handling missing data in classification.

论文关键词:Classification, Missing values, Imputation, Correlation

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-018-1139-9