Missing data imputation using decision trees and fuzzy clustering with iterative learning

Authors: Sanaz Nikfalazar, Chung-Hsing Yeh, Susan Bedingfield, Hadi A. Khorshidi

Abstract

Various imputation approaches have been proposed to address the issue of missing values in data mining and machine learning applications. To improve the accuracy of missing data imputation, this paper proposes a new method called DIFC, which integrates the merits of decision trees and fuzzy clustering into an iterative learning approach. To compare the performance of the DIFC method against five effective imputation methods, extensive experiments are conducted on six widely used datasets with numerical and categorical missing data, and with various amounts and types of missing values. The experimental results show that the DIFC method outperforms the other methods in terms of imputation accuracy. Further experiments on the effect of missing value types demonstrate the robustness of the DIFC method in dealing with different types of missing values. This paper contributes to missing data imputation research by providing an accurate and robust method.

Keywords: Missing data imputation, Decision trees, Fuzzy clustering, Data mining

Paper URL: https://doi.org/10.1007/s10115-019-01427-1