MVC—a preprocessing method to deal with missing values

Authors:

Abstract

Many data analysis tasks must cope with missing values and have developed their own internal, task-specific treatments to guess them. In this paper we present an external method, MVC (Missing Values Completion), that improves completion performance as well as declarativity and interaction with the user for this problem. These qualities make it suitable for the data cleaning step of the Knowledge Discovery in Databases (KDD) process (U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery: an overview, in: Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, USA, 1996, pp. 1–36). The core of MVC is the Robust Association Rules (RAR) algorithm that we proposed earlier (A. Ragel, B. Crémilleux, Treatment of missing values for association rules, in: Proceedings of the Second Pacific–Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), Melbourne, Australia, Lecture Notes in Artificial Intelligence 1394, Springer, Berlin, 1998, pp. 258–270). This algorithm extends the concept of association rules (R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, USA, 1993, pp. 207–216) to databases with multiple missing values. It makes MVC an efficient preprocessing method: in our experiments with the C4.5 (J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, USA, 1993) decision tree program, MVC reduced the classification error rate by up to a factor of two, quite apart from a significant gain in declarativity.
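
To make the idea of completing missing values with association rules more concrete, here is a minimal Python sketch. It is not the RAR algorithm or the MVC implementation from the paper. It assumes a heavily simplified setting: rules have a single antecedent of the form attribute=value, support and confidence are counted only over records where both the antecedent and the target attribute are known, and the highest-confidence matching rule supplies the missing value. All names (`mine_rules`, `complete`, `min_support`, `min_conf`) and the toy data are hypothetical.

```python
from collections import Counter

MISSING = None  # assumption: a missing value is represented by None


def mine_rules(records, target, min_support=2, min_conf=0.6):
    """Mine single-antecedent rules (attr=value -> target=tval).

    Records with a missing target, and attribute values that are missing,
    are simply left out of the counts for the rules they would affect.
    Returns {(attr, value): (predicted_target_value, confidence)}.
    """
    pair_counts = Counter()  # (attr, value, target_value) -> count
    ante_counts = Counter()  # (attr, value) -> count
    for rec in records:
        if rec.get(target) is MISSING:
            continue
        for attr, value in rec.items():
            if attr == target or value is MISSING:
                continue
            ante_counts[(attr, value)] += 1
            pair_counts[(attr, value, rec[target])] += 1

    rules = {}
    for (attr, value, tval), n in pair_counts.items():
        conf = n / ante_counts[(attr, value)]
        if n >= min_support and conf >= min_conf:
            best = rules.get((attr, value))
            if best is None or conf > best[1]:
                rules[(attr, value)] = (tval, conf)
    return rules


def complete(records, target, **thresholds):
    """Return copies of the records with missing target values filled
    in whenever at least one mined rule matches the record."""
    rules = mine_rules(records, target, **thresholds)
    completed = []
    for rec in records:
        rec = dict(rec)  # copy so the input is not modified
        if rec.get(target) is MISSING:
            candidates = [
                rules[(a, v)]
                for a, v in rec.items()
                if a != target and v is not MISSING and (a, v) in rules
            ]
            if candidates:
                # keep the prediction of the most confident rule
                rec[target] = max(candidates, key=lambda r: r[1])[0]
        completed.append(rec)
    return completed


# Toy data (hypothetical), with missing values marked as None.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": None, "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": None, "play": None},
    {"outlook": None, "windy": None, "play": None},
]
completed = complete(records, "play")
```

In this sketch a record for which no rule applies keeps its missing value, so a later step (or the user) can decide how to handle it. The completed data could then be passed to a classifier such as a decision tree learner.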

Keywords: Association rules, Missing values, Preprocessing, Decision trees

Article history: Received 1 March 1999, Accepted 17 March 1999, Available online 23 August 1999.

Official paper URL: https://doi.org/10.1016/S0950-7051(99)00022-2