Comparisons of classification methods in the original and pattern spaces

Authors:

Abstract

Logical analysis of data (LAD) is one of the most promising data mining and machine learning techniques developed to date for extracting knowledge from data. LAD is based on concepts from combinatorics, optimization, and Boolean functions. Its key feature is the ability to detect hidden patterns in the data. Because patterns are essentially combinations of certain attributes, they provide important information for distinguishing observations in one class from those in the other, and LAD uses them to build a decision boundary for classification. Because patterns are robust to measurement errors, using them may yield more stable performance in classifying both the positive and the negative class. Patterns are also interpretable and can serve as an essential tool for understanding the problem. These desirable properties motivate the use of LAD patterns as input variables to other classification techniques in order to achieve more stable and accurate performance. In this paper, the patterns generated by LAD are used as input variables to the decision tree and k-nearest neighbor classification methods. The applicability and usefulness of LAD patterns for classification are investigated experimentally. The classification accuracy and sensitivity of different classifiers in the original and pattern spaces are compared on several public data sets. The experimental results show that classification in the pattern space can yield better and more stable accuracy than classification in the original space when the classification accuracy of LAD is relatively good (i.e., the LAD patterns are of good quality), the ratio of the number of patterns to the total number of attributes is small, or the data set is balanced between the two classes.
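To make the idea of classifying in the pattern space concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a pattern can be written as a conjunction of threshold conditions on the original attributes, maps each observation to a binary vector recording which patterns cover it, and then applies a plain k-nearest neighbor vote with Hamming distance. The patterns, cut points, data, and function names (`covers`, `to_pattern_space`, `knn_predict`) are all hypothetical and chosen only for illustration; real LAD patterns would come from a pattern-generation step that is not shown here.

```python
from collections import Counter

# Each pattern is a conjunction of (attribute_index, operator, cut_point)
# conditions. These patterns are hand-written assumptions for illustration;
# they are NOT produced by an actual LAD pattern-generation procedure.
PATTERNS = [
    [(0, ">", 5.0), (1, "<=", 2.0)],   # hypothetical positive pattern
    [(2, ">", 1.5)],                   # hypothetical positive pattern
    [(0, "<=", 5.0), (2, "<=", 1.5)],  # hypothetical negative pattern
]


def covers(pattern, x):
    """Return True if observation x satisfies every condition of the pattern."""
    for idx, op, cut in pattern:
        if op == ">" and not x[idx] > cut:
            return False
        if op == "<=" and not x[idx] <= cut:
            return False
    return True


def to_pattern_space(x, patterns=PATTERNS):
    """Map an observation to a 0/1 vector: one entry per pattern."""
    return [1 if covers(p, x) else 0 for p in patterns]


def hamming(a, b):
    """Number of positions in which two binary vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data in the original space (made up for this sketch): label 1 is the
# positive class, label 0 the negative class.
train_raw = [
    [6.0, 1.0, 2.0], [7.0, 1.5, 1.0], [4.0, 3.0, 2.0],
    [3.0, 3.0, 1.0], [4.5, 2.5, 0.5], [5.0, 1.0, 1.5],
]
train_y = [1, 1, 1, 0, 0, 0]

# The same observations, re-expressed in the pattern space.
train_P = [to_pattern_space(x) for x in train_raw]

# Classify a new observation by transforming it first, then voting.
new_obs = [6.5, 0.5, 1.8]
prediction = knn_predict(train_P, train_y, to_pattern_space(new_obs), k=3)
```

The same binary vectors could be passed to a decision tree learner in place of the original attributes; only the input representation changes, not the classifier.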

Keywords: Logical analysis of data (LAD), Patterns, Classification, Pattern-based classification

Review history: Available online 12 April 2011.

Paper URL: https://doi.org/10.1016/j.eswa.2011.04.024