Pattern selection approaches for the logical analysis of data considering the outliers and the coverage of a pattern
作者:
Highlights:
•
摘要
The logical analysis of data (LAD) is one of the most promising data mining methods developed to date for extracting knowledge from data. The key feature of the LAD is the capability of detecting hidden patterns in the data. Because patterns are basically combinations of certain attributes, they can be used to build a decision boundary for classification in the LAD by providing important information to distinguish observations in one class from those in the other. The use of patterns may result in a more stable performance in terms of being able to classify both positive and negative classes due to their robustness to measurement errors.The LAD technique, however, tends to choose too many patterns by solving a set covering problem to build a classifier; this is especially the case when outliers exist in the data set. In the set covering problem of the LAD, each observation should be covered by at least one pattern, even though the observation is an outlier. Thus, existing approaches tend to select too many patterns to cover these outliers, resulting in the problem of overfitting. Here, we propose new pattern selection approaches for LAD that take both outliers and the coverage of a pattern into account. The proposed approaches can avoid the problem of overfitting by building a sparse classifier. The performances of the proposed pattern selection approaches are compared with existing LAD approaches using several public data sets. The computational results show that the sparse classifiers built on the patterns selected by the proposed new approaches yield an improved classification performance compared to the existing approaches, especially when outliers exist in the data set.
论文关键词:Classification,Logical analysis of data,Pattern selection,Set covering problem
论文评审过程:Available online 3 May 2011.
论文官网地址:https://doi.org/10.1016/j.eswa.2011.04.189