The Panda framework for comparing patterns

Abstract

Data mining techniques are commonly used to extract patterns, such as association rules and decision trees, from huge volumes of data. Comparing patterns is a fundamental problem: among other uses, it makes it possible to concisely measure dissimilarities between evolving or different datasets and to compare the output produced by different data mining algorithms on the same dataset. In this paper, we present the Panda framework for computing the dissimilarity of both simple and complex patterns, defined upon raw data and upon other patterns, respectively. In Panda, the problem of comparing complex patterns is decomposed into simpler sub-problems on the component (simple or complex) patterns, and the partial solutions so obtained are then aggregated into an overall dissimilarity score. This intrinsically recursive approach gives Panda high flexibility and allows it to handle patterns with highly complex structures. Panda is built upon a few basic concepts so as to be generic and clear to the end user. We demonstrate the generality and flexibility of Panda by showing how it can be applied to a variety of pattern types, including sets of itemsets and clusterings.
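The recursive decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not Panda's actual API or its published measures: the class names `Simple` and `Complex`, the function `dissimilarity`, the use of Jaccard distance for itemsets, the closest-component matching, and the averaging step are all assumptions chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Simple:
    """Hypothetical simple pattern: an itemset defined directly on raw data."""
    items: FrozenSet[str]


@dataclass(frozen=True)
class Complex:
    """Hypothetical complex pattern: a collection of component patterns."""
    parts: Tuple["Pattern", ...]


Pattern = Union[Simple, Complex]


def dissimilarity(p: Pattern, q: Pattern) -> float:
    """Return a score in [0, 1]; 0 means identical under this sketch."""
    if isinstance(p, Simple) and isinstance(q, Simple):
        # Base case (assumption): Jaccard distance between the two itemsets.
        union = p.items | q.items
        if not union:
            return 0.0
        return 1.0 - len(p.items & q.items) / len(union)

    if isinstance(p, Complex) and isinstance(q, Complex):
        if not p.parts and not q.parts:
            return 0.0
        if not p.parts or not q.parts:
            return 1.0
        # Sub-problems: compare components recursively, pairing each
        # component with its closest counterpart (assumed matching rule).
        forward = [min(dissimilarity(a, b) for b in q.parts) for a in p.parts]
        backward = [min(dissimilarity(a, b) for a in p.parts) for b in q.parts]
        # Aggregation (assumption): plain average of the partial scores.
        scores = forward + backward
        return sum(scores) / len(scores)

    # Assumption: a simple and a complex pattern are maximally dissimilar.
    return 1.0


# Two sets of itemsets that share one itemset and partly overlap on another.
a = Complex((Simple(frozenset({"a", "b"})), Simple(frozenset({"c"}))))
b = Complex((Simple(frozenset({"a", "b"})), Simple(frozenset({"c", "d"}))))
print(dissimilarity(a, b))
```

Because `Complex` may contain other `Complex` patterns, the same function handles arbitrarily nested structures. Swapping the base measure, the matching rule, or the aggregation function changes the score but not the recursive scheme.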

Keywords: Pattern comparison, Pattern base management systems, Data models, Knowledge discovery

Article history: Received 10 July 2008, Revised 3 October 2008, Accepted 4 October 2008, Available online 25 October 2008.

Official paper URL: https://doi.org/10.1016/j.datak.2008.10.004