Optimized cardinality-based generalized itemset mining using transaction ID and numeric encoding

作者:Bac Le, Phuc Luong

摘要

In recent years, generalization-based data mining techniques have become an interesting topic for many data scientists. Generalized itemset mining is an exploration technique that focuses on extracting high-level abstractions and correlations in a database. However, the problem that domain experts must always deal with is how to manage and interpret a large number of extracted patterns from a massive database of transactions. In generalized pattern mining, taxonomies that contain abstraction information for each dataset are defined, so the number of frequent patterns can grow enormously. Therefore, exploiting knowledge turns into a difficult and costly process. In this article, we introduce an approach that uses cardinality-based constraints with transaction id and numeric encoding to mine generalized patterns. We applied transaction id to support the computation of each frequent itemset as well as to encode taxonomies into a numeric type using two simple rules. We also attempted to apply the combination of cardinality cons- traints and closed or maximal patterns. Experiments show that our optimizations significantly improve the performance of the original method, and the importance of comprehensive information within closed and maximal patterns is worth considering in generalized frequent pattern mining.

论文关键词:Generalized itemset, Cardinality constraints, Optimization, Closed itemset, Maximal itemset

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-017-1058-1