Association rules mining using heavy itemsets

Authors:

Abstract

A well-known problem that limits the practical use of association rule mining algorithms is the extremely large number of rules they generate. So many rules make the algorithms inefficient and make it difficult for end users to comprehend what was discovered. We present the concept of a heavy itemset. An itemset A is heavy (for given support and confidence values) if every possible association rule made up only of items in A is present. We prove a simple necessary and sufficient condition for an itemset to be heavy. We present a formula for the number of possible rules for a given heavy itemset, and show that a heavy itemset compactly represents an exponential number of association rules. Along with two simple search algorithms, we present an efficient greedy algorithm to generate a collection of disjoint heavy itemsets in a given transaction database. We then present a modified Apriori algorithm that starts with a given collection of disjoint heavy itemsets and discovers more heavy itemsets, not necessarily disjoint from the given ones.
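The definition of a heavy itemset can be illustrated with a minimal Python sketch. It assumes that a rule has the form X → Y with X and Y non-empty, disjoint subsets of A, that support is the fraction of transactions containing an itemset, and that confidence is supp(X ∪ Y) / supp(X). All function names are hypothetical. The shortcut test and the count 3^n − 2^(n+1) + 1 follow from these assumptions by elementary counting and the monotonicity of support; the abstract does not state the paper's own condition or formula, which may be phrased differently.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def rules_within(itemset):
    """Yield every rule (X, Y) with X, Y non-empty, disjoint subsets of `itemset`."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for x in combinations(items, k):
            rest = [i for i in items if i not in x]
            for m in range(1, len(rest) + 1):
                for y in combinations(rest, m):
                    yield frozenset(x), frozenset(y)


def is_heavy_bruteforce(itemset, transactions, min_sup, min_conf):
    """Check the definition directly: every rule inside `itemset` must hold."""
    for x, y in rules_within(itemset):
        sup_xy = support(x | y, transactions)
        sup_x = support(x, transactions)
        if sup_xy < min_sup or sup_x == 0 or sup_xy / sup_x < min_conf:
            return False
    return True


def is_heavy_shortcut(itemset, transactions, min_sup, min_conf):
    """Assumed shortcut for |A| >= 2 and min_sup > 0 (not taken from the paper).

    The weakest rule is {a*} -> A minus {a*}, where a* is the most frequent
    single item, so testing that one rule is enough.
    """
    sup_a = support(itemset, transactions)
    max_single = max(support([i], transactions) for i in itemset)
    return sup_a >= min_sup and sup_a > 0 and sup_a / max_single >= min_conf


def rule_count(n):
    """Number of rules X -> Y inside an n-item set under the assumptions above."""
    # Each item goes to X, to Y, or to neither (3**n); remove the cases
    # with X empty or Y empty (2**n each); add back the doubly removed case.
    return 3**n - 2 ** (n + 1) + 1
```

Under these assumptions, a 10-item heavy itemset already stands for `rule_count(10)` = 57002 rules, which is the sense in which a single itemset compactly represents an exponential number of rules.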

Keywords: Association rules, Data mining, Knowledge discovery in databases, Knowledge compression

Review history: Received 9 November 2005, Revised 17 February 2006, Accepted 18 April 2006, Available online 2 June 2006.

Official paper URL: https://doi.org/10.1016/j.datak.2006.04.009