Efficiently mining frequent itemsets with weight and recency constraints
作者:Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Han-Chieh Chao, Tzung-Pei Hong
摘要
In the past, a novel framework named recent weighted frequent itemset mining (RWFIM) and two projection-based algorithms, RWFIM-P and RWFIM-PE, were proposed to consider both the relative importance of items (item weights) and the recency of patterns. However, the projection-and-test mechanism used by these algorithms to discover recent weighted frequent itemsets (RWFIs) in a recursive way may have poor performance when the database is dense or contains long transactions. To address this issue, an efficient tree-based RWFI-Mine algorithm is proposed in this paper for mining RWFIs, which considers both weight and the recency of patterns. A novel Set-enumeration tree called the recent weighted frequent (RWF)-tree and a sorted downward closure property of RWFIs for the RWF-tree are proposed. Moreover, two data structures, named element (E)-table and recent weighted frequent (RWF)-table, are designed to store the information needed for discovering RWFIs. RFWI-Mine discovers RWFIs in a recursive way without candidate generation, thus reducing the computational costs and memory requirements for mining RWFIs. A second improved algorithm named RWFI-EMine algorithm is further proposed to avoid building E-tables and RWF-tables for unpromising itemsets and their child nodes by adopting the Estimated Weight of 2-itemset Pruning (EW2P) strategy. Extensive experiments are conducted on several real-world and synthetic datasets to evaluate the performance of the two proposed algorithms, and the ratio between the number of generated RWFIs and WFIs. Results show that the proposed algorithms outperform not only the traditional PWA algorithm for WFIM, but also the state-of-the-art RWFIM-P and RWFIM-PE algorithms for RWFIM, in terms of runtime, memory usage and scalability.
论文关键词:Temporal database, Weighted frequent itemsets, Recency constraint, Set-enumeration RWF-tree, EW2P strategy
论文评审过程:
论文官网地址:https://doi.org/10.1007/s10489-017-0915-2