An efficient projection-based method for high utility itemset mining using a novel pruning approach on the utility matrix

作者:Mohammad Karim Sohrabi

摘要

High utility itemset mining is an important extension of frequent itemset mining which considers unit profits and quantities of items as external and internal utilities, respectively. Since the utility function has not downward closure property, an overestimated value of utility is obtained using an anti-monotonic upper bound of utility function to prune the search space and improve the efficiency of high utility itemset mining methods. Transaction-weighted utilization (TWU) of itemset was the first and one of the most important functions which has been used as the anti-monotonic upper bound of utility by various algorithms. A variety of high utility itemset mining methods have attempted to tighten the utility upper bound and have exploited appropriate pruning strategies to improve mining efficiency. Although TWU and its improved alternatives have attempted to increase the efficiency of high utility itemset mining methods by pruning their search spaces, they suffer from a significant number of generated candidates which are high-TWU but are not high utility itemsets. Calculating the actual utilities of low utility candidates needs to multiple scanning of the dataset and thus imposes a huge overhead to the mining methods, which can cause to lose the pruning benefits of the upper bounds. Proposing appropriate pruning strategies, exploiting efficient data structures, and using tight anti-monotonic upper bounds can overcome this problem and lead to significant performance improvement in high utility itemset mining methods. In this paper, a new projection-based method, called MAHI (matrix-aided high utility itemset mining), is introduced which uses a novel utility matrix-based pruning strategy, called MA-prune to improve the high utility itemset mining performance in terms of execution time. The experimental results show that MAHI is faster than former algorithms.

论文关键词:Frequent itemset mining, High utility itemset mining, Itemset, Transaction-weighted utility, Pruning strategy

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-020-01485-w