High-utility and diverse itemset mining

作者:Amit Verma, Siddharth Dawar, Raman Kumar, Shamkant Navathe, Vikram Goyal

摘要

High-utility Itemset Mining (HUIM) finds patterns from a transaction database with their utility no less than a user-defined threshold. The utility of an itemset is defined as the sum of the utilities of its items. The utility notion enables a data analyst to associate a profit score with each item and thereof to a pattern. We extend the notion of high-utility with diversity to define a new pattern type called High-utility and Diverse pattern (HUD). The notion of diversity of a pattern captures the extent of the different categories covered by the selected items in the pattern. An application of diverse-pattern lies in the recommendation task where a system can recommend to a customer a set of items from a new class based on her previously bought items. Our notion of diversity is easy to compute and also captures the basic essence of a previously proposed diversity notion. The existing algorithm to compute frequent-diverse patterns is 2-phase, i.e., in the first phase, frequent patterns are computed, out of which diverse patterns are filtered out in the second phase. We, in this paper, give an integrated algorithm that efficiently computes high-utility and diverse patterns in a single phase. Our experimental study shows that our proposed algorithm is very efficient as compared to a 2-phase algorithm that extracts high-utility itemsets in the first phase and filters out the diverse itemsets in the second phase.

论文关键词:Diverse patterns, Diversity measure, High-utility pattern mining

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-020-02063-x