Towards efficiently mining closed high utility itemsets from incremental databases
Abstract
The set of closed high-utility itemsets (CHUIs) concisely represents the exact utility of all itemsets, yet it can be several orders of magnitude smaller than the set of all high-utility itemsets. Existing CHUI mining algorithms assume that databases are static, which makes them very expensive on incremental data: the whole dataset has to be reprocessed for each batch of new transactions. To address this challenge, this paper presents IncCHUI, the first approach that mines CHUIs efficiently from incremental databases. To achieve this, we propose an incremental utility-list structure, which is built and updated with only one database scan. Further, we apply effective pruning strategies to quickly construct incremental utility-lists and to eliminate candidates that are not updated. Finally, we suggest an efficient hash-based approach to update existing closed sets or insert newly found ones. Our extensive experimental evaluation on both real-life and synthetic databases shows the efficiency and feasibility of our approach. In terms of speed, it significantly outperforms previously proposed methods, which mainly run in batch mode, and it is scalable with respect to the number of transactions.
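To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what a closed high-utility itemset is, using the standard definitions (an itemset is high-utility if its utility reaches a threshold, and closed if no proper superset has the same support). It is not IncCHUI and does not use utility-lists, pruning, or hashing; the toy database, the threshold, and all function names are assumptions made for illustration. It re-scans the whole database on every call, which is exactly the batch-mode cost the abstract says an incremental method should avoid.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility of that
# item in the transaction (e.g. quantity * unit profit). Values are made up.
DB = [
    {"a": 5, "b": 2},
    {"a": 5, "b": 2, "c": 1},
    {"c": 3},
]


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db):
    """Sum of the itemset's item utilities over the transactions containing it."""
    return sum(
        sum(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def closed_high_utility_itemsets(db, minutil):
    """Brute-force enumeration; exponential in the number of items."""
    items = sorted({i for t in db for i in t})

    # Support of every itemset that occurs in at least one transaction.
    sup = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support(candidate, db)
            if s > 0:
                sup[frozenset(candidate)] = s

    result = {}
    for x, s in sup.items():
        # Closed: no proper superset occurs in the same number of transactions.
        is_closed = not any(x < y and sup[y] == s for y in sup)
        if is_closed:
            u = utility(x, db)
            if u >= minutil:
                result[x] = u
    return result


if __name__ == "__main__":
    # Naive "incremental" use: append a batch, then re-mine from scratch.
    print(closed_high_utility_itemsets(DB, minutil=8))
    new_batch = [{"a": 5, "b": 2, "c": 1}]
    print(closed_high_utility_itemsets(DB + new_batch, minutil=8))
```

In this sketch, each new batch triggers a full recomputation over `DB + new_batch`. An incremental approach such as the one described in the abstract would instead update its stored structures using only the new transactions.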
Keywords: High-utility itemset mining, Closed itemset mining, Incremental mining, Incremental utility list
Review history: Received 16 June 2018, Revised 12 November 2018, Accepted 13 November 2018, Available online 16 November 2018, Version of Record 7 January 2019.
Official URL: https://doi.org/10.1016/j.knosys.2018.11.019