Efficient frequent itemset mining methods over time-sensitive streams

Abstract

Stream data arrives dynamically and rapidly, and the traditional transaction-based sliding window cannot reflect its time-sensitive characteristics; as a result, the mining results are inaccurate. This paper addresses the problem by constructing a timestamp-based sliding window model, which can be further converted into a transaction-based sliding window. Based on this model, an extended enumeration tree is developed to incrementally maintain the essential information. Our proposed frequent itemset mining algorithm introduces the type transforming bound to dynamically classify itemsets into categories, so that the processing of certain itemsets can be deferred or ignored: an itemset is not handled until its type transforming bounds reach a threshold, which prunes computation. However, this algorithm prunes only under conditions that guarantee accurate results, and thus cannot achieve the best performance. Our approximate mining algorithm improves on this with a heuristic rule-based strategy that saves more computational cost at a tolerable mining error. Theoretical analysis and experimental studies demonstrate that the proposed algorithms have high accuracy, use less computational time and memory, and significantly outperform the baseline method and state-of-the-art algorithms.
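To make the idea of a timestamp-based sliding window concrete, the following is a minimal sketch and not the paper's algorithm: it has no enumeration tree and no type transforming bound, and it counts itemsets by brute force. The class name `TimestampWindow` and the parameters `span`, `min_support`, and `max_size` are hypothetical names chosen for illustration. It assumes that timestamps arrive in non-decreasing order and that the window covers the most recent `span` time units. Because the number of transactions inside a fixed time span varies, the absolute count needed to meet a relative support threshold changes as the window slides.

```python
from collections import deque
from itertools import combinations


class TimestampWindow:
    """Keep only transactions whose timestamp lies in (now - span, now]."""

    def __init__(self, span):
        self.span = span      # window length in time units (hypothetical parameter)
        self.items = deque()  # (timestamp, frozenset of items), oldest first

    def add(self, timestamp, transaction):
        # Assumption: timestamps arrive in non-decreasing order.
        self.items.append((timestamp, frozenset(transaction)))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop transactions that have fallen out of the time window.
        while self.items and self.items[0][0] <= now - self.span:
            self.items.popleft()

    def frequent_itemsets(self, min_support, max_size=2):
        """Brute-force count of itemsets up to max_size items.

        Returns {itemset tuple: count} for itemsets whose count is at least
        min_support times the current number of transactions in the window.
        """
        counts = {}
        for _, transaction in self.items:
            for k in range(1, max_size + 1):
                for combo in combinations(sorted(transaction), k):
                    counts[combo] = counts.get(combo, 0) + 1
        # The absolute threshold depends on how many transactions are in the window.
        threshold = min_support * len(self.items)
        return {s: c for s, c in counts.items() if c >= threshold}


window = TimestampWindow(span=10)
window.add(1, {"a", "b"})
window.add(5, {"a", "c"})
window.add(12, {"a", "b"})  # the transaction at time 1 expires here
result = window.frequent_itemsets(min_support=1.0)
```

In this usage example the window holds two transactions after the third arrival, so with `min_support=1.0` only the single item `a` qualifies. Recounting every itemset on each query is exactly the cost that incremental maintenance and pruning, as described in the abstract, are meant to avoid.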

Keywords: Stream, Frequent itemset, Data mining, Association rules, Time-sensitive

Article history: Received 11 May 2013, Revised 28 October 2013, Accepted 2 December 2013, Available online 13 December 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.12.001