Parallel frequent itemset mining using systolic arrays
作者:
Highlights:
•
摘要
Since extraction of frequent itemsets from a transaction database is crucial to several data mining tasks such as association rule generation, so frequent itemset mining is one of the most important concepts in data mining. One of the major problems in frequent itemset mining is the explosion of the number of results which is directly effecting on the execution time of itemset mining algorithms. To address this problem, closed itemsets have been proposed, which provides concise lossless representations of the original collection of frequent itemsets. Henceforth, the frequencies of all itemsets in the original collection can be reconstructed from the reduced collection. However, the reduction provided by this exact method is not sufficient to solve the pattern explosion problem, mainly because of high dimensional datasets which have large number of items in each transaction. Colossal itemset mining is another solution to reduce the output size which will not be useful if the set of all frequent itemsets have been required. Higher level of performance improvement can be obtained from efficient scalable parallel mining methods. In this paper we represent an efficient scalable parallel algorithm using systolic arrays to conduct mining of frequent itemsets in very large, such as high dimensional, datasets. In our algorithm, we use a bit matrix to compress the dataset and mapping the mining algorithm on the systolic arrays architecture. For this purpose, each transaction of dataset represents as a row in the bit matrix. We use this bit matrix structure to model the pattern mining as a systolic array problem. Our experimental results and performance study show that this algorithm outperforms substantially the best previously developed parallel algorithms.
论文关键词:Frequent pattern mining,Itemset mining,Systolic arrays,Bit matrix,Parallel pattern mining
论文评审过程:Received 1 April 2012, Revised 11 September 2012, Accepted 17 September 2012, Available online 1 October 2012.
论文官网地址:https://doi.org/10.1016/j.knosys.2012.09.005