Mining frequent tree-like patterns in large datasets

作者：

Highlights：

•

摘要

Sequential pattern mining is crucial to data mining domains. This paper proposes a novel data mining approach for exploring hierarchical tree structures, named tree-like patterns, representing the relationships for a pair of items in a sequence. Using tree-like patterns, the relationships for a pair of items can be identified in terms of the cause and effect. A novel technique that efficiently counts support values for tree-like patterns using a queue structure is proposed. In addition, this paper addresses an efficient scheme for determining the frequency of a tree-like pattern in a sequence using a dynamic programming approach. Each tree-like pattern embedded in a sequence is considered to have a certain valuable meaning or the degree of importance used in different applications. Two addressed formulas are applied to determine the degree of significance for a specific sequence, which denotes the degree of consecutive items in a tree-like pattern for a sequence. The larger the degree of significance a tree-like pattern has, the more the tree-like pattern is compacted in the sequence. The characteristics differentiating the explored patterns from those obtained with other schemes are discussed. A simulation analysis of the proposed data mining approach is utilized to demonstrate its efficacy. Finally, the proposed approach is designed and implemented in a data mining system integrated into a novel e-learning platform.

论文关键词：Data mining,Frequent patterns,Sequential patterns,Tree-like patterns,World wide web

论文评审过程：Received 4 September 2004, Accepted 6 July 2006, Available online 7 August 2006.

论文官网地址：https://doi.org/10.1016/j.datak.2006.07.003