Efficient mining of sequential patterns with time constraints: Reducing the combinations

作者:

Highlights:

摘要

In this paper we consider the problem of discovering sequential patterns by handling time constraints as defined in the Gsp algorithm. While sequential patterns could be seen as temporal relationships between facts embedded in the database where considered facts are merely characteristics of individuals or observations of individual behavior, generalized sequential patterns aim to provide the end user with a more flexible handling of the transactions embedded in the database. We thus propose a new efficient algorithm, called Gtc (Graph for Time Constraints) for mining such patterns in very large databases. It is based on the idea that handling time constraints in the earlier stage of the data mining process can be highly beneficial. One of the most significant new feature of our approach is that handling of time constraints can be easily taken into account in traditional levelwise approaches since it is carried out prior to and separately from the counting step of a data sequence. Our test shows that the proposed algorithm performs significantly faster than a state-of-the-art sequence mining algorithm.

论文关键词:Time constraints,Sequential patterns,Levelwise algorithms

论文评审过程:Available online 9 February 2008.

论文官网地址:https://doi.org/10.1016/j.eswa.2008.01.021