A new framework for detecting weighted sequential patterns in large sequence databases

Abstract

Sequential pattern mining is an essential research topic with broad applications; it discovers the set of frequent subsequences that satisfy a support threshold in a sequence database. The major problems in mining sequential patterns are that a huge set of patterns is generated and that computation time is high. Although efficient algorithms have been developed to tackle these problems, their performance degrades dramatically when mining long sequential patterns in dense databases or when using low minimum supports. In addition, these algorithms may reduce the number of patterns, but unimportant patterns still appear in the results. It would be better to prune unimportant patterns first, leaving fewer but more important patterns after mining. In this paper, we propose a new framework for mining weighted frequent patterns in which weight constraints are pushed deep into sequential pattern mining. Previous sequential pattern mining algorithms treat sequential patterns uniformly, whereas real sequential patterns differ in importance. In our approach, items are assigned weights according to their priority or importance. During the mining process, we consider not only the supports but also the weights of patterns. Based on this framework, we present a weighted sequential pattern mining algorithm (WSpan). To our knowledge, this is the first work to mine weighted sequential patterns. The experimental results show that WSpan detects fewer but important weighted sequential patterns in large sequence databases even with a low minimum threshold.
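
To illustrate what considering both supports and weights could look like, the following is a minimal sketch and not the WSpan algorithm itself. The abstract does not state how weights and supports are combined, so the sketch assumes that a pattern's weight is the average weight of its items and that its weighted support is that weight multiplied by its support count. It also simplifies each sequence to a list of single items. All function names, the sample database, the item weights, and the threshold are hypothetical.

```python
from typing import Dict, List

Sequence = List[str]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Count the sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def pattern_weight(pattern: Sequence, weights: Dict[str, float]) -> float:
    """Assumption: a pattern's weight is the average weight of its items."""
    return sum(weights[item] for item in pattern) / len(pattern)


def weighted_support(
    pattern: Sequence, database: List[Sequence], weights: Dict[str, float]
) -> float:
    """Assumption: weighted support = pattern weight * support count."""
    return pattern_weight(pattern, weights) * support(pattern, database)


# Hypothetical toy data, for illustration only.
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "c", "a"],
    ["a", "b"],
]
weights = {"a": 0.9, "b": 0.4, "c": 0.6}
min_weighted_support = 1.4  # hypothetical threshold

candidates = [["a", "c"], ["a", "b"]]
kept = [
    p for p in candidates
    if weighted_support(p, database, weights) >= min_weighted_support
]
print(kept)
```

Under these assumptions, both candidates occur in two sequences, so a plain support threshold cannot separate them. The weighted measure keeps only the pattern whose items carry more weight, which is the "fewer but important patterns" effect the abstract describes.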

Keywords: Data mining, Knowledge discovery, Weighted sequential pattern mining, Weighted support, Minimum weight

Review history: Received 29 August 2006, Revised 7 February 2007, Accepted 9 April 2007, Available online 19 April 2007.

Official paper URL: https://doi.org/10.1016/j.knosys.2007.04.002