Mining probabilistic automata: a statistical view of sequential pattern mining

作者:Stéphanie Jacquemont, François Jacquenet, Marc Sebban

摘要

During the past decade, sequential pattern mining has been the core of numerous research efforts. It is now possible to efficiently extract knowledge of users’ behavior from a huge set of sequences collected over time. This has applications in various domains such as purchases in supermarkets, Web site visits, etc. However, sequence mining algorithms do little to control the risks of extracting false discoveries or overlooking true knowledge. In this paper, the theoretical conditions to achieve a relevant sequence mining process are examined. Then, the article offers a statistical view of sequence mining which has the following advantages: First, it uses a compact and generalized representation of the original sequences in the form of a probabilistic automaton. Second, it integrates statistical constraints to guarantee the extraction of significant patterns. Finally, it provides an interesting solution in a privacy preserving context in order to respect individuals’ information. An application in car flow modeling is presented, showing the ability of our algorithm (acsm) to discover frequent routes without any private information. Comparisons with a classical sequence mining algorithm (spam) are made, showing the effectiveness of our approach.

论文关键词:Sequence mining, Probabilistic automaton, Privacy preserving data mining

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10994-008-5098-y