SPEck: mining statistically-significant sequential patterns efficiently with exact sampling

Authors: Steedman Jenkins, Stefan Walzer-Goldfeld, Matteo Riondato

Abstract

We study the problem of efficiently mining statistically-significant sequential patterns from large datasets, under different null models. We consider one null model presented in the literature, and introduce two new ones that preserve different properties of the observed dataset. We describe SPEck, a generic framework for significant sequential pattern mining, that can be instantiated with any null model, when given a procedure for sampling datasets according to the null distribution. For the previously-proposed model, we introduce a novel procedure that samples exactly according to the null distribution, while existing procedures are approximate samplers. Our exact sampler is also more computationally efficient and much faster in practice. For the null models we introduce, we give exact and/or almost uniform samplers. Our experimental evaluation shows how exact samplers can be orders of magnitude faster than approximate ones, and scale well.

Keywords: Hypothesis testing, Significant Pattern Mining, Statistically-sound Knowledge Discovery, Transactional datasets, Lightly smoked ham

Paper URL: https://doi.org/10.1007/s10618-022-00848-x