Approximating the number of frequent sets in dense data

作者:Mario Boley, Henrik Grosskreutz

摘要

We investigate the problem of counting the number of frequent (item)sets—a problem known to be intractable in terms of an exact polynomial time computation. In this paper, we show that it is in general also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed in order to guarantee a certain approximation bound, we show that the algorithm still has the desired accuracy on several real-world datasets when its running time is capped polynomially.

论文关键词:Data mining, Frequent itemsets, Approximate counting, Markov chain Monte Carlo

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-009-0212-4