Approximating the number of unique values of an attribute without sorting

作者:

Highlights:

摘要

Counts of unique values are frequently needed information in database systems. Especially, they are essential in query optimization and physical database design. Traditionally, exact counts were obtained by sorting, which is an expensive operation. In this paper we present three algorithms for counting unique values by probabilistic methods. These algorithms require only one pass over the data, and produce approximations to the true count with certain standard deviations. For deviations acceptable in practical environments (~10%), the algorithms require only modest amounts of memory space and computation time. We have implemented all three algorithms in System R. We also present the results of the experiments on accuracy and performance of these algorithms.

论文关键词:

论文评审过程:Received 30 November 1985, Revised 26 June 1986, Available online 10 June 2003.

论文官网地址:https://doi.org/10.1016/0306-4379(87)90014-7