Compression of correlated bit-vectors

作者:

Highlights:

摘要

Bitmaps are data structures occurring often in information retrieval. They are useful, but are also large and expensive to store. For this reason, considerable effort has been devoted to finding techniques for compressing them. These techniques are most effective for sparse bitmaps. We propose a preprocessing stage, in which bitmaps are first clustered and the clusters used to transform their member bitmaps into sparser ones, that can be more effectively compressed. The clustering method efficiently generates a graph structure on the bitmaps. In some situations, it is desired to impose restrictions on the graph; finding the optimal graph satisfying these restrictions is shown to be NP-complete. The results of applying our algorithm to the Bible is presented: for some sets of bitmaps, our method almost doubled in the compression savings.

论文关键词:Data compression,bitmap compression,data storage,maximum spanning tree:application,clustering:application

论文评审过程:Received 24 August 1990, Revised 30 January 1991, Available online 17 June 2003.

论文官网地址:https://doi.org/10.1016/0306-4379(91)90030-D