A mathematical model for estimating the effectiveness of bigram coding

Abstract

This paper discusses bigram coding as a technique for compacting data. A mathematical model is developed that estimates the effectiveness of such a code as a function of the fraction of bigram tokens that are encodeable; the model accounts for the degree of overlap among encodeable tokens by assuming that bigram token occurrences have a Markov property. The model requires that a single parameter be fitted to the data. The results of an experiment testing this model on a file of catalog data from a library are given, and excellent agreement is found. This model provides a substantial improvement over an earlier model in which bigrams are assumed to occur independently of one another.
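To make the quantities in the abstract concrete, the following Python sketch implements a simple greedy bigram coder and measures the fraction p of bigram tokens that are encodeable. It is an illustration under stated assumptions, not the paper's model: the code set (the k most frequent bigrams), the left-to-right greedy parsing rule, and all function names are hypothetical choices made here. The `independence_estimate` function is a crude baseline assumed for illustration only; it treats each position the coder examines as encodeable with probability p, independently of its neighbours, and is neither the earlier model nor the Markov model from the paper.

```python
from collections import Counter


def encodeable_set(text: str, k: int) -> set:
    """Hypothetical code set: the k most frequent bigrams in `text`."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return {bigram for bigram, _ in counts.most_common(k)}


def bigram_fraction(text: str, codes: set) -> float:
    """Fraction p of overlapping bigram tokens text[i:i+2] that are in the code set."""
    n = len(text) - 1
    if n <= 0:
        return 0.0
    return sum(text[i:i + 2] in codes for i in range(n)) / n


def greedy_encode(text: str, codes: set) -> list:
    """Assumed parsing rule: scan left to right; an encodeable pair becomes
    one output symbol, otherwise a single character is emitted."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in codes:
            out.append(pair)  # one symbol stands for two characters
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out


def compression_ratio(text: str, codes: set) -> float:
    """Output symbols per input character (smaller means better compaction)."""
    return len(greedy_encode(text, codes)) / len(text)


def independence_estimate(p: float) -> float:
    """Crude baseline (assumption): each examined position is encodeable with
    probability p, so an output symbol consumes 1 + p characters on average."""
    return 1.0 / (1.0 + p)
```

For example, on the string `abab` with code set `{"ab"}`, two of the three bigram tokens are encodeable (p = 2/3), so the crude baseline predicts a ratio of 0.6, while the greedy coder actually achieves 0.5. The gap arises because the positions the coder examines are not a random sample of all bigram tokens: whether a token is encodeable depends on its neighbours, which is the kind of dependence the Markov model described above is meant to capture.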

Publication history: Available online 13 July 2002.

Paper URL: https://doi.org/10.1016/0306-4573(76)90041-8