Text signatures by superimposed coding of letter triplets and quadruplets

Abstract

Text signatures are a condensed, coded form of a text. Because they are shorter, information can be retrieved from them faster than from the full text when inverted files are not available. It has been proposed to base a particular form of signature, superimposed coding, on letter triplets (or quadruplets) rather than on complete words, which makes it possible to mask search words. This approach is analyzed here theoretically, taking into account the unequal occurrence probabilities of the triplets, and the results are compared with a set of experiments. It turns out that signatures based on letter triplets produce too many false associations, because the triplets of a search word also occur in words other than the search word. With quadruplets, the number of false associations may be tolerable.
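The superimposed coding of letter triplets described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme analyzed in the paper: the signature width, the number of bits set per n-gram, and the SHA-256-based mapping in the hypothetical helper `bit_positions` are arbitrary choices made for the example. Each word is split into overlapping letter n-grams, each n-gram sets a few bits, and all codes are OR-ed together. A text is retrieved as a candidate when every bit of the query signature is also set in the text signature.

```python
import hashlib
from collections.abc import Iterable


def ngrams(word: str, n: int = 3) -> list[str]:
    """Return the overlapping letter n-grams of a lower-cased word."""
    w = word.lower()
    return [w[i:i + n] for i in range(len(w) - n + 1)]


def bit_positions(gram: str, width: int, bits_per_gram: int) -> set[int]:
    """Hypothetical mapping of an n-gram to up to `bits_per_gram` bit positions.

    Each of the `bits_per_gram` salted SHA-256 digests picks one position;
    collisions between them simply yield fewer distinct positions.
    """
    positions = set()
    for k in range(bits_per_gram):
        digest = hashlib.sha256(f"{k}:{gram}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % width)
    return positions


def signature(words: Iterable[str], n: int = 3, width: int = 256,
              bits_per_gram: int = 2) -> int:
    """Superimpose (bitwise OR) the codes of all n-grams of all words."""
    sig = 0
    for word in words:
        for gram in ngrams(word, n):
            for pos in bit_positions(gram, width, bits_per_gram):
                sig |= 1 << pos
    return sig


def may_contain(text_sig: int, query_sig: int) -> bool:
    """Candidate test: every bit set in the query must also be set in the text."""
    return text_sig & query_sig == query_sig


if __name__ == "__main__":
    text = "attend phenomena normal".split()
    # Triplets: "ten", "eno", "nor" all occur in other words, so this is True.
    print(may_contain(signature(text, n=3), signature(["tenor"], n=3)))
    # Quadruplets: "teno" and "enor" occur nowhere in the text, so this is
    # expected to be False unless hashed bit positions happen to collide.
    print(may_contain(signature(text, n=4), signature(["tenor"], n=4)))
```

The example text never contains the word "tenor", yet with triplets its signature matches the query, because "ten", "eno" and "nor" occur in "attend", "phenomena" and "normal"; this is the kind of false association described above. With quadruplets the query n-grams no longer occur in the text, so a match could only come from chance bit collisions.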

Review history: Received 20 January 1986, Revised 26 August 1986, Available online 17 June 2003.

Official paper page: https://doi.org/10.1016/0306-4379(87)90038-X