Experiments with document components for indexing and retrieval

Authors:

Highlights:

Abstract

The use of document components (a term, a sentence, or a whole document) for indexing and retrieval is investigated using two medium-size collections. A number of probabilistic similarity measures based on document components are studied, as well as a new method of handling probability estimates involving small sample sizes. Some of the new similarity measures perform comparably to the methods studied by other investigators. In general, the term and sentence modes give substantially equal performance, and both are superior to the document mode. However, longer documents may be needed to reveal the full usefulness of the sentence mode, because the documents in our databases may not contain enough sentences.

Keywords:

Review history: Received 31 July 1987, Revised 15 October 1987, Available online 13 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(88)90044-1