Probabilistic design principles for conventional and full-text retrieval systems
Abstract
In order for conventionally designed commercial document retrieval systems to perform perfectly, the following two (logical) conditions must be satisfied for every search: (1) there exists a document property (or combination of properties) that belongs to those, and only those, documents that are relevant; and (2) that property (or combination of properties) can be correctly guessed by the searcher. In general, the first condition is false and the second is impossible to satisfy; hence no conventional IR system can perform at a maximum level of effectiveness. (We are painfully aware of the current poor performance values for Recall and Precision. Furthermore, Recall deteriorates rapidly as document corpora continue to grow in size.) However, different design principles can lead to improved performance. This article presents a view of the document retrieval problem showing that, since the relationship between document properties (whether humanly assigned index terms or words occurring in the running text) and relevance is at best probabilistic, the design problem should be approached using probabilistic principles. It turns out that a front end that lets searchers attach probabilistically interpreted weights to their query terms could be adapted for conventional IR systems, and such an enhancement could lead to improved performance.
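The contrast between guessing an exact property (a conventional Boolean match) and attaching probabilistically interpreted weights to query terms can be illustrated with a small sketch. It assumes a binary-independence-style log-odds weight: for each query term the searcher supplies p = P(term present | relevant) and q = P(term present | non-relevant), and terms are treated as independent. This standard weighting is swapped in for illustration and is not necessarily the formulation the article proposes; the names `term_weight`, `boolean_match`, `probabilistic_score`, and the query terms, probabilities, and documents are all hypothetical.

```python
import math


def term_weight(p: float, q: float) -> float:
    """Log-odds weight for one query term (assumed binary-independence form).

    p: searcher's estimate of P(term present | relevant), 0 < p < 1
    q: searcher's estimate of P(term present | non-relevant), 0 < q < 1
    """
    return math.log((p * (1 - q)) / (q * (1 - p)))


def boolean_match(doc_terms: set[str], required: set[str]) -> bool:
    """Conventional Boolean AND search: a document either has every term or is rejected."""
    return required <= doc_terms


def probabilistic_score(doc_terms: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of the query terms present in the document (term independence assumed)."""
    return sum(w for term, w in weights.items() if term in doc_terms)


# Hypothetical query: the searcher's (p, q) estimates for each term.
query = {
    "retrieval": (0.9, 0.3),
    "probabilistic": (0.7, 0.1),
    "boolean": (0.4, 0.2),
}
weights = {term: term_weight(p, q) for term, (p, q) in query.items()}

# Hypothetical documents, each represented by its set of index terms.
docs = {
    "d1": {"retrieval", "probabilistic", "model"},
    "d2": {"retrieval", "boolean", "index"},
    "d3": {"retrieval", "probabilistic", "boolean"},
    "d4": {"catalog", "index"},
}

# Boolean AND over all query terms: only documents with every term are retrieved.
boolean_hits = [d for d, terms in docs.items() if boolean_match(terms, set(query))]

# Probabilistic front end: every document is ranked by its summed term weights.
ranking = sorted(docs, key=lambda d: probabilistic_score(docs[d], weights), reverse=True)

print(boolean_hits)
print(ranking)
```

With these made-up estimates the Boolean search retrieves only `d3`, whereas the weighted ranking orders all four documents (`d3`, `d1`, `d2`, `d4`), so partially matching documents are still offered to the searcher in order of estimated relevance. A term the searcher believes is more common in non-relevant documents (q > p) receives a negative weight and pushes documents down the ranking.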
Publication history: Available online 13 July 2002.
Official page: https://doi.org/10.1016/0306-4573(88)90092-1