Performance standards and evaluations in IR test collections: Cluster-based retrieval models

Abstract

Low performance standards have been computed for the query sets of 13 retrieval test collections. Derived from the random graph hypothesis, these standards represent the highest levels of retrieval effectiveness that can be obtained from meaningless clustering structures. Operational levels of cluster-based performance reported in selected sources over the past 20 years have been compared to the standards. The comparisons show that typical levels of operational cluster-based retrieval can be explained on the basis of chance. Indeed, most operational results in retrieval test collections are lower than those predicted by random graph theory. A tentative explanation for the poor performance of cluster-based retrieval reveals weaknesses in both the fundamental assumptions and the operational implementations. The cluster hypothesis offers no guarantee that relevant documents are naturally grouped together, clustering algorithms may not reveal the inherent structure in a set of documents, and retrieval strategies do not reliably retrieve the most effective cluster or clusters of documents. The poor performance is compounded by the fact that most cluster-based retrieval implementations implicitly treat topical relatedness as equivalent to relevance. Clustering strategies capable of adapting to relevance information may succeed where static clustering techniques have failed.
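To make the idea of a chance-level standard concrete, here is a minimal Python sketch. It is not the authors' procedure: as a simplified, hypothetical stand-in for the random-graph model, it shuffles documents into clusters of fixed sizes, scores every cluster against a query's relevant set with van Rijsbergen's E measure (assumed here, with beta = 1, where lower is better), keeps the best cluster in each trial, and averages over trials. The function names and parameters are illustrative only.

```python
import random
from statistics import mean


def e_measure(retrieved, relevant, beta=1.0):
    """van Rijsbergen's E = 1 - (1 + b^2)PR / (b^2 P + R); 0 is perfect, 1 is worst."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 1.0  # no relevant documents retrieved (also covers empty clusters)
    p = hits / len(retrieved)  # precision
    r = hits / len(relevant)   # recall
    return 1.0 - (1 + beta**2) * p * r / (beta**2 * p + r)


def best_random_cluster_e(n_docs, cluster_sizes, relevant, trials=1000, seed=0):
    """Mean, over random trials, of the best (lowest) single-cluster E value.

    Hypothetical chance baseline: documents 0..n_docs-1 are assigned uniformly
    at random to clusters with the given sizes, so the clustering carries no
    information about relevance.
    """
    if sum(cluster_sizes) != n_docs:
        raise ValueError("cluster sizes must sum to n_docs")
    rng = random.Random(seed)
    docs = list(range(n_docs))
    best_per_trial = []
    for _ in range(trials):
        rng.shuffle(docs)  # a fresh meaningless partition each trial
        start, best = 0, 1.0
        for size in cluster_sizes:
            cluster = set(docs[start:start + size])
            start += size
            best = min(best, e_measure(cluster, relevant))
        best_per_trial.append(best)
    return mean(best_per_trial)
```

For example, `best_random_cluster_e(100, [10] * 10, {3, 17, 42, 58, 91})` estimates the best E value that a meaningless 10-cluster partition would yield for a query with five relevant documents. Under this sketch's assumptions, an operational clustering whose best cluster cannot beat that value performs no better than chance.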

Publication history: Available online 11 June 1998.

Paper URL: https://doi.org/10.1016/S0306-4573(96)00043-X