Statistical inference in retrieval effectiveness evaluation

作者：

Highlights：

•

摘要

Evaluation methodology, and particularly its statistical tests associated, plays a central role in the information retrieval domain which maintains a strong empirical tradition. In an effort to evaluate the retrieval effectiveness of a search algorithm, this paper focuses on the average precision over a set of fixed recall values. After reviewing traditional evaluation methodology through the use of examples, this study suggests applying another statistical inference methodology called bootstrap, within which no particular assumption is needed about the distribution of the observations. Moreover, this scheme may be used to assert the accuracy of virtually any statistic, to build approximate confidence interval, and to verify whether a statistically significant difference exists between two retrieval schemes, even when dealing with a relatively small sample size. This study also suggests selecting the sample median rather than the sample mean in evaluating retrieval effectiveness where the justification for this choice is based on the nature of the information retrieval data.

论文关键词：

论文评审过程：Received 14 June 1995, Accepted 1 April 1997, Available online 11 June 1998.

论文官网地址：https://doi.org/10.1016/S0306-4573(97)00027-7