Effectiveness evaluation without human relevance judgments: A systematic analysis of existing methods and of their combinations

Authors:

Highlights:

• We systematically compare methods for evaluation without relevance judgments.

• We study the combination of such methods, which has not been investigated so far.

• Our experiments show that simple data fusion combinations are not effective.

• More sophisticated solutions, based on machine learning, are effective and stable.

• A machine-learning combination of methods should be used, not a single one.

Keywords: Information retrieval evaluation, Automatic evaluation, Machine learning, Topic difficulty

Review history: Received 1 March 2019, Revised 27 August 2019, Accepted 20 October 2019, Available online 27 November 2019, Version of Record 27 November 2019.

Official page: https://doi.org/10.1016/j.ipm.2019.102149