How to evaluate rankings of academic entities using test data

Authors:

Highlights:

• Framework for computing significance values when ranking algorithms are compared.

• We analyse the stability and discriminative power of common evaluation measures.

• Average and median rank have high discriminative power and are stable measures.

• The nDCG measure only performs well when used on permille rankings.

Keywords: Evaluating rankings, Test data, Cranfield paradigm, Significance testing

Review history: Received 17 April 2018, Revised 1 June 2018, Accepted 7 June 2018, Available online 19 June 2018, Version of Record 19 June 2018.

Official paper URL: https://doi.org/10.1016/j.joi.2018.06.002