Evaluating regression algorithms at the instance level using item response theory
Abstract
Algorithm evaluation is an important task across Machine Learning (ML) problems. In this work, we adopt a different perspective on ML evaluation, in which algorithms are evaluated at the instance level. From this perspective, Item Response Theory (IRT) has recently been applied to algorithm evaluation in ML to identify which instances in a dataset are more difficult and discriminating, while also evaluating algorithms based on their predictions for instances with different difficulty values. In IRT, a strong algorithm returns accurate predictions for the most difficult instances while maintaining consistent behaviour on the easiest ones. The most common IRT models adopted in the literature only deal with dichotomous responses (i.e., a response has to be either correct or incorrect). This is suitable for evaluating classification algorithms, but not adequate in application contexts where responses are recorded on a continuous scale without an upper bound, such as regression. In this paper, we propose the Γ-IRT model, designed specifically for positive, unbounded responses, which we model using a Gamma distribution parameterised by respondent ability and by item difficulty and discrimination. The proposed parameterisation results in item characteristic curves with more flexible shapes than the traditional logistic curves adopted in IRT. We apply the proposed model to evaluate student responses (number of errors) in open-ended questions extracted from Statistics exams. Then, we use Γ-IRT to assess regression model abilities, where responses are the absolute errors on test instances. This novel application provides an alternative way to evaluate regression performance and to identify regions in a regression dataset that present different levels of difficulty and discrimination.
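The abstract does not fix a functional form for the Gamma parameterisation, so the following is only a minimal sketch of the general idea: a positive, unbounded response (e.g., a regression model's absolute error on a test instance) drawn from a Gamma distribution whose mean depends on respondent ability and on item difficulty and discrimination. The exponential link used here is a hypothetical illustrative choice, not necessarily the paper's actual item characteristic curve.

```python
import numpy as np

def expected_error(theta, difficulty, discrimination):
    # Hypothetical item characteristic curve: the expected error grows with
    # item difficulty and shrinks as respondent ability theta exceeds it.
    # (Illustrative exponential link; not taken from the paper.)
    return np.exp(-discrimination * (theta - difficulty))

def sample_response(theta, difficulty, discrimination, shape=2.0, rng=None):
    # Draw a positive, unbounded response from a Gamma distribution whose
    # mean follows the curve above. For a Gamma(shape, scale) distribution,
    # mean = shape * scale, so we solve for the scale parameter.
    if rng is None:
        rng = np.random.default_rng()
    scale = expected_error(theta, difficulty, discrimination) / shape
    return rng.gamma(shape, scale)

# Example: a high-ability respondent (theta=2.0) facing a hard,
# discriminating item tends to produce a small error.
print(sample_response(theta=2.0, difficulty=1.0, discrimination=1.5))
```

Under this kind of parameterisation, varying the difficulty and discrimination values changes both the location and the steepness of the expected-response curve, which is what gives the model more flexible shapes than the standard logistic curves.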
Keywords: Item response theory, Student ability, Regression tasks, Machine learning
Article history: Received 4 June 2021; Revised 23 December 2021; Accepted 24 December 2021; Available online 4 January 2022; Version of Record 20 January 2022.
DOI: https://doi.org/10.1016/j.knosys.2021.108076