Binarised regression tasks: methods and evaluation metrics

Authors: José Hernández-Orallo, Cèsar Ferri, Nicolas Lachiche, Adolfo Martínez-Usó, M. José Ramírez-Quintana

Abstract

Some supervised tasks present a numerical output, yet decisions have to be made in a discrete, binarised way according to a particular cutoff. This binarised regression task is very common and requires its own analysis, distinct from that of regression, classification, and ordinal regression. We first investigate the application cases in terms of the information available about the distribution and range of the cutoffs, and distinguish six possible scenarios, some of which are more common than others. Next, we study two basic approaches: the retraining approach, which discretises the training set whenever the cutoff becomes available and learns a new classifier from it, and the reframing approach, which learns a regression model once and applies the cutoff when it becomes available during deployment. To assess the binarised regression task, we introduce context plots that show error against cutoff. Two special cases are of interest, the \( UCE \) and \( OCE \) curves: the area under the former is the mean absolute error, and the area under the latter is a new metric that lies between a ranking measure and a residual-based measure. A comprehensive evaluation of the retraining and reframing approaches is performed on a purpose-built repository of binarised regression problems, concluding that neither method is clearly better than the other, except when the training data is small.
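The two approaches and the error-against-cutoff curve can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic data, the least-squares linear regressor, the nearest-centroid classifier, and all function names. The sketch also checks numerically that, when the misclassification rate of the reframed regressor is integrated over an unnormalised cutoff range covering all true and predicted values, the area matches the mean absolute error. This holds because a single example is misclassified exactly when the cutoff falls between its true and predicted value.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear regressor; returns weights including a bias term."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

def reframing_error(y_true, y_pred, cutoff):
    """Reframing: reuse one regression model, threshold its output at the cutoff."""
    return np.mean((y_true >= cutoff) != (y_pred >= cutoff))

def retraining_error(X_tr, y_tr, X_te, y_te, cutoff):
    """Retraining: binarise the training targets at the cutoff, then fit a classifier.
    The nearest-centroid rule is an illustrative stand-in for any classifier."""
    labels = y_tr >= cutoff
    if labels.all() or not labels.any():
        # Only one class is present after binarisation: predict it everywhere.
        pred = np.full(len(X_te), labels[0])
    else:
        mu_pos = X_tr[labels].mean(axis=0)
        mu_neg = X_tr[~labels].mean(axis=0)
        d_pos = np.linalg.norm(X_te - mu_pos, axis=1)
        d_neg = np.linalg.norm(X_te - mu_neg, axis=1)
        pred = d_pos <= d_neg
    return np.mean(pred != (y_te >= cutoff))

# Synthetic regression problem (an assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=400)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

w = fit_linear(X_tr, y_tr)
y_hat = predict_linear(w, X_te)

# Cutoff grid covering every true and predicted test value.
lo = min(y_te.min(), y_hat.min()) - 0.1
hi = max(y_te.max(), y_hat.max()) + 0.1
cutoffs = np.linspace(lo, hi, 20001)

# Error against cutoff for reframing (fine grid) and retraining (coarse grid).
reframe_curve = np.array([reframing_error(y_te, y_hat, c) for c in cutoffs])
coarse = cutoffs[::500]
retrain_curve = np.array(
    [retraining_error(X_tr, y_tr, X_te, y_te, c) for c in coarse]
)

# Trapezoidal area under the reframing curve versus the mean absolute error.
area = np.sum((reframe_curve[1:] + reframe_curve[:-1]) / 2 * np.diff(cutoffs))
mae = np.mean(np.abs(y_te - y_hat))
print(f"area under error-vs-cutoff curve: {area:.4f}, MAE: {mae:.4f}")
```

Plotting `reframe_curve` against `cutoffs` and `retrain_curve` against `coarse` gives a rough analogue of a context plot. Note that retraining fits a new model for every cutoff, whereas reframing reuses the single regressor `w`.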

Keywords: Regression, Classification, Reframing, Mean absolute error, Cutoff, Binarisation

Paper URL: https://doi.org/10.1007/s10618-015-0443-9