Well-calibrated confidence measures for multi-label text classification with a large number of labels

Authors:

Highlights:

• We propose a novel approach that reduces the computational cost of Label Powerset (LP) Inductive Conformal Prediction (ICP) for multi-label classification with a large number of labels. We prove the validity of the proposed approach mathematically and provide experimental results that demonstrate its computational efficiency.

• We present prediction set results for data-sets in multi-label text classification problems where producing them was previously computationally challenging, and show that these prediction sets can be practically useful.

• Results show that the BERT classifier surpasses the classifier based on non-contextualised embeddings by a large margin. In addition, to the best of our knowledge, our BERT implementation achieved state-of-the-art results on the data-sets used.

Keywords: Text classification, Multi-label, Word2vec, BERT, Conformal prediction, Label powerset, Computational efficiency, Nonconformity measure, Confidence measure

Review history: Received 23 September 2020, Revised 27 May 2021, Accepted 3 June 2021, Available online 21 August 2021, Version of Record 31 August 2021.

Official URL: https://doi.org/10.1016/j.patcog.2021.108271