Multiple weak supervision for short text classification

Authors: Li-Ming Chen, Bao-Xin Xiu, Zhao-Yun Ding

Abstract

Short text classification faces three major challenges: insufficient labeled data, data sparsity, and class imbalance. To address them, we propose multiple weak supervision, which labels unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditionally independent model. Experiments on public, real, and synthetic datasets verify the effectiveness of multiple weak supervision and show that it can effectively solve the problem of classifying unlabeled, imbalanced short text. Notably, adding distant supervision clustering improves recall and F1-score without reducing precision, which can be used to meet different application needs.
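
The abstract does not spell out the label model, so the following is only a minimal sketch of how probabilistic labels can be derived from several weak sources under a conditional independence assumption. It is not the authors' implementation. The function name `probabilistic_label`, the binary label set, the `ABSTAIN` convention, and the per-source accuracies are all assumptions made for illustration.

```python
import math

ABSTAIN = -1  # hypothetical convention: a weak source casts no vote


def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Return P(y = 1 | votes) for a binary task.

    Assumes the weak sources are conditionally independent given the
    true label y, that source j votes correctly with probability
    accuracies[j], and that abstaining carries no information about y.
    """
    log_pos = math.log(prior_pos)        # log P(y = 1)
    log_neg = math.log(1.0 - prior_pos)  # log P(y = 0)
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions leave both hypotheses unchanged
        if vote == 1:
            log_pos += math.log(acc)        # correct if y = 1
            log_neg += math.log(1.0 - acc)  # wrong if y = 0
        else:
            log_pos += math.log(1.0 - acc)  # wrong if y = 1
            log_neg += math.log(acc)        # correct if y = 0
    # Normalize in log space for numerical stability.
    m = max(log_pos, log_neg)
    pos = math.exp(log_pos - m)
    neg = math.exp(log_neg - m)
    return pos / (pos + neg)


# Three hypothetical weak sources: two vote positive, one votes negative.
p = probabilistic_label([1, 1, 0], [0.8, 0.7, 0.6])
```

The resulting value is a soft label between 0 and 1 rather than a hard class, which is what allows a downstream classifier to be trained on noisy, automatically generated labels. In practice the source accuracies would have to be estimated rather than supplied by hand.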

Keywords: Multiple weak supervision, Short text classification, Imbalanced classification, Distant supervision clustering, Probabilistic labels

Paper URL: https://doi.org/10.1007/s10489-021-02958-3