How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?
Authors:
Highlights:
• Cross-dataset model generalization for abusive language.
• Generalization of BERT, ALBERT and fastText models with respect to abusive language datasets.
• Experiments covering nine widely used public abusive speech datasets.
• Prediction of generalization by applying a random forest model.
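The highlights describe a cross-dataset evaluation protocol: train a classifier on one abusive-language dataset and test it on every other. Below is a minimal sketch of that protocol, not the paper's actual setup; the dataset names/paths and the TF-IDF + logistic-regression pipeline are illustrative stand-ins for the BERT, ALBERT, and fastText models studied in the article.

```python
# Minimal sketch of a cross-dataset generalization experiment:
# train a binary abusive-language classifier on each dataset and
# evaluate it on all others, collecting a matrix of macro-F1 scores.
# Dataset files and the classifier are hypothetical stand-ins.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical CSV files, each with a 'text' and a binary 'label' column.
DATASETS = {
    "dataset_a": "dataset_a.csv",
    "dataset_b": "dataset_b.csv",
    "dataset_c": "dataset_c.csv",
}

def load(path):
    df = pd.read_csv(path)
    return df["text"].astype(str), df["label"].astype(int)

scores = pd.DataFrame(index=DATASETS, columns=DATASETS, dtype=float)
for train_name, train_path in DATASETS.items():
    X_train, y_train = load(train_path)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    for test_name, test_path in DATASETS.items():
        X_test, y_test = load(test_path)
        # Off-diagonal cells measure cross-dataset generalization;
        # the diagonal is in-dataset performance.
        scores.loc[train_name, test_name] = f1_score(
            y_test, model.predict(X_test), average="macro"
        )

print(scores.round(3))
```

The resulting train-by-test score matrix is the kind of output such an experiment produces; the gap between diagonal and off-diagonal cells quantifies how poorly models transfer across datasets.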
Abstract:
Keywords: Hate speech, Offensive language, Classification, Generalization
Article history: Received 9 August 2020, Revised 11 November 2020, Accepted 20 January 2021, Available online 9 February 2021, Version of Record 9 February 2021.
DOI: https://doi.org/10.1016/j.ipm.2021.102524