Detecting ethnicity-targeted hate speech in Russian social media texts
Authors:
Highlights:
• We present a three-class instance-based approach to detect ethnicity-targeted hate speech in Russian social media texts;
• We show that ethnicity-targeted hate speech is more effectively addressed with the new three-class approach;
• In instance-based ethnicity-targeted hate speech detection, state-of-the-art deep learning models consistently outperform classical machine learning models despite the relatively small dataset size, and they benefit significantly from combining linguistic and sentiment features with BERT pre-training and certain fine-tuning techniques;
• Deep learning models significantly benefit from specific ethnonym information added to text representation in instance-based ethnicity-targeted hate speech detection;
• We make the RuEthnoHate dataset available to the research community; it contains 5.5K social media texts and is the first Russian-language dataset annotated for ethnicity-targeted hate speech.
Keywords: Hate speech detection, Ethnic hate, Russian language, Deep learning
Review history: Received 30 October 2020, Revised 3 June 2021, Accepted 28 June 2021, Available online 21 July 2021, Version of Record 21 July 2021.
Paper URL: https://doi.org/10.1016/j.ipm.2021.102674