Detecting ethnicity-targeted hate speech in Russian social media texts

Highlights:

• We present a three-class instance-based approach to detect ethnicity-targeted hate speech in Russian social media texts;

• We show that ethnicity-targeted hate speech is more effectively addressed with the new three-class approach;

• In our task of instance-based ethnicity-targeted hate speech detection, state-of-the-art deep learning models consistently outperform classical machine learning models despite the relatively small dataset size, and they benefit significantly from combining linguistic and sentiment features with BERT pre-training and certain fine-tuning techniques;

• In instance-based ethnicity-targeted hate speech detection, deep learning models benefit significantly from adding information about the specific ethnonym to the text representation;

• We are making available to the research community the RuEthnoHate dataset, which contains 5.5K social media texts and is the first dataset annotated with ethnicity-targeted hate speech in Russian.
