Semi-automatic generation of multilingual datasets for stance detection in Twitter

Authors:

Highlights:

• New method to semi-automatically build labeled stance detection datasets from Twitter.

• Translation strategies outperform zero-shot approaches when data is translated to a high-resourced language.

• User-based information helps to label individual tweets.

• Our method can be used to generate labeled Twitter-based data quickly and cheaply.

Keywords: Stance detection, Multilingualism, Text categorization, Fake news, Deep learning

Review history: Received 29 July 2020, Revised 4 December 2020, Accepted 24 December 2020, Available online 1 January 2021, Version of Record 11 January 2021.

Paper URL: https://doi.org/10.1016/j.eswa.2020.114547