Semi-automatic generation of multilingual datasets for stance detection in Twitter
Authors:
Highlights:
• New method to semi-automatically build labeled stance detection datasets from Twitter.
• Translation strategies outperform zero-shot approaches when data is translated to a high-resourced language.
• User-based information helps to label individual tweets.
• Our method can be used to generate labeled Twitter-based data quickly and cheaply.
Keywords: Stance detection, Multilingualism, Text categorization, Fake news, Deep learning
Review history: Received 29 July 2020, Revised 4 December 2020, Accepted 24 December 2020, Available online 1 January 2021, Version of Record 11 January 2021.
Official paper URL: https://doi.org/10.1016/j.eswa.2020.114547