Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets
Authors:
Highlights:
• Sentiment analysis of tweets by a state-of-the-art classification model (BERT).
• Evaluation of tweet pre-processing, to avoid noise and exploit hidden information.
• Available data in two languages are considered, i.e., English and Italian.
• The most advantageous strategy for pre-processing tweets is identified.
• The state of the art is improved in both languages for tweet sentiment analysis.
Keywords: Sentiment analysis, Pre-processing, Twitter, English, Italian
Review history: Received 2 November 2020, Revised 12 April 2021, Accepted 22 April 2021, Available online 30 April 2021, Version of Record 25 May 2021.
Paper URL: https://doi.org/10.1016/j.eswa.2021.115119