Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets

Authors:

Highlights:

• Sentiment analysis of tweets by a state-of-the-art classification model (BERT).

• Evaluation of tweet pre-processing, to avoid noise and exploit hidden information.

• Available data in two languages, English and Italian, are considered.

• The most convenient strategy for pre-processing tweets is identified.

• The state of the art is improved in both languages for tweet sentiment analysis.

Keywords: Sentiment analysis, Pre-processing, Twitter, English, Italian

Review history: Received 2 November 2020, Revised 12 April 2021, Accepted 22 April 2021, Available online 30 April 2021, Version of Record 25 May 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.115119