A comparative evaluation of pre-processing techniques and their interactions for Twitter sentiment analysis

Authors:

Highlights:

• Experimental comparison of sixteen preprocessing techniques for Sentiment Analysis.

• Use of two Twitter datasets and four popular machine learning algorithms.

• Evaluation of the techniques’ resulting classification accuracy.

• Lemmatization, number removal, and contraction replacement increase accuracy.

• Ablation and combination studies were conducted to examine interactions among techniques.
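
As a rough illustration of two of the techniques named in the highlights (contraction replacement and number removal), here is a minimal Python sketch. It is not the authors' implementation: the contraction table, the function names, and the regular expressions are all assumptions made for this example. Lemmatization is left out because it would need an external library.

```python
import re

# Hypothetical, deliberately tiny contraction table (an assumption for this
# sketch); a real study would rely on a much longer list.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "i'm": "i am",
    "it's": "it is",
}

# One alternation over all known contractions, matched case-insensitively.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def replace_contractions(text: str) -> str:
    """Expand contractions listed in CONTRACTIONS; expansions are lowercase."""
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def remove_numbers(text: str) -> str:
    """Drop standalone digit runs, then collapse the leftover whitespace."""
    without_digits = re.sub(r"\b\d+\b", " ", text)
    return re.sub(r"\s+", " ", without_digits).strip()


def preprocess(tweet: str) -> str:
    """Apply both steps; the order here is an arbitrary choice for the sketch."""
    # Normalise the typographic apostrophe so the lookup table matches.
    tweet = tweet.replace("\u2019", "'")
    return remove_numbers(replace_contractions(tweet))
```

For instance, `preprocess("I can't wait 4 the show")` expands the contraction and drops the standalone digit. Tokens such as "b4" are left alone because the digit is not a standalone word.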

Keywords: Sentiment analysis, Text pre-processing, Machine learning, Text classification, Ablation study, Combination study

Review history: Received 28 December 2017, Revised 21 May 2018, Accepted 8 June 2018, Available online 15 June 2018, Version of Record 18 June 2018.

Official paper URL: https://doi.org/10.1016/j.eswa.2018.06.022