ALGA: Adaptive lexicon learning using genetic algorithm for sentiment analysis of microblogs

Authors:

Highlights:

Abstract

Sentiment analysis classifies the opinions expressed in text. This study aims to improve polarity classification of sentiments in microblogs by building adaptive sentiment lexicons. The proposed method combines corpus-based and lexicon-based approaches and generates lexicons from text. Sentiment classification is formulated as an optimization problem whose goal is to find optimal sentiment lexicons, and a novel genetic algorithm is proposed to solve this problem and find lexicons for classifying text. The algorithm generates adaptive sentiment lexicons, from which a meta-level feature is extracted and used alongside Bing Liu's lexicon and n-gram features. Experiments are conducted on six datasets. The proposed approach outperforms the state-of-the-art methods in the literature in terms of accuracy on two of the datasets and in terms of F-measure on four. Accuracy is higher than 80% on all six datasets, and the F-measure is higher than 80% on four of them. The sentiment lexicons created by the proposed algorithm offer a better understanding of the specific language and culture of Twitter users and of the sentiment orientation of words in different contexts. The results also show that it is useful not to omit conventional stop-words, as each word can carry its own sentimental implications.
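
The idea of treating lexicon construction as an optimization problem can be illustrated with a minimal sketch. It is not the ALGA algorithm itself, whose encoding, operators, and fitness function are not given in the abstract. The sketch assumes a hypothetical setup: each chromosome is a vector of word weights in [-1, 1], fitness is training accuracy of a sum-of-weights classifier, and the names `tokenize`, `score`, `classify`, `fitness`, and `evolve_lexicon` are illustrative. No stop-words are removed, in line with the abstract's last observation.

```python
import random

def tokenize(text):
    # Whitespace tokenization; stop-words are deliberately kept.
    return text.lower().split()

def score(lexicon, tokens):
    # Sum of the learned weights of the known words in a text.
    return sum(lexicon.get(t, 0.0) for t in tokens)

def classify(lexicon, text):
    # Assumed decision rule: non-negative score means positive polarity.
    return 1 if score(lexicon, tokenize(text)) >= 0 else -1

def fitness(weights, vocab, data):
    # Assumed fitness: fraction of training texts classified correctly.
    lexicon = dict(zip(vocab, weights))
    correct = sum(1 for text, label in data if classify(lexicon, text) == label)
    return correct / len(data)

def evolve_lexicon(data, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    vocab = sorted({t for text, _ in data for t in tokenize(text)})
    # Each chromosome holds one weight in [-1, 1] per vocabulary word.
    population = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, vocab, data), reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutation: occasionally resample a weight.
            child = [rng.uniform(-1, 1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=lambda w: fitness(w, vocab, data))
    return dict(zip(vocab, best))

if __name__ == "__main__":
    # Toy labelled data, invented for illustration only.
    toy = [("good happy day", 1), ("bad sad day", -1),
           ("happy good", 1), ("sad bad", -1)]
    lexicon = evolve_lexicon(toy)
    print(lexicon)
    print(classify(lexicon, "happy day"))
```

Under the same assumptions, the value returned by `score` for a text could serve as a single meta-level feature passed to a downstream classifier together with other features, which is one plausible reading of how the abstract describes combining the learned lexicon with Bing Liu's lexicon and n-gram features.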

Keywords: Sentiment analysis, Genetic algorithm, Twitter, Sentiment lexicon, Social media, Evolutionary computation

Review history: Received 9 May 2016, Revised 16 January 2017, Accepted 20 January 2017, Available online 22 January 2017, Version of Record 27 February 2017.

Official paper URL: https://doi.org/10.1016/j.knosys.2017.01.028