EnSWF: effective features extraction and selection in conjunction with ensemble learning methods for document sentiment classification

Authors: Jawad Khan, Aftab Alam, Jamil Hussain, Young-Koo Lee

Abstract

With the rise of Web 2.0, a huge amount of unstructured data is generated on a regular basis in the form of comments, opinions, etc. This unstructured data contains useful information and can play a significant role in business decision making. In this context, sentiment analysis (SA) is an active research area that has recently attracted the attention of the research community. The aim of SA is to classify user-generated content into positive and negative classes. State-of-the-art techniques for sentiment classification rely on traditional bag-of-words approaches. Such approaches are advantageous in terms of simplicity, but they completely ignore semantic aspects and the order between words, and they also lead to the curse of dimensionality. Researchers have also proposed semantic-based SA techniques that account for word order by employing high-order n-grams, part-of-speech (POS) patterns, and dependency relation features. But can every word or phrase among high-order n-grams, POS patterns, or dependency relation features represent a sentiment clue? And if all of them are incorporated, what about the dimensionality? To investigate and tackle these issues, in this paper we propose EnSWF, a novel POS and n-gram based ensemble method for SA that considers semantics, sentiment clues, and the order between words; it is a four-phase process. Our main contributions are four-fold. (a) Appropriate feature extraction: we investigate and validate the extraction of various appropriate features for sentiment classification. (b) Dimensionality reduction: we decrease the dimensionality of the feature space by selecting the subset of the most meaningful and effective features. (c) Ensemble model: we propose an ensemble learning method for both filter-based feature selection and classification using a simple majority voting technique. (d) Practicality: we validate our claims by applying our model to benchmark datasets. We also show that EnSWF outperforms existing techniques in terms of classification accuracy and reduces the high-dimensional feature space.
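
To make the majority-voting idea in contribution (c) concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not name the filters, the classifiers, or the tie-breaking rule, so the function names `majority_vote` and `select_features_by_vote`, the `top_k` cutoff, the first-seen tie-break, and the toy rankings are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first (assumed rule)."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def select_features_by_vote(rankings, top_k):
    """Keep features that fall in the top_k of a strict majority of filter rankings.

    Each ranking is a list of distinct feature names, best first, as produced
    by one (hypothetical) filter-based scoring method.
    """
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])
    needed = len(rankings) // 2 + 1  # strict majority of the filters
    return sorted(f for f, v in votes.items() if v >= needed)


# Toy rankings from three hypothetical filters over four candidate features.
rankings = [
    ["good", "bad", "movie", "the"],
    ["bad", "good", "the", "movie"],
    ["good", "movie", "bad", "the"],
]
selected = select_features_by_vote(rankings, top_k=2)

# Toy predictions from three hypothetical classifiers for one document.
predicted = majority_vote(["pos", "neg", "pos"])
```

The same voting rule is applied at both stages: first across filters to shrink the feature space, then across classifiers to decide the final label for a document.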

Keywords: Sentiment classification, Word order, Semantics, Sentiment clue, Feature extraction, Feature selection, Ensemble learning

Paper URL: https://doi.org/10.1007/s10489-019-01425-4