Polarity classification using structure-based vector representations of text

Authors:

Highlights:

• We propose structure-based features for machine learning polarity classification.

• Adding our features to common word-based features significantly boosts performance.

• The most informative features capture the sentiment conveyed by rhetorical elements.

• Useful rhetorical elements form a text's core or provide crucial context information.

Abstract

The exploitation of structural aspects of content is becoming increasingly popular in rule-based polarity classification systems. Such systems typically weight the sentiment conveyed by text segments in accordance with these segments' roles in the structure of a text, as identified by deep linguistic processing. Conversely, state-of-the-art machine learning polarity classifiers typically aim to exploit patterns in vector representations of texts, mostly covering the occurrence of words or word groups in these texts. However, since structural aspects of content have been shown to contain valuable information as well, we propose to use structure-based features in vector representations of text. We evaluate the usefulness of our novel features on collections of English reviews in various domains. Our experimental results suggest that, even though word-based features are indispensable to good polarity classifiers, structure-based sentiment information provides valuable additional guidance that can help significantly improve the polarity classification performance of machine learning classifiers. The most informative features capture the sentiment conveyed by specific rhetorical elements that constitute a text's core or provide crucial contextual information.
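As a rough illustration of the idea, the sketch below appends structure-based sentiment scores to an ordinary bag-of-words vector. It is a minimal sketch under stated assumptions, not the paper's method: the toy lexicon `LEXICON`, the two role labels in `ROLES`, the pre-segmented input format, and the function names are all hypothetical, and a real system would obtain the rhetorical roles from a discourse parser.

```python
from collections import Counter

# Hypothetical toy sentiment lexicon (an assumption, not taken from the paper).
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

# Hypothetical rhetorical roles; the paper's actual feature set may differ.
ROLES = ("nucleus", "satellite")


def word_features(segments, vocabulary):
    """Bag-of-words counts over all segments, in vocabulary order."""
    counts = Counter(
        word for _, text in segments for word in text.lower().split()
    )
    return [float(counts[word]) for word in vocabulary]


def structure_features(segments):
    """Summed lexicon sentiment per rhetorical role, in ROLES order."""
    totals = {role: 0.0 for role in ROLES}
    for role, text in segments:
        if role in totals:  # segments with unknown roles are skipped
            totals[role] += sum(
                LEXICON.get(word, 0.0) for word in text.lower().split()
            )
    return [totals[role] for role in ROLES]


def featurize(segments, vocabulary):
    """Concatenate word-based and structure-based features into one vector."""
    return word_features(segments, vocabulary) + structure_features(segments)


# Each segment is a (role, text) pair; the roles here are assigned by hand.
review = [
    ("satellite", "the trailer looked bad"),
    ("nucleus", "the film itself is great"),
]
vocabulary = ["bad", "film", "great"]
vector = featurize(review, vocabulary)
print(vector)  # [1.0, 1.0, 1.0, 1.0, -0.5]
```

The first three entries are word counts and the last two are the nucleus and satellite sentiment totals. A vector of this shape could then be passed to any standard classifier, such as a support vector machine.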

Keywords: Sentiment analysis, Rhetorical structure, Machine learning, Support vector machines

Review history: Received 23 June 2014, Revised 12 March 2015, Accepted 2 April 2015, Available online 12 April 2015.

Official paper URL: https://doi.org/10.1016/j.dss.2015.04.002