A reinforced active learning approach for optimal sampling in aspect term extraction for sentiment analysis

Authors:

Highlights:

Abstract

Aspect-level sentiment analysis is a fine-grained sentiment analysis task that identifies the product features mentioned in an opinionated piece of text and maps the sentiment expressed towards each of them. Supervised machine learning algorithms have reported comparatively higher performance on aspect-level sentiment analysis, but at the cost of substantial amounts of high-quality labelled data. Labelling data for such fine-grained tasks also demands domain knowledge and expertise. A mechanism that extracts a minimal, informative subset that is almost representative of the entire data would therefore reduce annotation costs to a large extent. The proposed methodology puts forward an active learning based sampling strategy for aspect term extraction, the subtask of aspect-level sentiment analysis that identifies the product features. The sampling strategy is automated by reinforcement learning, which extracts an optimal sample from the entire unlabelled training data and thereby reduces the time and effort linked to the labelling process. This matters in a data-driven era in which companies invest heavily in collecting and annotating huge volumes of data. The model was evaluated on the laptop and restaurant domains of the SemEval (2014–2016) datasets. The experiments show that a considerable reduction in training data size is achieved across the datasets. A model trained on the data extracted by the proposed reinforced active learning model beats random sampling by 9 to 17 points in F-measure on the extracted aspect terms, and is almost on par with a model trained on the entire training data while using only 9 to 13% of that data across the datasets tested.
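The sampling idea can be illustrated with a minimal sketch of pool-based selection under a labelling budget. This is not the paper's method: the abstract states that the sampling strategy is learned by reinforcement learning, whereas the sketch swaps in a fixed mean-token-entropy heuristic, a common active learning baseline. The names `token_entropy`, `sentence_uncertainty`, `select_batch`, the B/I/O tag set, and the toy probabilities are all assumptions made for illustration.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predicted tag distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sentence_uncertainty(token_probs):
    """Mean token entropy over a sentence; higher means the tagger is less sure."""
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)

def select_batch(pool, budget_fraction):
    """Return ids of the most uncertain sentences that fit the labelling budget.

    pool maps a sentence id to its per-token tag probabilities.
    budget_fraction is the share of the pool to send for annotation.
    """
    k = max(1, round(budget_fraction * len(pool)))
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]

# Hypothetical unlabelled pool: per-token probabilities over assumed (B, I, O) tags.
pool = {
    "s1": [[0.98, 0.01, 0.01], [0.01, 0.01, 0.98]],  # confident predictions
    "s2": [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30]],  # near-uniform, very uncertain
    "s3": [[0.70, 0.20, 0.10], [0.05, 0.05, 0.90]],  # moderately uncertain
}

# A budget of roughly one third of this toy pool selects a single sentence.
selected = select_batch(pool, budget_fraction=0.34)
```

A learned policy, as described in the abstract, would replace the fixed entropy ranking with a selection rule optimised from feedback; the budget logic would stay similar.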

Keywords: Active learning, Reinforcement learning, Sequential text labelling, Aspect term extraction, Deep learning, Optimal sampling, Data annotation

Review history: Received 23 February 2022, Revised 11 May 2022, Accepted 17 July 2022, Available online 21 July 2022, Version of Record 31 July 2022.

Official page: https://doi.org/10.1016/j.eswa.2022.118228