Cross-lingual sentiment classification: Similarity discovery plus training data adjustment

Abstract

The performance of cross-lingual sentiment classification is severely limited by the language gap: each language has its own ways of expressing sentiment. Many methods have been designed to transfer sentiment information across languages using machine translation, parallel corpora, auxiliary unlabeled samples and other resources. In this paper, a new approach is proposed based on training data selection, in which labeled samples that are highly similar to the target language are put into the training set. The refined training samples are then used to build an effective cross-lingual sentiment classifier focused on the target language. The proposed approach consists of two main strategies: the aligned-translation topic model and semi-supervised training data adjustment. The aligned-translation topic model provides a cross-language representation space, within which the semi-supervised training data adjustment procedure attempts to select effective training samples that eliminate the negative influence of semantic distribution differences between the source and target languages. Experiments show that the proposed approach is feasible for cross-language sentiment classification tasks and provides insight into the semantic relationship between two different languages.
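The abstract does not state the exact selection criterion. As a rough illustration only, the sketch below assumes every document has already been mapped to a topic distribution in a shared cross-language space (the role the aligned-translation topic model plays here). It ranks labeled source-language samples by cosine similarity to the mean topic vector of the target-language documents and keeps the top k. The function names, the cosine-to-centroid criterion and the fixed k are hypothetical choices for illustration, not the authors' procedure, and the semi-supervised adjustment step is not modeled.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def select_training_samples(
    source_topics: List[Sequence[float]],
    target_topics: List[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical helper: return indices of the k labeled source samples
    whose topic vectors are most similar to the target-language centroid.

    source_topics: topic distributions of labeled source-language documents
    target_topics: topic distributions of (unlabeled) target-language documents
    """
    n_topics = len(target_topics[0])
    # Mean topic distribution of the target-language corpus.
    centroid = [
        sum(doc[t] for doc in target_topics) / len(target_topics)
        for t in range(n_topics)
    ]
    # Score each source sample, then sort by similarity (desc), index (asc).
    scored = [(cosine(doc, centroid), i) for i, doc in enumerate(source_topics)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    source = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # toy 2-topic vectors
    target = [[0.2, 0.8], [0.0, 1.0]]
    print(select_training_samples(source, target, 2))
```

In this toy case the target centroid is [0.1, 0.9], so source sample 1 (identical to it) ranks first and sample 2 second. A fixed top-k cut is the simplest choice; a similarity threshold or an iterative, semi-supervised re-selection would be natural alternatives.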

Keywords: Topic model, Cross-lingual sentiment classification, Semi-supervised learning

Article history: Received 4 February 2016, Revised 13 May 2016, Accepted 5 June 2016, Available online 8 June 2016, Version of Record 9 July 2016.

Official paper link: https://doi.org/10.1016/j.knosys.2016.06.004