Cross-domain aspect extraction for sentiment analysis: A transductive learning approach

Highlights:

• We show that transductive learning is promising for cross-domain aspect extraction.

• A heterogeneous network model to fuse knowledge for cross-domain transfer learning.

• Aspect label propagation using linguistic features as a bridge between domains.

• Experimental results validate the effectiveness and efficiency of our approach.

Abstract

Aspect-Based Sentiment Analysis (ABSA) is a promising approach to analyzing consumer reviews at a high level of detail, in which the opinion about each feature of a product or service is considered separately. ABSA usually relies on supervised inductive learning algorithms, which require intensive human effort to label training data. In this paper, we investigate cross-domain transfer learning approaches, in which aspects already labeled in some domains are used to support aspect extraction in another domain that has no labeled aspects. Existing cross-domain transfer learning approaches learn classifiers from labeled aspects in the source domain and then apply these classifiers to the target domain, i.e., in two separate stages that may produce inconsistent results because the domains have different feature spaces. To overcome this drawback, we present an innovative approach called CD-ALPHN (Cross-Domain Aspect Label Propagation through Heterogeneous Networks). First, we propose a heterogeneous network-based representation that combines different features (labeled aspects, unlabeled aspects, and linguistic features) from the source and target domains as nodes in a single network. Second, we propose a label propagation algorithm for extracting aspects from heterogeneous networks, in which the linguistic features act as a bridge for the propagation. Our algorithm is based on a transductive learning process that exploits both labeled and unlabeled aspects during label propagation. Experimental results show that CD-ALPHN outperforms state-of-the-art methods in scenarios with a high level of inconsistency between the source and target domains, which is the most common scenario in real-world applications.
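To make the idea of linguistic features acting as a bridge more concrete, the sketch below shows a generic transductive label propagation over a bipartite term-feature network. It is not the CD-ALPHN algorithm itself. It is a simplified illustration that assumes candidate aspect terms from both domains are linked to shared linguistic-feature nodes, and that scores propagate term to feature to term while labeled source-domain terms stay clamped. The functions `build_network` and `propagate`, the feature names such as `"NN+nsubj"`, the neutral initial score of 0.5, and the fixed iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_network(occurrences):
    """Build a bipartite term-feature network from (term, feature) pairs.

    Hypothetical sketch: edge weights count how often a candidate term
    co-occurs with a linguistic feature. Terms from both domains share the
    same feature nodes, which is what lets labels cross domains.
    """
    term_feats = defaultdict(lambda: defaultdict(float))
    feat_terms = defaultdict(lambda: defaultdict(float))
    for term, feature in occurrences:
        term_feats[term][feature] += 1.0
        feat_terms[feature][term] += 1.0
    return term_feats, feat_terms


def propagate(term_feats, feat_terms, seeds, iterations=20):
    """Spread aspect scores term -> feature -> term, keeping seeds fixed.

    seeds maps labeled source-domain terms to 1.0 (aspect) or 0.0 (not an
    aspect); unlabeled target-domain terms start at a neutral 0.5
    (an assumed initialization, not taken from the paper).
    """
    scores = {t: seeds.get(t, 0.5) for t in term_feats}
    for _ in range(iterations):
        # Each feature node takes the weighted mean score of its terms.
        feat_scores = {
            f: sum(w * scores[t] for t, w in terms.items()) / sum(terms.values())
            for f, terms in feat_terms.items()
        }
        # Each unlabeled term takes the weighted mean score of its features;
        # labeled (seed) terms are clamped to their original label.
        scores = {
            t: seeds[t] if t in seeds
            else sum(w * feat_scores[f] for f, w in feats.items()) / sum(feats.values())
            for t, feats in term_feats.items()
        }
    return scores


# Toy example (hypothetical data): source-domain terms are labeled,
# target-domain terms are not.
occurrences = [
    ("battery", "NN+nsubj"), ("battery", "NN+dobj"),  # source domain
    ("great", "JJ+amod"),                             # source domain
    ("screen", "NN+nsubj"), ("screen", "NN+dobj"),    # target domain
    ("awesome", "JJ+amod"),                           # target domain
]
seeds = {"battery": 1.0, "great": 0.0}
term_feats, feat_terms = build_network(occurrences)
scores = propagate(term_feats, feat_terms, seeds)
print({t: round(s, 3) for t, s in scores.items()})
```

In this toy run, the unlabeled target term "screen" shares noun-based feature nodes with the labeled aspect "battery", so its score climbs toward 1.0. The term "awesome" shares an adjective feature with the non-aspect "great", so its score falls toward 0.0. Because both domains live in one network, no separate train-then-apply stage is needed, which is the transductive property the abstract emphasizes.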