Network embedding by fusing multimodal contents and links
Abstract
Embedding a network into a low-dimensional space has attracted extensive research interest and has enabled many applications, such as node classification and link prediction. Most existing methods learn the network embedding from the network structure alone. However, social media data, such as social images, usually contain both multimodal contents (e.g., visual content and text description) and social links among the images. To address this problem, we propose a novel model, the Attention-based Multi-view Variational Auto-Encoder (AMVAE), which fuses both the links and the multimodal contents for more effective and efficient network embedding. Specifically, a Bi-LSTM (bidirectional long short-term memory) with an attention model is proposed to capture the fine-grained correlation between different data modalities, e.g., certain words are reflected by specific visual regions; a joint representation of the multimodal contents is learned accordingly. The network structure information and the learned representation of the multimodal contents are then treated as two views. To fuse the two views, a Variational Auto-Encoder (VAE) based on multi-view correlation learning is proposed to learn the representation of each node. By jointly optimizing the two components in a holistic learning framework, the embeddings of the network structure and the multimodal contents are integrated and mutually reinforced. Experiments on three real-world datasets demonstrate the superiority of the proposed model in two applications, i.e., multi-label classification and link prediction.
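To make the two-component architecture described in the abstract more concrete, below is a minimal PyTorch sketch of the idea: a Bi-LSTM whose word states attend over image-region features to form the joint content representation, followed by a two-view VAE that encodes the content view and the structure view into a shared latent node embedding. All layer sizes, variable names (e.g., `region_feats`, `struct_view`), and the exact attention/ELBO details are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the AMVAE idea (assumed architecture, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveContentEncoder(nn.Module):
    """Bi-LSTM over words; each word state attends to visual regions, and the
    attended visual context is fused with the text state into one content vector."""

    def __init__(self, word_dim=300, region_dim=2048, hidden=256, out_dim=256):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim, hidden, batch_first=True, bidirectional=True)
        self.q = nn.Linear(2 * hidden, hidden)      # query from word states
        self.k = nn.Linear(region_dim, hidden)      # key from visual regions
        self.fuse = nn.Linear(2 * hidden + region_dim, out_dim)

    def forward(self, word_embs, region_feats):
        # word_embs: (B, T, word_dim); region_feats: (B, R, region_dim)
        h, _ = self.bilstm(word_embs)                                   # (B, T, 2*hidden)
        attn = torch.softmax(self.q(h) @ self.k(region_feats).transpose(1, 2), dim=-1)
        visual_ctx = attn @ region_feats                                # (B, T, region_dim)
        joint = torch.cat([h, visual_ctx], dim=-1)                      # word-level fusion
        return torch.tanh(self.fuse(joint)).mean(dim=1)                 # (B, out_dim)


class MultiViewVAE(nn.Module):
    """Two-view VAE: the content view and the structure view are encoded into a
    shared latent node embedding, and both views are reconstructed from it."""

    def __init__(self, content_dim=256, struct_dim=128, latent=64):
        super().__init__()
        self.enc = nn.Linear(content_dim + struct_dim, 2 * latent)
        self.dec_content = nn.Linear(latent, content_dim)
        self.dec_struct = nn.Linear(latent, struct_dim)

    def forward(self, content_view, struct_view):
        mu, logvar = self.enc(torch.cat([content_view, struct_view], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)         # reparameterization
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        loss = (F.mse_loss(self.dec_content(z), content_view)
                + F.mse_loss(self.dec_struct(z), struct_view) + kl)
        return z, loss                                                   # z: node embedding


if __name__ == "__main__":
    B, T, R = 4, 12, 9
    words = torch.randn(B, T, 300)     # text description (word embeddings)
    regions = torch.randn(B, R, 2048)  # visual region features (e.g., from a CNN)
    struct = torch.randn(B, 128)       # structure view (e.g., a random-walk embedding)
    content = AttentiveContentEncoder()(words, regions)
    z, loss = MultiViewVAE()(content, struct)
    print(z.shape, loss.item())
```

In this sketch the two losses (reconstruction of both views plus the KL term) are optimized jointly, which is one plausible way to realize the "holistic learning framework" the abstract refers to.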
Keywords: Multimodal learning, Multi-view learning, Network embedding, Variational autoencoder, Attention model
Article history: Received 20 July 2018, Revised 28 January 2019, Accepted 2 February 2019, Available online 14 February 2019, Version of Record 12 March 2019.
DOI: https://doi.org/10.1016/j.knosys.2019.02.003