CMG2Vec: A composite meta-graph based heterogeneous information network embedding approach

Abstract

Heterogeneous information network embedding has been studied intensively in recent years. However, existing methods require users to manually specify meta-paths or meta-graphs in advance. Moreover, most previous approaches consider only a single type of meta-path or meta-graph, which is usually sparse and biased, so the learned node representations may be incomplete and inaccurate. To tackle these limitations, we propose an extensible semantic description structure called the Composite Meta-Graph (CMG). With this structure, users no longer need to select an appropriate meta-path or meta-graph, and the rich semantic relations and structural contexts between nodes of different types and at different distances can be described accurately. We further propose CMG2Vec, a CMG-based heterogeneous information network embedding framework. By extending the auto-encoder to the heterogeneous network setting, CMG2Vec embeds the multi-order proximities between nodes learned from the CMG into latent representations through a series of non-linear encoding–decoding mappings. During fusion, an attention mechanism automatically learns the weights of these latent vectors, so that each final node representation focuses on the proximity of the most informative order. Experimental results on three large-scale datasets demonstrate that our method outperforms state-of-the-art homogeneous and heterogeneous network embedding approaches on three network mining tasks: node classification, node clustering, and node similarity search.
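The abstract does not state the attention formula used during fusion. As a minimal, hypothetical sketch of the general idea (attention-weighted fusion of several order-specific latent vectors for one node), the code below assumes a simple scoring function score_k = q · tanh(h_k) with a query vector q, followed by a softmax over orders. The names `attention_fuse` and `softmax`, the scoring function, and the toy values are illustrative assumptions, not CMG2Vec's actual formulation; in a trained model the query (and typically a projection matrix) would be learned.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(order_vectors: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Fuse one node's per-order latent vectors into a single representation.

    Assumed (hypothetical) scoring: score_k = query . tanh(h_k).
    Returns the fused vector and the attention weight of each order.
    """
    if not order_vectors:
        raise ValueError("need at least one order-specific vector")
    dim = len(query)
    if any(len(h) != dim for h in order_vectors):
        raise ValueError("all vectors must share the query's dimension")
    # One scalar relevance score per proximity order.
    scores = [sum(q * math.tanh(x) for q, x in zip(query, h)) for h in order_vectors]
    # Normalise scores into weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the order-specific vectors, dimension by dimension.
    fused = [sum(w * h[i] for w, h in zip(weights, order_vectors)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    # Toy latent vectors for one node, e.g. from 1st-, 2nd- and 3rd-order proximity.
    h = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    q = [1.0, 0.0]  # hypothetical query that favours the first dimension
    z, a = attention_fuse(h, q)
    print("weights:", [round(w, 3) for w in a])
    print("fused:", [round(v, 3) for v in z])
```

With this toy query, the first-order vector receives the largest weight, so the fused representation leans toward it; this mirrors the abstract's description of each node representation focusing on its most informative order.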

Keywords: Heterogeneous information network, Network embedding, Composite meta-graph, Attention mechanism, Auto-encoder

Review history: Received 2 February 2020, Revised 7 May 2020, Accepted 5 December 2020, Available online 14 January 2021, Version of Record 1 February 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2020.106661