An experimental study of graph-based semi-supervised classification with additional node information

作者：Bertrand Lebichot, Marco Saerens

摘要

The volume of data generated by internet and social networks is increasing every day, and there is a clear need for efficient ways of extracting useful information from them. As this information can take different forms, it is important to use all the available data representations for prediction; this is often referred to multi-view learning. In this paper, we consider semi-supervised classification using both regular, plain, tabular, data and structural information coming from a network structure (feature-rich networks). Sixteen techniques are compared and can be divided in three families: the first one uses only the plain features to fit a classification model, the second uses only the network structure, and the last combines both information sources. These three settings are investigated on 10 real-world datasets. Furthermore, network embedding and well-known autocorrelation indicators from spatial statistics are also studied. Possible applications are automatic classification of web pages or other linked documents, of nodes in a social network, or of proteins in a biological complex system, to name a few. Based on our findings, we draw some general conclusions and advice to tackle this particular classification task: it is clearly observed that some dataset labelings can be better explained by their graph structure or by their features set.

论文关键词：Network data analysis, Semi-supervised classification, Link analysis, Graph mining, Multi-view learning

论文评审过程：

论文官网地址：https://doi.org/10.1007/s10115-020-01500-0