A log-linear approach to mining significant graph-relational patterns

Authors:

Abstract

Objects in many application domains can be characterized as link-based data, which combine network (graph) information with structured information describing the nodes. Frequent-pattern discovery in this setting is vulnerable to problems that cannot occur when mining conventional data without network information. Patterns may appear to reflect novel characteristics of a combination of graph and node information, yet be expected given patterns that conventional data mining techniques could already find. We introduce a significance measure that identifies patterns that are unexpected given node attributes in isolation and neighbor correlations. For this purpose we extend a statistical log-linear model and account for the structural symmetry of the link-based data. Eliminating insignificant results reduces the output quantity by orders of magnitude. Efficiency is achieved by designing the pattern mining algorithm as a hybrid of conventional pattern mining and graph data mining. We demonstrate the effectiveness and efficiency of the approach on yeast and movie data.
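To make the idea of "unexpected given node attributes in isolation" concrete, the following is a minimal sketch, not the measure defined in the paper. It assumes an undirected graph with one categorical label per node, fits the simplest log-linear model (independence of the two endpoint labels, so the expected count is the product of the marginals divided by the total), and reports Pearson residuals. The function name `edge_pattern_residuals` and the input layout are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def edge_pattern_residuals(edges, label):
    """Pearson residuals of endpoint-label pairs under an independence model.

    Illustrative assumption: `edges` is a list of (u, v) pairs of an
    undirected graph and `label` maps each node to one categorical value.
    """
    # Count every undirected edge in both orientations so that the
    # contingency table is symmetric, mirroring the structural symmetry
    # of link-based data.
    observed = Counter()
    for u, v in edges:
        observed[(label[u], label[v])] += 1
        observed[(label[v], label[u])] += 1
    total = sum(observed.values())

    # Marginal label counts over edge endpoints (degree-weighted).
    marginal = Counter()
    for (a, _b), n in observed.items():
        marginal[a] += n

    # Independence log-linear model: expected = row * column / total.
    residuals = {}
    for a in marginal:
        for b in marginal:
            expected = marginal[a] * marginal[b] / total
            residuals[(a, b)] = (observed[(a, b)] - expected) / sqrt(expected)
    return residuals
```

A large positive residual marks a label pair that occurs across edges more often than the node labels alone would suggest. Because each edge is counted twice, the residuals serve only as a rough ranking and not as a calibrated test statistic.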

Keywords: Knowledge discovery, Significant patterns, Log-linear models, Link-based data, Graphical models

Review history: Received 20 November 2008, Revised 22 February 2011, Accepted 23 February 2011, Available online 9 March 2011.

Paper URL: https://doi.org/10.1016/j.datak.2011.02.004