Mining Temporal Evolution of Knowledge Graphs and Genealogical Features for Literature-based Discovery Prediction

Authors:

Highlights:

• Preferential attachment is prevalent in knowledge network growth. This article analyzes temporal co-occurrences of author-selected keywords in scientific literature to support emerging literature-based discovery (LBD) by framing the process as a supervised learning problem.

• By mining the temporal evolution of keyword co-occurrence networks (KCNs) and the prominence of keywords over time with regard to their edge formation and network neighborhood, this article defines genealogical communities of keywords.

• To predict the future co-evolution of author-selected keywords in scientific literature, the feature construction process analyzes both bipartite networks (keyword-author and keyword-article) and a normalized unipartite network (keyword-keyword), including the relative importance of the citation counts accrued by author-selected keywords over time.

• The prediction performance of the constructed features is compared against that of features extracted from homogeneous and heterogeneous networks (heterogeneous bibliographic information networks) to demonstrate the competency of the constructed features in predicting future scientific hypotheses in two research domains.
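
To make the supervised framing in the highlights concrete, the following is a minimal sketch, not the authors' implementation. It builds a keyword co-occurrence network from a training period, computes two standard topological features (preferential attachment and common neighbors) for keyword pairs that are not yet connected, and labels each pair by whether it co-occurs in a later test period. The function names, the toy articles, and the year spans are all hypothetical, and the article's genealogical-community and weighted temporal citation features are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_kcn(articles, start_year, end_year):
    """Build an undirected keyword co-occurrence network (KCN) from the
    articles published in [start_year, end_year].
    Returns a mapping {keyword: set of co-occurring keywords}."""
    adjacency = defaultdict(set)
    for year, keywords in articles:
        if start_year <= year <= end_year:
            unique = sorted(set(keywords))
            for kw in unique:
                adjacency[kw]  # register the keyword even if it has no edge
            for a, b in combinations(unique, 2):
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def link_prediction_samples(articles, train_span, test_span):
    """Create (pair, features, label) samples for supervised link prediction.
    Candidate pairs are keywords present but NOT connected in the training
    period; the label is 1 if the pair co-occurs in the test period."""
    train = build_kcn(articles, *train_span)
    test = build_kcn(articles, *test_span)
    samples = []
    for a, b in combinations(sorted(train), 2):
        if b in train[a]:
            continue  # already linked in training, nothing to predict
        features = {
            # Preferential attachment score: product of the two degrees.
            "preferential_attachment": len(train[a]) * len(train[b]),
            # Number of keywords that co-occur with both a and b.
            "common_neighbors": len(train[a] & train[b]),
        }
        label = int(b in test.get(a, set()))
        samples.append(((a, b), features, label))
    return samples


# Hypothetical toy corpus: (publication year, author-selected keywords).
articles = [
    (2015, ["link prediction", "citation analysis"]),
    (2016, ["link prediction", "keyword network"]),
    (2017, ["keyword network", "text mining"]),
    (2019, ["citation analysis", "keyword network"]),
]

samples = link_prediction_samples(articles, (2015, 2017), (2018, 2019))
for pair, features, label in samples:
    print(pair, features, label)
```

The resulting samples could be passed to any off-the-shelf classifier. Keeping the training and test periods disjoint is what makes the setup temporal: features are computed only from the past, and labels come only from the future.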

Keywords: Literature-based Knowledge Discovery, Dynamic Supervised Link Prediction, Keyword Co-occurrence Network (KCN), Genealogical Community, Weighted Temporal Citation

Review history: Received 14 November 2019, Revised 16 May 2020, Accepted 22 May 2020, Available online 30 June 2020, Version of Record 30 June 2020.

Official paper URL: https://doi.org/10.1016/j.joi.2020.101057