SciKGraph: A knowledge graph approach to structure a scientific field

Authors:

Abstract

Understanding the structure of a scientific domain and extracting specific information from it is laborious. The large amount of manual effort this requires indicates that software tools should improve the way knowledge is structured and visualized. Currently, scientific domains are organized using citation networks or bag-of-words techniques, which disregard the intrinsic semantics of the concepts presented in the literature. We propose a novel approach to structuring scientific fields that applies semantic analysis to natural language texts to construct knowledge graphs. Our approach then clusters these knowledge graphs into their main topics and automatically extracts information such as the most relevant concepts in each topic and the concepts that overlap between topics. We evaluate the proposed model on two datasets from distinct areas. It achieves up to 84% accuracy in document classification, without using annotated data to segment a set of input documents into topics. Our solution identifies keyphrases and key concepts that are coherent with the dataset used. The SciKGraph framework contributes structured knowledge that may help researchers study their areas, reducing the effort and time devoted to groundwork.
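
The pipeline described in the abstract (concepts to graph, graph to topics, topics to key and overlapping concepts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper. The semantic annotation step is assumed to have already mapped each document to a list of concepts. Edge weights are plain co-occurrence counts. Connected components over strong edges stand in for the clustering step, whose actual algorithm the abstract does not name. The toy documents and all function names (`build_graph`, `find_topics`, `key_concepts`, `overlapping_concepts`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy input (hypothetical): each document is already reduced to its concepts.
docs = [
    ["knowledge graph", "ontology", "semantic annotation"],
    ["knowledge graph", "ontology"],
    ["citation network", "bibliometrics", "semantic annotation"],
    ["citation network", "bibliometrics"],
]


def build_graph(documents):
    """Co-occurrence graph: edge weight = number of documents sharing both concepts."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def find_topics(graph, min_weight=2):
    """Stand-in clustering: connected components over edges with weight >= min_weight.

    Components with a single concept are not reported as topics.
    """
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(
                n for n, w in graph[node].items()
                if w >= min_weight and n not in component
            )
        seen |= component
        if len(component) >= 2:
            topics.append(component)
    return topics


def key_concepts(graph, members, top=2):
    """Rank a topic's concepts by weighted degree inside the topic (ties: alphabetical)."""
    def score(concept):
        return sum(w for n, w in graph[concept].items() if n in members)
    return sorted(members, key=lambda c: (-score(c), c))[:top]


def overlapping_concepts(graph, topics):
    """Concepts that belong to or are linked with at least two different topics."""
    owner = {c: i for i, members in enumerate(topics) for c in members}
    result = []
    for concept in sorted(graph):
        touched = {owner[n] for n in graph[concept] if n in owner}
        if concept in owner:
            touched.add(owner[concept])
        if len(touched) >= 2:
            result.append(concept)
    return result


graph = build_graph(docs)
topics = find_topics(graph)
for members in topics:
    print(sorted(members), "key:", key_concepts(graph, members, top=1))
print("overlapping:", overlapping_concepts(graph, topics))
```

In this toy setting, a concept that co-occurs weakly with two otherwise separate groups is reported as overlapping rather than being forced into one topic, which is the kind of information the abstract says the approach extracts.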

Keywords: Knowledge graphs, Knowledge representation, Overlap clustering, Semantic annotation, Document classification

Review history: Received 25 February 2020, Revised 24 September 2020, Accepted 2 November 2020, Available online 10 December 2020, Version of Record 10 December 2020.

Official URL: https://doi.org/10.1016/j.joi.2020.101109