Exploiting semantic similarity for named entity disambiguation in knowledge graphs

作者:

Highlights:

摘要

With the increasing popularity of large scale Knowledge Graph (KG)s, many applications such as semantic analysis, search and question answering need to link entity mentions in texts to entities in KGs. Because of the polysemy problem in natural language, entity disambiguation is thus a key problem in current research. Existing disambiguation methods have considered entity prominence, context similarity and entity-entity relatedness to discriminate ambiguous entities, which are mainly working on document or paragraph level texts containing rich contextual information, and based on lexical matching for computing context similarity. When meeting short texts containing limited contextual information, such as web queries, questions and tweets, those conventional disambiguation methods are not good at handling single entity mention and measuring context similarity. In order to enhance the performance of disambiguation methods based on context similarity with such short texts, we propose SCSNED method for disambiguation based on semantic similarity between contextual words and informative words of entities in KGs. Specially, we exploit the effectiveness of both knowledge-based and corpus-based semantic similarity methods for entity disambiguation with SCSNED. Moreover, we propose a Category2Vec embedding model based on joint learning of word and category embedding, in order to compute word-category similarity for entity disambiguation. We show the effectiveness of these proposed methods with illustrative examples, and evaluate their effectiveness in a comparative experiment for entity disambiguation in real world web queries, questions and tweets. The experimental results have identified the effectiveness of different semantic similarity methods, and demonstrated the improvement of semantic similarity methods in SCSNED and Category2Vec over the conventional context similarity baseline. We further compare the proposed approaches with the state of the art entity disambiguation systems and show the performances of the proposed approaches are among the best performing systems. In addition, one important feature of the proposed approaches using semantic similarity, is the potential application on any existing KGs since they mainly use common features of entity descriptions and categories. Another contribution of the paper is an updated survey on background of entity disambiguation in KGs and semantic similarity methods.

论文关键词:Entity linking,Named entity disambiguation,Context similarity,Semantic similarity,Word embedding,Knowledge graph

论文评审过程:Received 14 June 2017, Revised 9 January 2018, Accepted 6 February 2018, Available online 9 February 2018, Version of Record 15 February 2018.

论文官网地址:https://doi.org/10.1016/j.eswa.2018.02.011