DomESA: a novel approach for extending domain-oriented lexical relatedness calculations with domain-specific semantics

Authors: Maciej Rybiński, José Francisco Aldana Montes

Abstract

Correctly modelling semantic relatedness between texts, and consequently between the concepts these texts represent, has become an important part of many intelligent information retrieval and knowledge processing systems. The need for such systems is especially evident in the biomedical domain, where the sheer volume of scientific publishing contributes to information overload. In this paper we present a novel method for approximating semantic relatedness in domain-focused settings. The approach is an extension of the well-known ESA (Explicit Semantic Analysis) method. Our extension successfully leverages the semantics of a domain-specific document corpus. We evaluate the proposed method on a set of reference datasets that are the de facto standard for the task of approximating biomedical semantic relatedness. The method is compared with other state-of-the-art methods, as well as with baselines established using the original ESA method. The experimental results suggest that the proposed method combines the semantics of general and domain-specific corpora to provide significant improvements over the original method.

Keywords: Semantic relatedness, Biomedicine, Distributional linguistics, Semantic similarity, ESA, Text analytics

Paper URL: https://doi.org/10.1007/s10844-017-0442-y