Sentence representation with manifold learning for biomedical texts

Authors:

Highlights:

Abstract

Deep learning approaches to sentence representation have become a major part of natural language processing, and pretrained sentence representations are widely applied to biomedical texts. However, the geometric structure underlying sentence representations has not yet been carefully studied for biomedical texts. In this paper, we exploit the geometric structure of sentences to improve biomedical text representation. To mine geometric structure information from sentence representations, we introduce manifold learning into biomedical sentence representation, which brings sentence similarity in Euclidean space closer to sentence semantics. First, we use a pretrained sentence representation method to obtain representations of biomedical sentences. We then use manifold learning to construct an adjacency graph over these representations that characterizes their local geometric structure, thereby revealing the intrinsic relationships among sentences. In this way, the manifold method captures latent relations among sentences and improves performance on downstream biomedical text tasks. We evaluated our sentence representation method on biomedical text tasks, and the experimental results show that our model achieved better results than several standard sentence representation methods.
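The two-step pipeline described in the abstract (pretrained sentence vectors, then an adjacency graph that encodes local geometry) can be illustrated with a minimal sketch. The abstract does not name the encoder or the specific manifold learning algorithm, so everything below is an assumption made for illustration: the toy 2-D vectors stand in for pretrained sentence embeddings, the graph is a plain k-nearest-neighbour graph, and Laplacian eigenmaps is swapped in as one common manifold learning technique. The function names `knn_adjacency` and `laplacian_eigenmaps` are hypothetical and not taken from the paper.

```python
import numpy as np


def knn_adjacency(X, k=2):
    """Build a symmetric k-nearest-neighbour adjacency matrix (assumed graph type)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all sentence vectors.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sentence is never its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    # Keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)


def laplacian_eigenmaps(W, dim=2):
    """Embed graph nodes with Laplacian eigenmaps (assumed manifold method)."""
    degree = W.sum(axis=1)  # at least k for every node, so never zero
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.diag(degree) - W
    # Symmetrically normalised Laplacian: D^{-1/2} (D - W) D^{-1/2}.
    l_sym = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(l_sym)  # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector and map back to the generalised problem.
    return d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]


# Toy stand-ins for pretrained sentence embeddings: two well-separated groups.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
W = knn_adjacency(X, k=2)
Y = laplacian_eigenmaps(W, dim=2)
```

In this toy case the graph links sentences only within their own group, so the adjacency matrix alone already separates the two groups. With real embeddings the graph is usually connected, and the choice of `k` and of edge weights would need tuning.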

Keywords: Sentence representation, Biomedical text embedding, Manifold learning

Review history: Received 25 November 2020, Revised 18 January 2021, Accepted 15 February 2021, Available online 17 February 2021, Version of Record 24 February 2021.

Paper URL: https://doi.org/10.1016/j.knosys.2021.106869