Towards the derivation of verbal content relations from patent claims using deep syntactic structures

Abstract

Research on the extraction of content relations from text corpora is a high-priority topic in natural language processing. This is not surprising, since content relations form the backbone of any ontology, and ontologies are increasingly used in knowledge-based applications. However, most work so far has focused on detecting a restricted number of prominent verbal relations, in particular is-a, has-part and cause. Our application, which aims to provide comprehensive, easy-to-understand content representations of complex functional objects described in patent claims, must derive a large number of content relations that cannot be fixed a priori. To cope with this problem, we exploit the fact that the deep syntactic dependency structures of sentences capture all relevant content relations, albeit without any abstraction. We thus implement a three-step strategy. First, we parse the claims to obtain their deep syntactic dependency structures, from which we then derive the content relations. Second, we generalize the obtained relations by clustering them according to semantic criteria, with the goal of uniting all sufficiently similar relations. Finally, we identify a suitable name for each generalized relation. To keep the scope of the article within reasonable limits and to allow for a comparison with state-of-the-art techniques, we focus on verbal relations.
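The three-step strategy (derive, generalize, name) can be illustrated with a toy sketch. It is not the authors' implementation, and every detail below is an assumption made for illustration: the edge format, the actant labels "I" and "II" for the first and second deep-syntactic arguments, the clustering by overlap of argument pairs (a simple distributional stand-in for the paper's semantic clustering criteria), the 0.5 similarity threshold, and the "most frequent member verb" naming rule.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: one deep-syntactic structure per claim, given as
# (head, head_pos, relation, dependent) edges. "I" and "II" are assumed labels
# for the first and second actants. Each verb head is assumed to occur once
# per structure.
Edge = tuple[str, str, str, str]


def extract_verbal_relations(structure: list[Edge]) -> list[tuple[str, str, str]]:
    """Step 1: turn every verb that has both actants I and II into an (arg1, verb, arg2) triple."""
    actants: dict[str, dict[str, str]] = defaultdict(dict)
    for head, pos, rel, dep in structure:
        if pos == "VERB" and rel in ("I", "II"):
            actants[head][rel] = dep
    return [(a["I"], verb, a["II"]) for verb, a in actants.items() if "I" in a and "II" in a]


def cluster_verbs(triples: list[tuple[str, str, str]], threshold: float = 0.5) -> list[list[str]]:
    """Step 2 (stand-in): single-link clustering of verbs whose argument-pair sets overlap (Jaccard >= threshold)."""
    profiles: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for a1, verb, a2 in triples:
        profiles[verb].add((a1, a2))
    verbs = sorted(profiles)
    parent = {v: v for v in verbs}

    def find(v: str) -> str:
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for v, w in combinations(verbs, 2):
        p, q = profiles[v], profiles[w]
        if len(p & q) / len(p | q) >= threshold:
            parent[find(v)] = find(w)

    clusters: dict[str, list[str]] = defaultdict(list)
    for v in verbs:
        clusters[find(v)].append(v)
    return sorted(clusters.values())


def label_cluster(cluster: list[str], triples: list[tuple[str, str, str]]) -> str:
    """Step 3 (hypothetical rule): name a cluster after its most frequent verb, ties broken alphabetically."""
    counts = Counter(verb for _, verb, _ in triples if verb in cluster)
    return min(cluster, key=lambda v: (-counts[v], v))


# Toy claims, invented for illustration only.
claims = [
    [("connect", "VERB", "I", "rod"), ("connect", "VERB", "II", "piston")],
    [("attach", "VERB", "I", "rod"), ("attach", "VERB", "II", "piston")],
    [("connect", "VERB", "I", "cable"), ("connect", "VERB", "II", "sensor")],
    [("attach", "VERB", "I", "cable"), ("attach", "VERB", "II", "sensor")],
    [("rotate", "VERB", "I", "motor"), ("rotate", "VERB", "II", "shaft")],
    [("connect", "VERB", "I", "pipe"), ("connect", "VERB", "II", "tank")],
]

triples = [t for structure in claims for t in extract_verbal_relations(structure)]
for cluster in cluster_verbs(triples):
    print(label_cluster(cluster, triples), cluster)
```

On this toy input, "attach" and "connect" share two of their three argument pairs (Jaccard 2/3), so they merge into one generalized relation named "connect", while "rotate" stays on its own. A real system would replace the overlap measure with lexical-semantic similarity, as the abstract's "semantic criteria" suggests.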

Keywords: Deep dependency parsing, Dependency relation, Relation clustering, Cluster labeling, Specialized discourse

Article history: Received 4 August 2010, Revised 25 May 2011, Accepted 26 May 2011, Available online 12 June 2011.

Paper URL: https://doi.org/10.1016/j.knosys.2011.05.014