Annotation and verification of sense pools in OntoNotes

作者：

Highlights：

•

摘要

The paper describes the OntoNotes, a multilingual (English, Chinese and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses of words, there is still no knowledge about whether two words in a pool are interchangeable in practical use. Therefore, this paper devises an unsupervised algorithm that incorporates Google n-grams and a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features are used to measure the degree of context mismatch for a substitution. The statistical test is then applied to determine whether the substitution is adequate based on the degree of mismatch. The proposed method is compared with a supervised method, namely Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method can achieve comparable performance with the supervised method.

论文关键词：Lexical semantics,Corpus annotation,Lexical substitution,Ontology linking

论文评审过程：Received 30 October 2008, Revised 3 November 2009, Accepted 17 November 2009, Available online 29 December 2009.

论文官网地址：https://doi.org/10.1016/j.ipm.2009.11.002