Assessing the influence of personal preferences on the choice of vocabulary for natural language generation

Abstract

Referring expression generation is the part of natural language generation that decides how to refer to the entities appearing in an automatically generated text. Lexicalization is the part of this process which involves the choice of appropriate vocabulary or expressions to transform the conceptual content of a referring expression into the corresponding text in natural language. This problem presents an important challenge when we have enough knowledge to allow more than one alternative. In those cases, we need some heuristics to decide which alternatives are more appropriate in a given situation. Whereas most work on natural language generation has focused on a generic way of generating language, in this paper we explore personal preferences as a type of heuristic that has not been properly addressed. We empirically analyze the TUNA corpus, a corpus of referring expression lexicalizations, to investigate the influence of language preferences in how people lexicalize new referring expressions in different situations. We then present two corpus-based approaches to solve the problem of referring expression lexicalization, one that takes preferences into account and one that does not. The results show a decrease of 50% in the similarity error against the reference corpus when personal preferences are used to generate the final referring expression.
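The contrast between a generic and a preference-aware lexicalizer can be illustrated with a minimal sketch. This is not the authors' method: the toy corpus, the concept labels, the author identifiers, and the function names below are all hypothetical, and the most-frequent-word rule is a simple stand-in for whatever the paper's corpus-based approaches actually do. The sketch only shows the general idea of choosing a word for a concept either from corpus-wide counts or from one author's own past choices, falling back to the corpus-wide choice when that author has no history.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (author_id, concept, word) triples.
# These entries are invented for illustration; they are not TUNA data.
CORPUS = [
    ("a1", "SOFA", "sofa"),
    ("a1", "SOFA", "sofa"),
    ("a2", "SOFA", "couch"),
    ("a2", "SOFA", "couch"),
    ("a2", "SOFA", "couch"),
    ("a1", "LARGE", "big"),
    ("a2", "LARGE", "large"),
    ("a2", "LARGE", "large"),
]


def build_counts(corpus):
    """Count word choices per concept, corpus-wide and per author."""
    global_counts = defaultdict(Counter)
    author_counts = defaultdict(Counter)
    for author, concept, word in corpus:
        global_counts[concept][word] += 1
        author_counts[(author, concept)][word] += 1
    return global_counts, author_counts


def lexicalize_generic(concept, global_counts):
    """Pick the word most often used for the concept across all authors."""
    counts = global_counts.get(concept)
    return counts.most_common(1)[0][0] if counts else None


def lexicalize_personal(author, concept, global_counts, author_counts):
    """Prefer the author's own most frequent word; otherwise fall back."""
    counts = author_counts.get((author, concept))
    if counts:
        return counts.most_common(1)[0][0]
    return lexicalize_generic(concept, global_counts)


global_counts, author_counts = build_counts(CORPUS)
generic_choice = lexicalize_generic("SOFA", global_counts)
personal_choice = lexicalize_personal("a1", "SOFA", global_counts, author_counts)
```

With this toy data the generic chooser follows the majority of the corpus, while the personal chooser follows author `a1`'s own usage, so the two can disagree for the same concept. That kind of disagreement is what a comparison against a reference corpus would measure.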

Keywords: Natural language generation, Referring expression generation, Lexicalization, Personalization, Corpus approach

Review history: Received 14 April 2011, Revised 21 January 2013, Accepted 24 January 2013, Available online 27 February 2013.

Official paper URL: https://doi.org/10.1016/j.ipm.2013.01.006