Generating summaries from event data

Abstract

Summarization entails analysis of source material, selection of key information, condensation of that information, and generation of a compact summary form. While there have been many investigations into the automatic summarization of text, relatively little attention has been given to summarizing information from structured sources such as databases or knowledge bases, even though this capability is desirable in a number of application areas, including report generation from databases (e.g. weather, financial, medical) and simulations (e.g. military, manufacturing, economic). After a brief introduction indicating the main elements of summarization and referring to some illustrative approaches to it, this article considers specific issues in the generation of text summaries of event data. It describes a system, SumGen, which selects key information from an event database by reasoning about event frequencies, frequencies of relations between events, and domain-specific importance measures. The article describes how SumGen then aggregates similar information and plans a summary presentation tailored to a stereotypical user. Finally, the article evaluates the performance of SumGen, and also that of a much more limited second summarizer, by assessing information extraction by 22 human subjects from both source and summary texts. This evaluation shows that the use of SumGen reduces average sentence length by approximately 15%, document length by 70%, and time to perform information extraction by 58%.
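
The selection and aggregation steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SumGen's actual implementation: the `Event` record, the `IMPORTANCE` weights, the scoring rule (frequency multiplied by importance), and the sample events are all assumptions made up for illustration, and relation frequencies and user tailoring are left out.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical event record; the article's real schema is not given here."""
    time: int
    actor: str
    kind: str
    target: str


# Assumed domain-specific importance weights (invented for this sketch).
IMPORTANCE = {"attack": 3.0, "move": 1.0, "refuel": 0.5}


def select_key_events(events, top_n=2):
    """Score each event kind by frequency times importance; keep the top_n kinds."""
    counts = Counter(e.kind for e in events)
    scores = {kind: counts[kind] * IMPORTANCE.get(kind, 1.0) for kind in counts}
    # Sort by descending score, breaking ties alphabetically for determinism.
    keep = set(sorted(scores, key=lambda kind: (-scores[kind], kind))[:top_n])
    return [e for e in events if e.kind in keep]


def aggregate(events):
    """Merge events that share an actor and a kind into a single sentence."""
    groups = defaultdict(list)
    for e in events:
        groups[(e.actor, e.kind)].append(e.target)
    sentences = []
    for (actor, kind), targets in groups.items():
        places = ", ".join(sorted(set(targets)))
        sentences.append(
            f"{actor} performed {len(targets)} {kind} event(s) involving {places}."
        )
    return sentences


# Invented sample data, loosely in the style of a military simulation log.
EVENTS = [
    Event(1, "Unit A", "move", "Hill 1"),
    Event(2, "Unit A", "attack", "Bridge"),
    Event(3, "Unit B", "refuel", "Depot"),
    Event(4, "Unit A", "attack", "Depot"),
    Event(5, "Unit B", "move", "Hill 2"),
]

SUMMARY = aggregate(select_key_events(EVENTS))

if __name__ == "__main__":
    for sentence in SUMMARY:
        print(sentence)
```

In this sketch the low-scoring "refuel" event is dropped and the two attacks by the same unit are merged into one sentence, which mirrors, in a very reduced form, the select-then-aggregate pipeline the abstract outlines.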

Keywords: Automated summarization, Natural language generation, Importance, Condensation, Aggregation, Tailored summarization, Automated abstracting

Publication history: Available online 21 February 2000.

Paper URL: https://doi.org/10.1016/0306-4573(95)00025-C