An empirical study of automated dictionary construction for information extraction in three domains

作者:

摘要

A primary goal of natural language processing researchers is to develop a knowledge-based natural language processing (NLP) system that is portable across domains. However, most knowledge-based NLP systems rely on a domain-specific dictionary of concepts, which represents a substantial knowledge-engineering bottleneck. We have developed a system called AutoSlog that addresses the knowledge-engineering bottleneck for a task called information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. We have used AutoSlog to create a dictionary of extraction patterns for terrorism, which achieved 98% of the performance of a hand-crafted dictionary that required approximately 1500 person-hours to build. In this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.

论文关键词:

论文评审过程:Available online 20 February 1999.

论文官网地址:https://doi.org/10.1016/0004-3702(95)00123-9