Information extraction with automatic knowledge expansion

作者:

Highlights:

摘要

POSIE (POSTECH Information Extraction System) is an information extraction system which uses multiple learning strategies, i.e., SmL, user-oriented learning, and separate-context learning, in a question answering framework. POSIE replaces laborious annotation with automatic instance extraction by the SmL from structured Web documents, and places the user at the end of the user-oriented learning cycle. Information extraction as question answering simplifies the extraction procedures for a set of slots. We introduce the techniques verified on the question answering framework, such as domain knowledge and instance rules, into an information extraction problem. To incrementally improve extraction performance, a sequence of the user-oriented learning and the separate-context learning produces context rules and generalizes them in both the learning and extraction phases. Experiments on the “continuing education” domain initially show that the F1-measure becomes 0.477 and recall 0.748 with no user training. However, as the size of the training documents grows, the F1-measure reaches beyond 0.75 with recall 0.772. We also obtain F-measure of about 0.9 for five out of seven slots on “job offering” domain.

论文关键词:Information extraction,Question answering,User-oriented learning,Lexico-semantic pattern,Machine learning

论文评审过程:Received 28 February 2003, Accepted 15 July 2003, Available online 6 September 2003.

论文官网地址:https://doi.org/10.1016/S0306-4573(03)00066-9