WHIRL: A word-based information representation language

Authors:

Abstract

We describe WHIRL, an “information representation language” that combines properties of logic-based and text-based representation systems. WHIRL is a subset of Datalog extended with an atomic type for textual entities, an atomic operation for computing textual similarity, and a “soft” semantics: inferences in WHIRL are associated with numeric scores and presented to the user in decreasing order of score. This paper briefly describes WHIRL and then surveys a number of applications. We show that WHIRL strictly generalizes both ranked retrieval of documents and logical deduction; that nontrivial queries about large databases can be answered efficiently; that WHIRL can accurately integrate data from heterogeneous information sources, such as those found on the Web; that WHIRL can be used effectively for inductive classification of text; and, finally, that WHIRL can be used to semi-automatically generate extraction programs for structured documents.
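
The "soft" semantics described in the abstract can be illustrated with a small similarity join: every pair of text fields receives a numeric score, and the pairs are returned in decreasing order of score. The sketch below is not WHIRL itself. The abstract does not say how textual similarity is computed, so TF-IDF-weighted cosine similarity is assumed here as one common choice, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `soft_join` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per document.

    Assumption: log-scaled TF and a smoothed IDF, log(1 + N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: math.log(1 + c) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def soft_join(left, right, k=3):
    """Score every (left, right) pair and return the top k, best first.

    Pairs with no shared terms (score 0) are dropped.
    """
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    scored = [
        (cosine(a, b), x, y)
        for x, a in zip(left, lv)
        for y, b in zip(right, rv)
    ]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical company names from two sources that spell them differently.
    left = ["Acme Widget Corporation", "Globex Industries"]
    right = ["ACME Widget Corp", "Globex Industries Inc", "Initech"]
    for score, x, y in soft_join(left, right):
        print(f"{score:.3f}  {x!r} ~ {y!r}")
```

Unlike an exact-match join, which would return nothing for these inputs, the ranked result surfaces the near-duplicate names first and leaves it to the user to decide how far down the list to read.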

Keywords: Knowledge representation, Information retrieval, Textual similarity, Heterogeneous databases, Information integration, Text categorization, Information extraction

Article history: Received 31 October 1998, Revised 28 September 1999, Available online 9 July 2001.

Official URL: https://doi.org/10.1016/S0004-3702(99)00102-2