Introducing structure management in automatic reference resolution: An XML-based approach

Authors:

Highlights:

Abstract

References to parts of structured documents rely on document structure to locate the fragment that is the reference target. XML, meanwhile, has become an increasingly important language for structured documents, and one of its most important related languages is XPath, which permits fragments of XML documents to be selected. In this article we present a methodology, together with an application case, for automatically extracting and resolving references to fragments of structured documents. The approach combines structure manipulation with information extraction to enhance reference extraction tools by improving the precision of the extracted references. We take advantage of XML markup to locate the position within the structure at which each reference is found. The use of XPath for reference resolution is original: the resolution tool automatically builds XPath expressions. This proposal is inspired by, and has been implemented in, our work with legislative documents.
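
As a rough illustration of the idea of building XPath expressions from extracted references, the following minimal Python sketch may help. It is not the authors' tool. The toy document, the element and attribute names (`law`, `article`, `paragraph`, `num`), the reference pattern, and the helper names `build_xpath` and `resolve_references` are all assumptions made for this example. It uses only the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy document; element and attribute names are illustrative only.
DOC = """
<law>
  <article num="1">
    <paragraph num="1">General provisions.</paragraph>
    <paragraph num="2">As set out in article 2, paragraph 1.</paragraph>
  </article>
  <article num="2">
    <paragraph num="1">Definitions.</paragraph>
  </article>
</law>
"""

# Assumed reference pattern: "article N", optionally followed by ", paragraph M".
REF = re.compile(r"article (\d+)(?:, paragraph (\d+))?")


def build_xpath(article, paragraph=None):
    """Build an XPath expression selecting the referenced fragment."""
    path = f".//article[@num='{article}']"
    if paragraph is not None:
        path += f"/paragraph[@num='{paragraph}']"
    return path


def resolve_references(root):
    """Extract references from paragraph text and resolve each one via XPath."""
    results = []
    for para in root.iter("paragraph"):
        for match in REF.finditer(para.text or ""):
            xpath = build_xpath(match.group(1), match.group(2))
            target = root.find(xpath)
            target_text = target.text if target is not None else None
            results.append((match.group(0), xpath, target_text))
    return results


root = ET.fromstring(DOC)
results = resolve_references(root)
for reference, xpath, target_text in results:
    print(reference, "->", xpath, "->", target_text)
```

Here the textual reference is found by a regular expression, turned into an XPath expression, and evaluated against the same document to retrieve the target fragment. A fuller treatment would also use the position of the reference within the structure, for example to resolve relative references such as "the previous paragraph".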

Keywords: Information extraction, XML, Structured documents, Reference extraction, Reference resolution, Legislative documents

Review history: Received 27 June 2006, Revised 1 December 2006, Accepted 5 December 2006, Available online 15 February 2007.

Official paper URL: https://doi.org/10.1016/j.ipm.2006.12.004