DEByE – Data Extraction By Example
Authors:
Abstract
In this paper we present DEByE (Data Extraction By Example), an approach to extracting data from Web sources based on a small set of examples specified by the user. The novelty is that the user specifies examples according to a structure of their own choosing, and that this structure is described at example specification time. To specify the examples, the user interacts with a tool we developed that adopts nested tables as its visual paradigm. Nested tables are simple and intuitive, and they shield the user from technical details of the extraction problem, such as HTML tags, formatting operators, and learning automata. The examples provided by the user are then used to generate patterns that allow data to be extracted from new documents. For the extraction, DEByE adopts a new bottom-up procedure that we propose, which our experiments show to be very effective on a variety of Web sources.
Keywords: Data extraction, Wrapper generation, Web data management
Article history: Received 3 May 2001, Revised 21 May 2001, Accepted 16 July 2001, Available online 29 November 2001.
Official paper URL: https://doi.org/10.1016/S0169-023X(01)00047-7