Extracting Logical Schema from the Web

Authors: Vincenza Carchiolo, Alessandro Longheu, Michele Malgeri

Abstract

One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. A schema for web data can be constructed at different levels, structuring a single page, a whole site, or a group of sites. Here we present an approach to give a logical schema to a web site. We first define a model for a single page, in which its content is divided into “logical” sections, i.e. parts of a page that each collect related information. We then introduce a site model in which both physical and logical links among different page sections are represented: physical links are existing hyperlinks, while logical links connect sections containing semantically related information. We show how such links can be found and classified according to their relevance, and how the schema is used in a structure-aware browser to improve both browsing and searching.

Keywords: semistructured data, world wide web, schema extraction

Paper URL: https://doi.org/10.1023/A:1023206322783