Completeness of integrated information sources

Abstract

For many information domains there are numerous World Wide Web data sources. The sources vary both in their extension and their intension: they represent different real-world entities with possible overlap and provide different attributes of these entities. Mediator-based information systems allow integrated access to such sources by providing a common schema against which the user can pose queries. Given a query, the mediator must determine which participating sources to access and how to integrate the incoming results. This article describes how to support mediators in their source selection and query planning process. We propose three new merge operators, which formalize the integration of multiple source responses. A completeness model describes the usefulness of a source for answering a query. The completeness measure incorporates both the extensional value (called coverage) and the intensional value (called density) of a source. We show how to determine the completeness of single sources and of combinations of sources under the new merge operators. Finally, we show how to use the measure for source selection and query planning.
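
To make the coverage and density notions above concrete, the following is a minimal sketch for a single source. It is not taken from the article. It assumes that coverage is the fraction of real-world entities the source represents, that density is the fraction of non-null attribute values among the records it returns, and that completeness combines the two as a product. The function names, the `world_size` parameter, and the sample records are all hypothetical.

```python
from typing import Dict, List, Optional

# A record maps attribute names of the common schema to values (None = missing).
Record = Dict[str, Optional[str]]


def coverage(records: List[Record], world_size: int) -> float:
    """Extensional value: share of real-world entities the source represents.

    Assumes each record stands for one distinct entity and that the total
    number of entities in the domain (world_size) is known or estimated.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    return len(records) / world_size


def density(records: List[Record], attributes: List[str]) -> float:
    """Intensional value: share of non-null values over the schema attributes."""
    if not records or not attributes:
        return 0.0
    filled = sum(
        1 for rec in records for attr in attributes if rec.get(attr) is not None
    )
    return filled / (len(records) * len(attributes))


def completeness(
    records: List[Record], attributes: List[str], world_size: int
) -> float:
    """Assumed combination: completeness = coverage * density."""
    return coverage(records, world_size) * density(records, attributes)


# Hypothetical source: 2 of 4 known entities, one missing attribute value.
source = [
    {"title": "A", "year": "2001"},
    {"title": "B", "year": None},
]
schema = ["title", "year"]
score = completeness(source, schema, world_size=4)
```

Under these assumptions, a mediator could rank candidate sources by `completeness` before deciding which ones to query. Scoring combinations of sources would additionally require overlap information, which this sketch does not model.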

Keywords: Query planning, Coverage, Density, Information integration, Result size, Overlap

Review history: Received 30 May 2003, Revised 1 November 2003, Accepted 15 December 2003, Available online 5 February 2004.

Official paper URL: https://doi.org/10.1016/j.is.2003.12.005