Integrating the maintenance and synchronization of data warehouses using a cooperative framework

Abstract

Data warehouses (DWs) are built by gathering information from several information sources (ISs) and integrating it into one repository customized to users’ needs. Recently proposed view maintenance algorithms tackle the problem of (concurrent) data updates happening at different autonomous ISs, whereas the EVE system addresses the maintenance of a data warehouse after schema changes of ISs. The concurrency of schema changes and data updates performed by different ISs, however, remains an unexplored problem. This paper provides a solution to this problem that guarantees the concurrent view definition evolution and view extent maintenance of a DW defined over distributed ISs. To solve this problem, we introduce a framework called the SDCC (Schema change and Data update Concurrency Control) system. SDCC integrates existing algorithms designed to address view maintenance subproblems, such as view extent maintenance after IS data updates, view definition evolution after IS schema changes, and view extent adaptation after view definition changes, into one system by providing protocols that enable them to correctly co-exist and collaborate. SDCC tracks any potentially faulty updates of the DW caused by conflicting concurrent IS changes using a global message labeling scheme. An algorithm that compensates for such conflicting updates by a local correction strategy, called local compensation (LC), is incorporated into SDCC. The correctness of LC is proven. The overhead of the SDCC solution beyond the costs of the known view maintenance algorithms it incorporates is shown to be negligible. Lastly, a refined hierarchy of consistency levels for the state of a data warehouse with respect to its underlying dynamic environment is presented, now incorporating the dynamicity of both the data and the schema. The SDCC solution is shown to reach a semi-concurrency level of consistency, not reached by any prior DW system.

Keywords: Data warehousing, View maintenance, Concurrency of updates, Data updates and schema changes, View consistency, Distributed information sources

Article history: Received 19 July 1999, Revised 20 December 2000, Accepted 28 August 2001, Available online 28 December 2001.

Official article URL: https://doi.org/10.1016/S0306-4379(01)00049-7