Approximate Query Answering Using Data Warehouse Striping
Authors: Jorge R. Bernardino, Pedro S. Furtado, Henrique C. Madeira
Abstract
This paper presents and evaluates a simple but very effective method for implementing large data warehouses on an arbitrary number of computers, achieving very high query execution performance and scalability. The data is distributed and processed across a potentially large number of autonomous computers using our technique, called data warehouse striping (DWS). The major problem of the DWS technique is that it would require a very expensive cluster of computers with fault-tolerant capabilities to prevent a fault in a single computer from stopping the whole system. In this paper, we propose a radically different approach to the problem of the unavailability of one or more computers in the cluster, allowing the use of DWS with a very large number of inexpensive computers. The proposed approach is based on approximate query answering techniques that make it possible to deliver an approximate answer to the user even when one or more computers in the cluster are unavailable. The evaluation presented in the paper shows, both analytically and experimentally, that the approximate results obtained this way have a very small error that is negligible in most cases.
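The idea of answering a query approximately when some nodes are down can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fact rows are spread evenly across nodes (here by round-robin placement) and that a global SUM is estimated by scaling the partial sums of the available nodes by the ratio of total to available nodes. The names `stripe_round_robin` and `approximate_sum` are hypothetical, and the data is synthetic.

```python
from typing import List, Optional


def stripe_round_robin(rows: List[float], n_nodes: int) -> List[List[float]]:
    """Spread fact rows evenly over n_nodes (assumed round-robin placement)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def approximate_sum(partials: List[Optional[float]]) -> float:
    """Estimate the global SUM from per-node partial sums.

    A partial of None marks an unavailable node. The sum over the
    available nodes is scaled by (total nodes / available nodes),
    which is an assumed estimator for evenly striped data.
    """
    available = [p for p in partials if p is not None]
    if not available:
        raise ValueError("no node is available")
    return sum(available) * len(partials) / len(available)


# Synthetic measure values 1..100 striped over 5 nodes.
rows = [float(v) for v in range(1, 101)]
nodes = stripe_round_robin(rows, 5)

# Each node computes its partial aggregate locally.
partials: List[Optional[float]] = [sum(node) for node in nodes]

# Simulate the failure of node 0.
partials[0] = None

estimate = approximate_sum(partials)
exact = sum(rows)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact} estimate={estimate} relative_error={relative_error:.4f}")
```

In this toy setting the estimate stays within about one percent of the exact sum with one of five nodes missing. The error depends on how uniformly the data is striped, so the figure should not be read as a result from the paper.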
Keywords: data warehousing, distributed query optimization, data partitioning, performance optimization, approximate query answering
Official paper URL: https://doi.org/10.1023/A:1016551309288