Approximated trial and error analysis in scientific databases

Authors:

Abstract

Databases have become one more building block in complex multi-tier architectures. In general, however, they are still designed and optimized with little regard for the applications that run on top of them. This problem is particularly acute in scientific applications, where the data is never used or viewed as is but is always processed, either for visualization or for analysis. In such scenarios the data is usually processed at the client, so conventional server-side optimizations are of limited help. In this paper we present a variety of techniques and a novel client/server architecture designed to optimize the client-side processing of scientific data. The core of our approach is to store frequently accessed data as relatively small, wavelet-encoded segments. These segments can be processed at different resolutions, which enables efficient processing of very large data volumes. Experimental results demonstrate that our approach significantly reduces overhead (I/O, network transfer, decoding, and analysis). Furthermore, it requires no changes to the analysis routines and provides all possible resolution ranges. We describe these ideas and how they have been implemented in HEDC (RHESSI Experimental Data Center), a multi-TByte data hub for RHESSI, NASA's Reuven Ramaty High Energy Solar Spectroscopic Imager satellite.
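To illustrate the idea of wavelet-encoded segments that can be processed at different resolutions, the following is a minimal sketch that assumes a one-dimensional Haar transform over a segment whose length is a power of two. The abstract does not say which wavelet family or storage layout HEDC uses, so the Haar choice, the coefficient ordering, and the function names `haar_encode` and `haar_decode` are illustrative assumptions, not the system's actual implementation.

```python
# Illustrative sketch only: Haar is assumed here as the simplest wavelet;
# the abstract does not specify the encoding actually used by HEDC.

def haar_encode(signal):
    """Encode a sequence of length 2**k as
    [overall average, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("segment length must be a power of two")
    approx = list(signal)
    bands = []  # detail bands, collected finest first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        bands.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    coeffs = approx  # single overall average
    for band in reversed(bands):  # store coarsest details first
        coeffs = coeffs + band
    return coeffs


def haar_decode(coeffs, levels):
    """Reconstruct 2**levels samples from the first 2**levels coefficients.

    Smaller `levels` means a coarser view that needs only a prefix of the
    stored segment."""
    size = 1 << levels
    if size > len(coeffs):
        raise ValueError("requested resolution exceeds the stored segment")
    approx = [coeffs[0]]
    pos = 1
    while len(approx) < size:
        band = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        refined = []
        for a, d in zip(approx, band):
            refined.extend((a + d, a - d))  # undo the average/difference step
        approx = refined
    return approx


segment = [4, 6, 10, 12, 8, 6, 5, 5]
encoded = haar_encode(segment)
coarse = haar_decode(encoded, 1)  # two samples: averages of each half
full = haar_decode(encoded, 3)    # all eight original samples
```

Because the coefficients are ordered from coarse to fine in this sketch, a client that needs only a low-resolution view reads and decodes just a prefix of the segment, which is one way the reductions in I/O, transfer, and decoding cost described above could arise. An analysis routine receives an ordinary list of samples at every resolution, so it would not need to change.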

Keywords: Scientific databases, Data processing, Approximation, Approximated views, Client-side caching

Article history: Received 1 May 2002, Revised 13 August 2002, Accepted 14 August 2002, Available online 5 December 2002.

DOI: https://doi.org/10.1016/S0306-4379(02)00052-2