Approximating sliding windows by cyclic tree-like histograms for efficient range queries
Authors:
Abstract
Providing fast approximate answers to range queries on sliding windows while consuming little storage space is one of the main challenges in the context of data streams. On the one hand, the importance of this class of queries is widely accepted: they are useful for computing aggregate information over the data stream, allowing us to extract more abstract knowledge from it than point queries do. On the other hand, synopsis techniques based on histograms, sketches, sampling, and so on make feasible those approaches that require multiple scans of the data, which would otherwise be computationally prohibitive. Among these techniques, histogram-based approaches are considered among the most advantageous, at least for range queries, since histograms summarize data well while preserving quick and accurate answers to range queries. In this paper, we propose a novel histogram-based technique for summarizing sliding windows that supports approximate answers to arbitrary range-sum queries. Our histogram relies on a tree-based structure and is therefore suited to directly supporting hierarchical queries and, thus, drill-down and roll-up operations. In addition, the structure supports sliding window shifting and quick query answering well, since it operates in time logarithmic in the sliding window size. A bit-saving approach to encoding tree nodes allows us to compress the sliding window at a small cost in accuracy. The contribution of this work is thus not only a new technique for an important problem but also an in-depth analysis of the advantages of the hierarchical approach combined with the bit-saving strategy. A careful experimental analysis validates the method, showing its superiority with respect to the state of the art.
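To make the general idea of a cyclic, tree-shaped summary concrete, the following is a minimal sketch and not the authors' data structure: it assumes the window is split into a fixed number of equal-width buckets, each storing one sum, and it omits the bit-saving node encoding entirely. The class name CyclicTreeHistogram and its methods are hypothetical. Both shifting the window and answering a bucket-aligned range-sum query take time logarithmic in the number of buckets.

```python
class CyclicTreeHistogram:
    """Hypothetical sketch: a binary sum tree over a circular array of buckets.

    Assumption: each bucket holds the sum of one fixed-width slice of the
    stream. This is NOT the encoding proposed in the paper.
    """

    def __init__(self, num_buckets):
        self.n = num_buckets
        self.tree = [0] * (2 * num_buckets)  # leaves live at tree[n : 2n]
        self.head = 0    # physical slot holding the oldest bucket
        self.count = 0   # number of buckets currently in the window

    def _set(self, slot, value):
        # Overwrite one leaf, then refresh the sums on the path to the root.
        i = slot + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def push(self, value):
        """Add the newest bucket; once full, it overwrites the oldest one."""
        if self.count < self.n:
            slot = (self.head + self.count) % self.n
            self.count += 1
        else:
            slot = self.head
            self.head = (self.head + 1) % self.n
        self._set(slot, value)

    def _physical_sum(self, lo, hi):
        # Sum of physical slots [lo, hi) using a bottom-up tree walk.
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                total += self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

    def range_sum(self, start, end):
        """Sum of logical buckets [start, end), where 0 is the oldest."""
        if not 0 <= start <= end <= self.count:
            raise IndexError("range outside the current window")
        if start == end:
            return 0
        lo = (self.head + start) % self.n
        hi = (self.head + end - 1) % self.n + 1
        if lo < hi:
            return self._physical_sum(lo, hi)
        # The logical range wraps around the end of the circular array.
        return self._physical_sum(lo, self.n) + self._physical_sum(0, hi)


hist = CyclicTreeHistogram(4)
for bucket_sum in [1, 2, 3, 4, 5, 6]:
    hist.push(bucket_sum)          # window now holds 3, 4, 5, 6
print(hist.range_sum(1, 3))        # buckets 4 and 5
```

In this sketch a query is exact at bucket granularity; the approximation in a real histogram arises when a query boundary falls inside a bucket and when node values are stored with reduced precision, neither of which is modeled here.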
Keywords: Data stream summarization, Histograms, Sliding windows, Range queries
Review history: Received 18 November 2008, Revised 31 May 2010, Accepted 31 May 2010, Available online 8 June 2010.
Official paper URL: https://doi.org/10.1016/j.datak.2010.05.002