Box queries over multi-dimensional streams
作者:
Highlights:
•
摘要
Answering statistical queries about streams of online arriving data is becoming increasingly important. Often, such data includes multiple-attributes, so data elements can be viewed as points in a multi-dimensional universe. This paper extends existing works on streaming algorithms by studying the ability to perform box queries on online multi-dimensional data streams. We develop three algorithms C-DARQ, DARQ and MARQ that support such capabilities for a large number of statistical functions including (but not limited to) counting, frequency estimation, heavy-hitters etc. We also apply our algorithms in distributed settings, in which measurements are recorded independently by multiple sites (e.g., multiple routers), and the goal is to obtain a global network analysis. The protocols are analyzed and evaluated over synthetic dataset, Chicago dataset, and a Facebook dataset from Kaggle in multiple dimensions (up to 10). Our algorithms asymptotically improve the space bounds as well as update and query performance of existing works. Unlike known approaches, our algorithms can also be used to solve a larger class of problems beyond counting. We further discuss extending our work to the sliding window model and when the dimensions’ bounds are a-priori unknown.
论文关键词:
论文评审过程:Received 23 October 2021, Revised 19 May 2022, Accepted 26 May 2022, Available online 3 June 2022, Version of Record 30 June 2022.
论文官网地址:https://doi.org/10.1016/j.is.2022.102086