Emerging Cubes: Borders, size estimations and lossless reductions
Authors:
Abstract
Discovering trend reversals between two data cubes provides users with novel and interesting knowledge when the real-world context fluctuates: What is new? Which trends appear or emerge? Which tendencies are immersing or disappearing? With the concept of the Emerging Cube, we capture such trend reversals by enforcing an emergence constraint. We revisit the classical borders for the Emerging Cube and introduce a new one which optimizes both storage space and computation time and provides a simple characterization of the size of Emerging Cubes, as well as classification and cube navigation tools. We soundly establish the connection between the classical and proposed borders by using cube transversals. Knowing the size of Emerging Cubes without computing them is of great interest, in particular for tuning the underlying emergence constraint. We address this issue by studying an upper bound and characterizing the exact size of Emerging Cubes. We propose two strategies for quickly estimating their size: one based on analytical estimation, without database access, and one based on probabilistic counting, using the proposed borders as the input of the near-optimal algorithm HyperLogLog. Because the estimation algorithm is efficient, several iterations can be performed to calibrate the emergence constraint. Moreover, we propose reduced and lossless representations of the Emerging Cube by using the concept of cube closure. Finally, we perform experiments on different data distributions in order to measure, on the one hand, the size of the introduced condensed and concise representations and, on the other hand, the performance (accuracy and computation time) of the proposed estimation method.
Keywords: OLAP mining, Data warehouse, Data cube, Trend analysis, Cube size estimation, Closure
Review history: Received 13 October 2008, Revised 18 February 2009, Accepted 2 March 2009, Available online 12 March 2009.
Official paper URL: https://doi.org/10.1016/j.is.2009.03.001