Checkpoint evolution for volatile correlation computing
作者:Wenjun Zhou, Hui Xiong
摘要
Given a set of data objects, the problem of correlation computing is concerned with efficient identification of strongly-related ones. Existing studies have been mainly focused on static data. However, as observed in many real-world scenarios, input data are often dynamic and analytical results have to be continually updated. Therefore, there is the critical need to develop a dynamic solution for volatile correlation computing. To this end, we develop a checkpoint scheme, which can help us capture dynamic correlation values by establishing an evolving computation buffer. In this paper, we first provide a theoretical analysis of the properties of the volatile correlation, and derive a tight upper bound. Such tight and evolving upper bound is used to identify a small list of candidate pairs, which are maintained as new transactions are added into the database. Once the total number of new transactions goes beyond the buffer size, the upper bound is re-computed according to the next checkpoint, and a new list of candidate pairs is identified. Based on such a scheme, a new algorithm named CHECK-POINT+ has been designed. Experimental results on real-world data sets show that CHECK-POINT+ can significantly reduce the computation cost in dynamic data environments, and has the advantage of compacting the use of memory space.
论文关键词:Pearson’s correlation coefficient, Correlation coefficient, Volatile correlation computing, Checkpoint
论文评审过程:
论文官网地址:https://doi.org/10.1007/s10994-010-5204-9