Rim: A reusable iterative model for big data
Authors:
Abstract
In big data environments, iterative computing is widely used in applications such as data mining, machine learning, and graph analysis. Many iterative computing models have been proposed to execute iterative algorithms on big data efficiently. However, re-iterating over the entire dataset is inefficient when only part of it changes, for example when some data are added or removed. This paper presents Rim, a Reusable Iterative computing Model that calculates the new iterative results from the updated dataset and the original iterative results, avoiding re-iteration on the entire dataset. We state the conditions under which Rim applies, mathematically prove its accuracy and performance advantages, and describe its application to three typical iterative algorithms: PageRank, K-means, and Descendant-query. Finally, we implement Rim in Spark and evaluate its performance on different test cases and iterative algorithms. For PageRank, K-means, and Descendant-query, experiments show that our approach is on average 1.34×, 2.51×, and 3.17× faster, respectively, than re-iteration on massive datasets.
Keywords: Big data, Iterative computing, Iterative model, Reusable
Review history: Received 30 September 2017, Revised 22 March 2018, Accepted 24 April 2018, Available online 25 April 2018, Version of Record 11 May 2018.
Paper URL: https://doi.org/10.1016/j.knosys.2018.04.032