Handling data skew in join algorithms using MapReduce
Authors:
Highlights:
• We introduce a skew handling algorithm, called multi-dimensional range partitioning.
• The proposed algorithm is more efficient than traditional MapReduce-based join algorithms.
• The proposed algorithm is scalable regardless of the size of input data.
Keywords: MapReduce, Join algorithm, Skew handling, Multi-dimensional range partitioning
Review history: Received 7 February 2015, Revised 26 October 2015, Accepted 21 December 2015, Available online 6 January 2016, Version of Record 23 January 2016.
Official URL: https://doi.org/10.1016/j.eswa.2015.12.024