Tuple MapReduce and Pangool: an associated implementation
Authors: Pedro Ferrera, Ivan De Prado, Eric Palacios, Jose Luis Fernandez-Marquez, Giovanna Di Marzo Serugendo
Abstract
This paper presents Tuple MapReduce, a new foundational model that extends MapReduce with the notion of tuples. Tuple MapReduce bridges the gap between the low-level constructs provided by MapReduce and the higher-level needs of programmers, such as compound records, sorting, and joins. The paper also presents Pangool, an open-source framework implementing Tuple MapReduce. Pangool eases the design and implementation of MapReduce-based applications and increases their flexibility while maintaining Hadoop’s performance. Additionally, the paper provides: pseudo-code for relational joins, rollup, and the PageRank algorithm; a Pangool code example; benchmark results comparing Pangool with existing approaches; reports from industry users of Pangool; and the description of a distributed database that exploits Pangool. These results show that Tuple MapReduce can be used as a direct, better-suited replacement for the MapReduce model in current implementations without modifying key system fundamentals.
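To make the idea of compound records with sorting more concrete, the following is a minimal in-memory sketch of a tuple-based MapReduce round. It is not Pangool's API and is not taken from the paper's pseudo-code. The function names (`tuple_map_reduce`, `parse_visit`, `visits_in_order`), the `group_by`/`sort_by` parameters, and the sample log are all hypothetical, and the requirement that the group-by fields form a prefix of the sort-by fields is an assumption made for this illustration.

```python
from itertools import groupby


def fields_key(fields):
    """Build a sort/group key function over the named tuple fields."""
    return lambda t: tuple(t[f] for f in fields)


def tuple_map_reduce(records, map_fn, reduce_fn, group_by, sort_by):
    """Toy single-process round: map to tuples, sort, group, reduce.

    Assumption for this sketch: group_by must be a prefix of sort_by,
    so one sort both forms the groups and orders tuples inside them.
    """
    if list(sort_by[:len(group_by)]) != list(group_by):
        raise ValueError("group_by must be a prefix of sort_by")

    # Map phase: each input record may emit any number of tuples (dicts).
    tuples = [t for record in records for t in map_fn(record)]

    # Shuffle phase stand-in: one sort over all sort-by fields.
    tuples.sort(key=fields_key(sort_by))

    # Reduce phase: one call per distinct combination of group-by fields.
    output = []
    for key, group in groupby(tuples, key=fields_key(group_by)):
        output.extend(reduce_fn(key, list(group)))
    return output


def parse_visit(line):
    """Hypothetical mapper: turn 'user,timestamp,url' into a tuple."""
    user, ts, url = line.split(",")
    yield {"user": user, "ts": int(ts), "url": url}


def visits_in_order(key, group):
    """Hypothetical reducer: URLs per user, already ordered by timestamp."""
    yield (key[0], [t["url"] for t in group])


# Made-up sample input, not data from the paper.
LOG = ["bob,3,/c", "alice,2,/b", "bob,1,/a", "alice,1,/a"]

RESULT = tuple_map_reduce(
    LOG, parse_visit, visits_in_order,
    group_by=["user"], sort_by=["user", "ts"],
)
print(RESULT)
```

In plain key/value MapReduce, getting each user's visits in timestamp order typically requires a hand-built composite key with a custom partitioner and comparators. In this sketch the same intent is stated by naming fields, which is the kind of gap the abstract refers to.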
Keywords: MapReduce, Hadoop, Big Data, Distributed systems, Scalability
Paper URL: https://doi.org/10.1007/s10115-013-0705-z