A Fine‐Grained Distribution Approach for ETL Processes in Big Data Environments
Authors:
Highlights:
• We design the Extracting-Transforming-Loading (ETL) process in Big Data environments.
• Parallel/distributed ETL processes are designed according to the MapReduce paradigm.
• We decompose an ETL process into ETL functionalities and elementary functions in order to provide a fine-grained structure.
• This fine-grained ETL structure makes it possible to exploit multiple distribution topologies.
• Parallel/distributed issues are considered at the “process” level for coarse-grained distribution and at the “functionality” level for fine-grained distribution.
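As a rough illustration of the fine-grained ("functionality"-level) idea in the highlights, the sketch below expresses a single ETL functionality as a map step and a reduce step over data partitions. It is a minimal single-machine simulation written for this summary, not the authors' implementation: the record fields, the elementary functions `clean` and `keep`, and the helpers `split`, `map_phase`, `reduce_phase`, and `run_functionality` are all hypothetical names.

```python
from collections import defaultdict

# Hypothetical elementary functions that together form one ETL functionality.
# The names and record layout are illustrative assumptions, not from the paper.
def clean(record):
    """Normalize the country key and coerce the amount to float."""
    return {"country": record["country"].strip().upper(),
            "amount": float(record["amount"])}

def keep(record):
    """Keep only records with a strictly positive amount."""
    return record["amount"] > 0

def map_phase(partition):
    """Map step: apply the elementary functions to one partition
    and emit (key, value) pairs."""
    cleaned = (clean(r) for r in partition)
    return [(r["country"], r["amount"]) for r in cleaned if keep(r)]

def reduce_phase(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def split(records, n):
    """Round-robin partitioning of the source records into n partitions."""
    return [records[i::n] for i in range(n)]

def run_functionality(records, n_partitions=2):
    """Run one functionality over n partitions (simulated sequentially here)."""
    partitions = split(records, n_partitions)
    # In a real cluster each map_phase call could run on a different node.
    mapped = [map_phase(p) for p in partitions]
    # Stand-in for the shuffle: gather all emitted pairs before reducing.
    shuffled = [pair for part in mapped for pair in part]
    return reduce_phase(shuffled)

source = [
    {"country": " fr ", "amount": "10.5"},
    {"country": "DZ", "amount": "4"},
    {"country": "fr", "amount": "-3"},
    {"country": "dz ", "amount": "6"},
]
result = run_functionality(source)
print(result)
```

In this sketch, coarse-grained ("process"-level) distribution would correspond to running a whole chain of such functionalities on each partition, whereas the fine-grained variant shown here parallelizes one functionality at a time.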
Keywords: Data Warehousing, ETL, Parallel and Distributed Processing, Big Data, MapReduce
Review history: Received 21 June 2016, Revised 19 July 2017, Accepted 18 August 2017, Available online 26 August 2017, Version of Record 20 September 2017.
Official paper URL: https://doi.org/10.1016/j.datak.2017.08.003