A Fine‐Grained Distribution Approach for ETL Processes in Big Data Environments

Authors:

Highlights:

• We design the Extracting-Transforming-Loading (ETL) process in Big Data environments.

• Parallel/distributed ETL processes are designed according to the MapReduce paradigm.

• We define the ETL process in terms of its functionalities and elementary functions in order to provide a fine-grained structure.

• This fine-grained ETL structure enables exploiting multiple distribution topologies.

• Parallel/distributed issues are addressed at the “process” level for coarse-grained distribution and at the “functionality” level for fine-grained distribution.
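As a rough illustration of the idea in the highlights, the sketch below expresses a single ETL functionality as a map phase and a reduce phase over partitioned input. It is only a minimal sketch in plain Python, not the authors' implementation: the record layout, the function names (`clean`, `map_phase`, `shuffle`, `reduce_phase`, `run_etl`) and the sum aggregation are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical elementary function; the name and record layout are
# illustrative assumptions, not taken from the paper.
def clean(record):
    """Normalise the key and cast the amount to a number."""
    return {
        "region": record["region"].strip().lower(),
        "amount": float(record["amount"]),
    }

def map_phase(partition):
    """Map: apply the elementary function to each record, emit (key, value) pairs."""
    return [(r["region"], r["amount"]) for r in map(clean, partition)]

def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def run_etl(partitions):
    """Run the map phase on every partition, then shuffle and reduce."""
    pairs = [pair for partition in partitions for pair in map_phase(partition)]
    return reduce_phase(shuffle(pairs))

# Two partitions stand in for data blocks processed by separate mappers.
partitions = [
    [{"region": " North ", "amount": "10"}, {"region": "south", "amount": "5"}],
    [{"region": "north", "amount": "2.5"}],
]
result = run_etl(partitions)
```

Here each partition could be handled by a separate worker, which corresponds loosely to distributing one functionality rather than the whole process.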

Keywords: Data Warehousing, ETL, Parallel and Distributed Processing, Big Data, MapReduce

Review history: Received 21 June 2016, Revised 19 July 2017, Accepted 18 August 2017, Available online 26 August 2017, Version of Record 20 September 2017.

Official paper URL: https://doi.org/10.1016/j.datak.2017.08.003