An online algorithm for scheduling big data analysis jobs in cloud environments

Authors:

Highlights:

Abstract

Cloud computing has become a popular platform for processing big data analysis jobs owing to its high availability, elasticity, and cost-efficiency. Many big data analysis service providers use cloud instances to execute users' job requests, and they need efficient scheduling algorithms to improve both job execution efficiency and economic benefit. This paper formulates the problem of minimizing the execution time of a batch of big data analysis jobs while keeping the number of cloud instances fixed. Solving this problem not only improves job execution efficiency and user satisfaction in cloud environments, but also yields higher economic benefits for big data analysis service providers. This paper proposes an online scheduling algorithm that fully exploits the parallelism of big data analysis jobs to optimize scheduling decisions even though job execution times cannot be known accurately in advance. To evaluate the proposed algorithm, a traditional two-phase scheduling algorithm is used as a benchmark for comparison. Theoretical analysis and extensive simulation experiments based on real datasets show that the proposed online scheduling algorithm achieves more stable performance than the benchmark two-phase scheduling algorithm.
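The abstract does not describe the proposed algorithm itself. Purely to illustrate the problem setting (a fixed pool of instances, jobs made of parallel tasks, execution times not known when decisions are made), here is a minimal sketch of classic greedy list scheduling (Graham's rule), which is a well-known online baseline and NOT the paper's method. It assumes that each job's tasks are independent, that jobs are dispatched in submission order, and that the function name `greedy_online_schedule` and its inputs are hypothetical. The dispatch rule never reads a task's duration; durations are used only to simulate when an instance becomes free again.

```python
import heapq
from typing import List, Tuple


def greedy_online_schedule(jobs: List[List[float]], num_instances: int) -> Tuple[float, List[float]]:
    """Hypothetical sketch of Graham-style greedy list scheduling.

    jobs[j] holds the (hidden) durations of job j's independent parallel tasks.
    Each task goes to whichever instance becomes free first; the decision never
    inspects the task's duration, which is read only to simulate completion.
    Returns (makespan, per-job completion times).
    """
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")

    # Min-heap of (time the instance becomes free, instance id).
    free_at = [(0.0, i) for i in range(num_instances)]
    heapq.heapify(free_at)

    job_finish = [0.0] * len(jobs)
    for j, tasks in enumerate(jobs):
        for duration in tasks:
            start, inst = heapq.heappop(free_at)  # earliest-free instance
            end = start + duration  # known only once the task actually finishes
            job_finish[j] = max(job_finish[j], end)
            heapq.heappush(free_at, (end, inst))

    makespan = max(job_finish, default=0.0)
    return makespan, job_finish


if __name__ == "__main__":
    # Three jobs on two instances; job 0 and job 2 each have two parallel tasks.
    print(greedy_online_schedule([[3, 3], [2], [4, 1]], 2))
```

Greedy list scheduling is guaranteed to finish within a factor of (2 - 1/m) of the optimal makespan on m identical instances, which is why it is a common yardstick for online schedulers; the paper's own algorithm and its two-phase benchmark are not reproduced here.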

Keywords: Big data, Cloud computing, Job scheduling, Online algorithm

Review process: Received 13 October 2021, Revised 16 March 2022, Accepted 18 March 2022, Available online 24 March 2022, Version of Record 31 March 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108628