Kronos: towards bus contention-aware job scheduling in warehouse scale computers
作者:Shuai Xue, Shang Zhao, Quan Chen, Zhuo Song, Shanpei Chen, Tao Ma, Yong Yang, Wenli Zheng, Minyi Guo
摘要
While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.
论文关键词:bus contention, split lock, schedule, high perforance, cloud
论文评审过程:
论文官网地址:https://doi.org/10.1007/s11704-021-0418-5