Heuristic attribute reduction and resource-saving algorithm for energy data of data centers
Authors: Mincheng Chen, Jingling Yuan, Lin Li, Dongling Liu, Yang He
Abstract
Energy data, which consist of energy consumption statistics and other related data in green data centers, are growing dramatically. These data have great value, but many of their attributes are redundant or unnecessary and seriously degrade the performance of the data center's decision-making system. Attribute reduction for energy data is therefore a critical step. However, many existing attribute reduction algorithms are computationally time-consuming. To address these issues, we first extend the methodology of rough sets to construct a knowledge representation system for data center energy consumption. Energy data contain some degree of anomalies caused by power failure, energy instability or other factors; hence, we design an integrated data preprocessing method for energy data using Spark, which mainly includes sampling analysis, data classification, missing data filling, outlier data prediction and data discretization. Taking advantage of in-memory computing, we then propose a fast heuristic attribute reduction algorithm for energy data using Spark (FHARA-S). The algorithm uses an efficient method for transforming the energy consumption decision table and a heuristic formula for measuring attribute significance to reduce the search space, and it introduces the correlation between condition attributes and the decision attribute, which further improves computational efficiency. We also design an adaptive decision management architecture for the green data center based on FHARA-S, which can improve decision-making efficiency and strengthen energy management. The experimental results show that our algorithm achieves up to a 2.2X speed improvement over the traditional attribute reduction algorithm using MapReduce and a 0.61X speed improvement over the algorithm using Spark. In addition, our algorithm consumes fewer computational resources.
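The abstract does not give the FHARA-S significance formula, so the following is only a minimal single-machine sketch of the general idea of heuristic attribute reduction on a rough-set decision table. It assumes the classical positive-region measure as the significance heuristic, which may differ from the paper's formula, and it does not use Spark. The function names and the toy decision table (`load`, `temp`, `fan`, `alert`) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def positive_region_size(rows, attrs, decision):
    """Count rows whose equivalence class under `attrs` maps to one decision value."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])
    # A class is consistent when all of its rows share the same decision value.
    return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)


def greedy_reduct(rows, condition_attrs, decision):
    """Greedily add the attribute that enlarges the positive region the most.

    Assumption: positive-region growth stands in for the paper's
    attribute-significance heuristic, which the abstract does not specify.
    """
    target = positive_region_size(rows, condition_attrs, decision)
    reduct = []
    remaining = list(condition_attrs)
    while positive_region_size(rows, reduct, decision) < target:
        best = max(
            remaining,
            key=lambda a: positive_region_size(rows, reduct + [a], decision),
        )
        reduct.append(best)
        remaining.remove(best)
    return reduct


# Hypothetical, already discretized decision table; "alert" is the decision attribute.
rows = [
    {"load": "high", "temp": "hot", "fan": "on", "alert": "yes"},
    {"load": "high", "temp": "cool", "fan": "on", "alert": "yes"},
    {"load": "low", "temp": "hot", "fan": "off", "alert": "no"},
    {"load": "low", "temp": "cool", "fan": "on", "alert": "no"},
]

print(greedy_reduct(rows, ["load", "temp", "fan"], "alert"))  # ['load']
```

In this toy table, `load` alone determines `alert`, so the other two condition attributes are dropped. A greedy search of this kind evaluates only a handful of candidate subsets instead of all of them, which is the sense in which a significance heuristic reduces the search space.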
Keywords: Energy data, Attribute reduction, Rough sets, Heuristic, Spark
Paper URL: https://doi.org/10.1007/s10115-018-1288-5