Prediction of software fault-prone classes using ensemble random forest with adaptive synthetic sampling algorithm

Authors: A. Balaram, S. Vasundra

Abstract

Software Fault Prediction (SFP) is the process of predicting faulty modules in software. It is important when releasing software versions and relies on predefined metrics derived from historical faults in the software. Predicting faults in software units such as components, classes, and modules at an early stage of the development cycle is important because it significantly reduces time and cost. Eliminating faults during the development process therefore removes unnecessary effort spent on the modules processed at each step. However, imbalanced datasets pose a significant challenge for SFP at an early stage. Limitations such as the choice of software metrics to include in SFP models, the cost-effectiveness of fault prediction, and fault density prediction remain obstacles for research. The proposed method uses Butterfly optimization for feature selection, which helps the Machine Learning techniques applied afterwards to produce accurate results. This research uses Ensemble Random Forest with Adaptive Synthetic Sampling (E-RF-ADASYN) for fault prediction, with the classifiers described in the proposed method section. The proposed E-RF-ADASYN obtained an Area Under Curve (AUC) of 0.854767, better than the 0.771 obtained by the existing Rough-KNN Noise-Filtered Easy Ensemble (RKEE) method.
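
To illustrate the adaptive synthetic sampling idea named in the abstract, the following is a minimal sketch of ADASYN-style oversampling in plain Python. It is not the authors' implementation: the function name `adasyn`, its parameters (`k`, `beta`, `seed`), the uniform-weight fallback, and the choice to interpolate towards nearest minority-class neighbours are all assumptions of this sketch. The idea shown is that minority (faulty) samples surrounded by more majority (non-faulty) neighbours receive more synthetic samples.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (hypothetical helper, not from the paper).

    X: list of feature vectors, y: list of class labels.
    beta: desired balance level (1.0 aims for a fully balanced set).
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    total = int((n_maj - n_min) * beta)  # number of synthetic samples wanted
    if total <= 0 or n_min < 2:
        return []

    def neighbours(i, pool):
        # k nearest points to X[i] (Euclidean), excluding the point itself
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: share of majority-class points among the k nearest neighbours
    ratios = []
    for i in min_idx:
        nn = neighbours(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))

    norm = sum(ratios)
    if norm == 0:
        # assumption of this sketch: fall back to uniform weights
        weights = [1.0 / n_min] * n_min
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for i, w in zip(min_idx, weights):
        count = round(w * total)  # harder-to-learn points get more samples
        nn_min = neighbours(i, min_idx)
        for _ in range(count):
            j = rng.choice(nn_min)
            lam = rng.random()
            # interpolate between the sample and a minority neighbour
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

The returned samples would be appended to the training set with the minority label before fitting a classifier, for example a random forest. The feature selection step and the ensemble construction described in the paper are not reproduced here.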

Keywords: Adaptive synthetic sampling, Butterfly optimization, Ensemble random forest, Imbalanced data, Software fault prediction

Paper URL: https://doi.org/10.1007/s10515-021-00311-z