Impact of benign sample size on binary classification accuracy

Authors:

Highlights:

• We propose a metric for the accuracy degradation caused by increasing the number of benign samples.

• Increasing the test benign sample size tenfold decreased the F1 score by 0.293.

• Using sufficient benign training samples mitigates accuracy degradation.
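The effect described in the highlights can be illustrated with a small arithmetic sketch. It is not the paper's metric or data: the detection rates and sample counts below are hypothetical, chosen only to show why adding benign test samples tends to lower F1 when malware is the positive class. With a fixed false positive rate, more benign samples produce more false positives, which lowers precision while recall stays unchanged.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), treating malware as the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_at_benign_size(n_malware: int, n_benign: int, tpr: float, fpr: float) -> float:
    """Expected F1 for a classifier with fixed TPR and FPR on a given test mix.

    All arguments are illustrative assumptions, not values from the paper.
    """
    tp = round(tpr * n_malware)   # malware correctly flagged
    fn = n_malware - tp           # malware missed
    fp = round(fpr * n_benign)    # benign samples wrongly flagged
    return f1_score(tp, fp, fn)


# Hypothetical classifier: 95% true positive rate, 5% false positive rate.
base = f1_at_benign_size(n_malware=1000, n_benign=1000, tpr=0.95, fpr=0.05)
tenfold = f1_at_benign_size(n_malware=1000, n_benign=10000, tpr=0.95, fpr=0.05)

print(f"F1 with 1,000 benign test samples:  {base:.3f}")
print(f"F1 with 10,000 benign test samples: {tenfold:.3f}")
print(f"Degradation: {base - tenfold:.3f}")
```

Under these assumed rates, F1 falls from 0.95 to roughly 0.78 even though the classifier itself is unchanged. The 0.293 drop reported in the highlights comes from the paper's own experiments and is not reproduced by this sketch.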


Keywords: Malware, Machine learning, Binary classification, Benign sample, Random forest, Support vector machine, XGBoost

Publication history: Received 25 November 2021, Revised 22 June 2022, Accepted 17 August 2022, Available online 27 August 2022, Version of Record 2 September 2022.

Official paper page: https://doi.org/10.1016/j.eswa.2022.118630