Preprocessing unbalanced data using support vector machine
作者:
Highlights:
•
摘要
This paper deals with the application of support vector machine (SVM) to deal with the class imbalance problem. The objective of this paper is to examine the feasibility and efficiency of SVM as a preprocessor. Our study analyzes different classification algorithms that are employed to predict the customers with caravan car policy based on his/her sociodemographic data and history of product ownership. A series of experiments was conducted to test various computational intelligence techniques viz., Multilayer Perceptron (MLP), Logistic Regression (LR), and Random Forest (RF). Various standard balancing techniques such as under-sampling, over-sampling and Synthetic Minority Over-sampling TEchnique (SMOTE) are also employed. Subsequently, a strategy of data balancing for handling imbalanced distribution in data is proposed. The proposed approach first employs SVM as a preprocessor and the actual target values of training data are then replaced by the predictions of trained SVM. Later, this modified training data is used to train techniques such as MLP, LR, and RF. Based on the measure of sensitivity, it is observed that the proposed approach not only balances the data effectively but also provides more number of instances for minority class, which in turn enhances the performance of the intelligence techniques.
论文关键词:Hybrid method,Preprocessor,SVM,Unbalanced data,COIL data
论文评审过程:Received 20 October 2011, Revised 14 December 2011, Accepted 30 January 2012, Available online 14 February 2012.
论文官网地址:https://doi.org/10.1016/j.dss.2012.01.016