Boosting support vector machines for imbalanced data sets

Authors: Benjamin X. Wang, Nathalie Japkowicz

Abstract

Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. We then counter the excessive bias introduced by this approach with a boosting algorithm. We found that this ensemble of SVMs makes an impressive improvement in prediction performance, not only for the majority class, but also for the minority class.

Keywords: Imbalanced data sets, Support vector machines, Boosting

Paper URL: https://doi.org/10.1007/s10115-009-0198-y