ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification

Authors: Kai Huang, Xiaoguo Wang

Abstract

Increasing the number of minority samples through data generation can effectively improve a classifier's ability to identify minority samples in imbalanced problems. In this paper, we propose an effective data generation algorithm for minority samples, the Adaptive Increase dimension of Variational AutoEncoder (ADA-INCVAE). Complementary to prior studies, we conduct a theoretical study of posterior collapse in the VAE from the perspective of multi-task learning. Building on this theoretical support, we propose a novel training method that increases the dimension of the data to avoid posterior collapse. To restrict the range of synthetic data for different minority samples, we propose an adaptive reconstruction loss weight based on the distance distribution of majority samples around the minority class samples. In the data generation stage, the generation proportion of different sample points is determined by the local information of the minority class. Experimental results on 12 imbalanced datasets indicate that the algorithm helps the classifier improve F1-measure and G-mean, which verifies the effectiveness of the synthetic data generated by ADA-INCVAE.
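The abstract does not say how the "local information of the minority class" is turned into a generation proportion. As a rough illustration only, the sketch below assumes an ADASYN-style rule: each minority sample is weighted by the fraction of majority samples among its k nearest neighbours, and the weights are normalised to sum to 1. The function name `generation_proportions`, the parameter `k`, and the rule itself are assumptions made for this example, not the paper's actual formulation.

```python
import numpy as np


def generation_proportions(X_min, X_maj, k=5):
    """Hypothetical ADASYN-style generation proportions (not the paper's rule).

    Each minority sample is weighted by the share of majority samples
    among its k nearest neighbours; weights are normalised to sum to 1.
    """
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min = len(X_min)

    # Stack both classes and remember which rows are majority samples.
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.concatenate([np.zeros(n_min, dtype=bool),
                             np.ones(len(X_maj), dtype=bool)])

    # Euclidean distance from every minority sample to every sample.
    dist = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)

    # A sample must not count as its own neighbour.
    dist[np.arange(n_min), np.arange(n_min)] = np.inf

    # Indices of the k nearest neighbours of each minority sample.
    nn = np.argsort(dist, axis=1)[:, :k]

    # Fraction of majority samples in each neighbourhood.
    ratio = is_maj[nn].mean(axis=1)

    # If no minority sample has majority neighbours, fall back to uniform.
    total = ratio.sum()
    if total == 0:
        return np.full(n_min, 1.0 / n_min)
    return ratio / total


if __name__ == "__main__":
    X_min = [[0, 0], [0, 1], [1, 0], [5, 5]]
    X_maj = [[5, 6], [6, 5], [6, 6], [5, 4]]
    print(generation_proportions(X_min, X_maj, k=2))
```

Under this assumed rule, minority samples that sit close to the majority class receive a larger share of the synthetic samples, while samples surrounded only by their own class receive none.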

Keywords: Machine learning, Imbalanced learning, Data generation, Variational AutoEncoder

Paper URL: https://doi.org/10.1007/s10489-021-02566-1