Adaptive infinite dropout for noisy and sparse data streams

作者:Ha Nguyen, Hoang Pham, Son Nguyen, Ngo Van Linh, Khoat Than

摘要

The ability to analyze data streams, which arrive sequentially and possibly infinitely, is increasingly vital in various online applications. However, data streams pose various challenges, including sparse and noisy data as well as concept drifts, which easily mislead a learning method. This paper proposes a simple yet robust framework, called Adaptive Infinite Dropout (aiDropout), to effectively tackle these problems. Our framework uses a dropout technique in a recursive Bayesian approach in order to create a flexible mechanism for balancing between old and new information. In detail, the recursive Bayesian approach imposes a constraint on the model parameters to make a regularization term between the current and previous mini-batches. Then, dropout whose drop rate is autonomously learned can adjust the constraint to new data. Thanks to the ability to reduce overfitting and the ensemble property of Dropout, our framework obtains better generalization, thus it effectively handles undesirable effects of noise and sparsity. In particular, theoretical analyses show that aiDropout imposes a data-dependent regularization, therefore, it can adapt quickly to sudden changes from data streams. Extensive experiments show that aiDropout significantly outperforms the state-of-the-art baselines on a variety of tasks such as supervised and unsupervised learning.

论文关键词:Bayesian models, Data streams, Streaming learning, Dropout, Data-dependent regularization

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10994-022-06169-w