Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency
Authors: Ben Halstead, Yun Sing Koh, Patricia Riddle, Russel Pears, Mykola Pechenizkiy, Albert Bifet
Abstract
A data stream is a sequence of observations produced by a generating process which may evolve over time. In such a time-varying stream, the relationship between input features and labels, known as the concept, can change. Adapting to a change in concept is most often done by destroying and incrementally rebuilding the current classifier. Many systems additionally store and reuse previously built models to adapt more efficiently when stream conditions drift to a previously seen state. Reusing a model offers increased classification performance over rebuilding, and provides an indicator of, and thus transparency into, the hidden state of the generating process. When only a subset of past models can be stored for reuse, for example due to memory constraints, the choice of which models to store for optimal future reuse is an important problem. Current methods of evaluating which models to store use valuation policies such as age, time since last use, accuracy and diversity. These policies are often not optimal, losing predictive performance by undervaluing complex models. We propose a new valuation policy based on advantage, the misclassifications avoided by reusing a model rather than training a new model, which more accurately reflects the true value of model storage. We evaluate our method on synthetic and real-world data, including a real-world air pollution dataset. Our results show accuracy increases of up to 6% using our valuation policy, while preserving transparency.
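The advantage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `StoredModel` class, its `reuse_errors` and `retrain_errors` fields, the `evict_to_capacity` helper, and all the numbers are hypothetical, and the sketch assumes that both error counts have already been measured over the same observations.

```python
from dataclasses import dataclass


@dataclass
class StoredModel:
    name: str
    reuse_errors: int    # hypothetical: errors made when this stored model was reused
    retrain_errors: int  # hypothetical: errors a newly trained model made on the same observations

    @property
    def advantage(self) -> int:
        # Misclassifications avoided by reusing the model instead of retraining.
        return self.retrain_errors - self.reuse_errors


def evict_to_capacity(models, capacity):
    """Keep only the `capacity` models with the highest advantage."""
    ranked = sorted(models, key=lambda m: m.advantage, reverse=True)
    return ranked[:capacity]


# Illustrative numbers only, not results from the paper.
repository = [
    StoredModel("simple", reuse_errors=40, retrain_errors=50),
    StoredModel("complex", reuse_errors=30, retrain_errors=120),
    StoredModel("stale", reuse_errors=60, retrain_errors=55),
]
kept = evict_to_capacity(repository, capacity=2)
print([m.name for m in kept])
```

In this toy repository the "complex" model is kept first because a replacement would be costly to relearn, which is the kind of model the abstract says other policies tend to undervalue.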
Keywords: Data streams, Memory management, Model valuation policy
Official paper URL: https://doi.org/10.1007/s10618-021-00736-w