Tracking drifting concepts by minimizing disagreements

作者:David P. Helmbold, Philip M. Long

摘要

In this paper we consider the problem of tracking a subset of a domain (called thetarget) which changes gradually over time. A single (unknown) probability distribution over the domain is used to generate random examples for the learning algorithm and measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while predicting with accuracy ε. Furthermore, the complexity of the classH of possible targets, as measured byd, its VC-dimension, also effects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a classH can be approximated to within a factork, then there is a simple tracking algorithm forH which can achieve a probability ε of making a mistake if the target movement rate is at most a constant times ε2/(k(d +k) ln 1/ε), whered is the Vapnik-Chervonenkis dimension ofH. Also, we show that ifH is properly PAC-learnable, then there is an efficient (randomized) algorithm that with high probability approximately minimizes disagreements to within a factor of 7d + 1, yielding an efficient tracking algorithm forH which tolerates drift rates up to a constant times ε2/(d 2 ln 1/ε). In addition, we prove complementary results for the classes of halfspaces and axisaligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times ε2/d.

论文关键词:Computational learning theory, concept drift, concept learning

论文评审过程:

论文官网地址:https://doi.org/10.1007/BF00993161