Accumulating regional density dissimilarity for concept drift detection in data streams

Authors:

Highlights:

• This paper develops a k-nearest neighbor-based data modeling method that divides sample sets into small subsets, so that the entire feature space is converted into a set of small regions.

• A regional-drift-oriented dissimilarity function for sample sets is proposed, which accumulates the density discrepancies among these small regions.

• This paper proves that the proposed dissimilarity follows a normal distribution and develops a tailored significance test to estimate the significance of observed drifts.

• These contributions help solve real-world concept drift problems, as evidenced by the experiments conducted.
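
The idea in the highlights can be illustrated with a minimal sketch. It is not the paper's algorithm: the region construction (each pooled point plus its k nearest neighbors), the normalization, and the function names `knn_regions`, `dissimilarity`, and `drift_test` are all assumptions made for illustration. The null mean and standard deviation are estimated by label permutation and then plugged into a normal approximation, which is a stand-in for the tailored significance test the paper derives.

```python
import math
import numpy as np


def knn_regions(points, k):
    """Return, for every point, the indices of itself and its k nearest neighbors.

    Hypothetical helper: each row of the result is treated as one small region.
    """
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff * diff, axis=2)          # pairwise squared distances
    return np.argsort(sq_dist, axis=1)[:, : k + 1]  # self is at distance 0


def dissimilarity(regions, is_a):
    """Accumulate per-region density gaps between samples A and B (assumed form)."""
    n_a = is_a.sum()
    n_b = is_a.size - n_a
    in_a = is_a[regions]                      # shape (n_points, k + 1), boolean
    dens_a = in_a.sum(axis=1) / n_a           # share of A that falls in each region
    dens_b = (~in_a).sum(axis=1) / n_b        # share of B that falls in each region
    # Normalized so the value lies in [0, 1]; 0 means identical regional densities.
    return float(np.abs(dens_a - dens_b).sum() / (dens_a + dens_b).sum())


def drift_test(a, b, k=10, n_perm=200, seed=0):
    """Return (observed dissimilarity, approximate one-sided p-value)."""
    pooled = np.vstack([a, b])
    is_a = np.zeros(len(pooled), dtype=bool)
    is_a[: len(a)] = True
    regions = knn_regions(pooled, k)          # regions depend only on the pooled data
    observed = dissimilarity(regions, is_a)

    # Assumption: estimate the null mean and spread by shuffling the labels,
    # then use a normal approximation for the upper-tail probability.
    rng = np.random.default_rng(seed)
    null = np.array(
        [dissimilarity(regions, rng.permutation(is_a)) for _ in range(n_perm)]
    )
    z = (observed - null.mean()) / null.std(ddof=1)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return observed, p_value
```

Calling `drift_test(a, b)` on two NumPy arrays of shape `(n_samples, n_features)` returns the accumulated dissimilarity and an approximate p-value; a small p-value suggests the two windows were drawn from different distributions. The brute-force distance matrix keeps the sketch dependency-free but only suits small windows.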


Keywords: Concept drift, Dataset shift, Changing environments, Covariate shift, Empirical density estimation

Review history: Received 13 June 2017, Revised 25 September 2017, Accepted 6 November 2017, Available online 7 November 2017, Version of Record 14 November 2017.

Official paper URL: https://doi.org/10.1016/j.patcog.2017.11.009