Online density estimation over high-dimensional stationary and non-stationary data streams
Authors:
Abstract
Efficient density estimation over an open-ended stream of high-dimensional data is of primary importance to machine learning. In general, parametric methods for density estimation are not suitable for high dimensions, and widely used non-parametric methods such as kernel density estimation (KDE) fail on high-dimensional datasets. In this paper, we present a framework for density estimation over stationary and non-stationary high-dimensional data streams. It is based on a blockized implementation of the Bayesian sequential partitioning (BSP) algorithm. The proposed framework satisfies the general design criteria for systems that perform online machine learning and data mining over data streams.
Keywords: Multivariate density estimation, Non-parametric, High-dimensional datasets, Blockized Bayesian sequential partitioning, Streaming data mining, Non-stationary data streams, Kernel density estimation, KL divergence
Article history: Received 20 November 2018, Revised 15 May 2019, Accepted 16 July 2019, Available online 22 July 2019, Version of Record 8 November 2019.
Paper URL: https://doi.org/10.1016/j.datak.2019.101718