Hyper-cylindrical micro-clustering for streaming data with unscheduled data removals
Authors:
Abstract
We present a streaming data clustering algorithm that allows data records to be removed at arbitrary times. Existing algorithms track evolving clusters by letting the weight of old data decay over time. We instead consider the case in which data records have no pre-specified lifetime, a characteristic of many real data sets such as bank accounts: users can add or remove a data record at any time. The algorithm processes each datum in a one-pass, throw-away fashion, without storing the whole data set. We also propose a technique for merging several micro-clusters into a single hyper-cylindrical micro-cluster, which reduces the number of micro-clusters in feature space and thereby reduces computation. The algorithm is evaluated on several synthetic and real data sets, and it outperforms other state-of-the-art algorithms on several indices for measuring clustering performance.
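The abstract does not specify how a micro-cluster is represented. As a rough illustration only, the sketch below assumes a CluStream-style summary (point count, per-dimension linear sum and squared sum). Because these statistics are additive, an unscheduled removal can be absorbed by subtraction without storing the raw records. The names `MicroCluster`, `process`, and the `threshold` parameter are hypothetical. This is not the paper's hyper-cylindrical method, and locating the micro-cluster for a removal by nearest centroid is a simplifying assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple


class MicroCluster:
    """Hypothetical additive micro-cluster summary (CluStream-style cluster feature).

    Only n, the linear sum and the squared sum are kept, so points can be
    added or removed in one pass without storing the points themselves.
    """

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.squared_sum = [0.0] * dim

    def add(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, v in enumerate(x):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def remove(self, x: Sequence[float]) -> None:
        # Removal is plain subtraction of the point's contribution.
        if self.n == 0:
            raise ValueError("cannot remove from an empty micro-cluster")
        self.n -= 1
        for i, v in enumerate(x):
            self.linear_sum[i] -= v
            self.squared_sum[i] -= v * v

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square distance of members from the centroid.
        var = sum(sq / self.n - (s / self.n) ** 2
                  for s, sq in zip(self.linear_sum, self.squared_sum))
        return math.sqrt(max(var, 0.0))


def nearest(clusters: List[MicroCluster],
            x: Sequence[float]) -> Tuple[Optional[MicroCluster], float]:
    best, best_d = None, math.inf
    for mc in clusters:
        d = math.dist(mc.centroid(), x)
        if d < best_d:
            best, best_d = mc, d
    return best, best_d


def process(stream, threshold: float) -> List[MicroCluster]:
    """Consume ("add" | "remove", point) events one at a time (assumed format)."""
    clusters: List[MicroCluster] = []
    for op, x in stream:
        mc, d = nearest(clusters, x)
        if op == "add":
            if mc is not None and d <= threshold:
                mc.add(x)  # absorb into the nearest micro-cluster
            else:
                new = MicroCluster(dim=len(x))
                new.add(x)
                clusters.append(new)
        elif op == "remove":
            if mc is None:
                raise KeyError("no micro-cluster to remove from")
            mc.remove(x)
            if mc.n == 0:
                # Drop emptied micro-clusters (compare by identity).
                clusters = [c for c in clusters if c is not mc]
        else:
            raise ValueError(f"unknown operation: {op}")
    return clusters
```

For example, adding `[0, 0]`, `[1, 0]` and `[10, 10]` with `threshold=2.0` and then removing `[1, 0]` leaves two micro-clusters with centroids `[0.0, 0.0]` and `[10.0, 10.0]`. The nearest-centroid lookup on removal can pick the wrong micro-cluster when clusters overlap; a real system would need a more reliable way to locate the record's summary.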
Keywords: Data stream, Density-based clustering, Hyper-cylindrical micro-cluster, Unscheduled data removal, Data clustering
Article history: Received 8 July 2015, Revised 22 January 2016, Accepted 6 February 2016, Available online 27 February 2016, Version of Record 18 March 2016.
Paper URL: https://doi.org/10.1016/j.knosys.2016.02.004