A dynamic classification unit for online segmentation of big data via small data buffers

Authors:

Highlights:

• A method for dynamic segmentation of big data with no a priori classification.

• No need for reexamination of past data when a new segment is defined.

• Segments are represented by selected representative cases stored in small data buffers.

• The dynamic classification unit can serve as an autonomous agent.

• Significant advantage in terms of performance while maintaining accuracy.

Abstract

In many segmentation processes, we assign new cases according to a model that was built on the basis of past cases. As long as the new cases are “similar enough” to the past cases, segmentation proceeds normally. However, when a new case is substantially different from the known cases, a reexamination of the previously created segments is required. The reexamination may result in the creation of new segments or in the updating of the existing ones. In this paper, we assume that in big and dynamic data environments it is not possible to reexamine all past data and, therefore, we suggest using small groups of selected cases, stored in small data buffers, as an alternative to the collection of all past data. We present an incremental dynamic classifier that supports real-time unsupervised segmentation in big and dynamic data environments. In order to reduce the computational effort of unsupervised clustering in such environments, the suggested model performs calculations only on the relevant data buffers that store the relevant representative cases. In addition, the suggested model can serve as a dynamic classification unit (DCU) that can act as an autonomous agent, as well as collaborate with other DCUs. The evaluation is presented by comparing three approaches: static, dynamic, and incremental dynamic.
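The buffer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' DCU algorithm: the abstract does not specify how representative cases are selected or how segments are reexamined, so the sketch assumes a Euclidean distance to the buffer centroid, a fixed similarity threshold, and a first-in-first-out buffer. The names `BufferedSegmenter`, `threshold`, and `buffer_size` are hypothetical.

```python
from collections import deque
from math import dist


class BufferedSegmenter:
    """Toy incremental segmenter: each segment keeps only a small buffer of cases."""

    def __init__(self, threshold, buffer_size):
        self.threshold = threshold      # assumed "similar enough" cutoff (not from the paper)
        self.buffer_size = buffer_size  # assumed cap on stored cases per segment
        self.buffers = []               # one deque of representative cases per segment

    def _centroid(self, buffer):
        # Mean of the buffered cases, computed coordinate by coordinate.
        return [sum(axis) / len(buffer) for axis in zip(*buffer)]

    def assign(self, case):
        """Return the segment index for `case`, opening a new segment if needed."""
        best_index, best_distance = None, float("inf")
        for index, buffer in enumerate(self.buffers):
            d = dist(case, self._centroid(buffer))
            if d < best_distance:
                best_index, best_distance = index, d
        if best_index is not None and best_distance <= self.threshold:
            # deque(maxlen=...) drops the oldest case once the buffer is full,
            # so only the small buffer is ever touched, never the full history.
            self.buffers[best_index].append(tuple(case))
            return best_index
        # The case is too different from every known segment: open a new one.
        self.buffers.append(deque([tuple(case)], maxlen=self.buffer_size))
        return len(self.buffers) - 1


segmenter = BufferedSegmenter(threshold=1.0, buffer_size=3)
cases = [(0, 0), (0.5, 0), (10, 10), (10, 10.5), (0, 0.5)]
labels = [segmenter.assign(c) for c in cases]
print(labels)  # [0, 0, 1, 1, 0]
```

In this sketch, a new case is compared only against the buffered representatives, so the cost of each assignment depends on the number of segments and the buffer size rather than on the total volume of past data.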

Keywords: Incremental dynamic classifier, Dynamic segmentation, Incremental data analysis, Cluster analysis, Classification, Big data

Article history: Received 25 April 2019, Revised 14 August 2019, Accepted 4 September 2019, Available online 7 September 2019, Version of Record 16 November 2019.

Official paper URL: https://doi.org/10.1016/j.dss.2019.113157