BIRCH: A New Data Clustering Algorithm and Its Applications

Authors: Tian Zhang, Raghu Ramakrishnan, Miron Livny

Abstract

Data clustering is an important technique for exploratory data analysis and has been studied for several years. It has been shown to be useful in many practical domains, such as data classification and image processing. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns and/or correlations among attributes. This is called data mining, and data clustering is regarded as a particular branch of it. However, existing data clustering methods do not adequately address the problem of processing large datasets with a limited amount of resources (e.g., memory and CPU cycles). As a result, as the dataset size increases, they do not scale well in terms of memory requirements, running time, and result quality.

Keywords: Very Large Databases, Data Clustering, Incremental Algorithm, Data Classification and Compression

Paper URL: https://doi.org/10.1023/A:1009783824328