Data clustering with size constraints

Authors:


Abstract

Data clustering is an important and frequently used unsupervised learning method. Recent research has demonstrated that incorporating instance-level background information into traditional clustering algorithms can improve clustering performance. In this paper, we extend traditional clustering by introducing additional prior knowledge, such as the size of each cluster. We propose a heuristic algorithm that transforms size-constrained clustering problems into integer linear programming problems. Experiments on both synthetic and UCI datasets demonstrate that our proposed approach can exploit cluster size constraints to improve clustering accuracy.
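The abstract does not spell out the integer linear program, so the following is only a minimal sketch of the general idea under our own assumptions. It is not the authors' heuristic. Assume the cluster centroids are already fixed (for example, by a prior k-means run) and that exact target sizes n_1, ..., n_k are given. We then choose binary variables x_ij to minimize the sum over i and j of c_ij * x_ij, where c_ij is the squared distance from point i to centroid j, subject to each point being assigned to exactly one cluster (sum_j x_ij = 1) and cluster j receiving exactly n_j points (sum_i x_ij = n_j). This ILP is a transportation problem, whose constraint matrix is totally unimodular, so it can be solved exactly. The sketch expands cluster j into n_j identical "slots" and runs the Hungarian algorithm. The names `hungarian` and `size_constrained_assign` are hypothetical.

```python
import math


def hungarian(cost):
    """Solve a rectangular assignment problem (rows <= cols) exactly.

    Returns a list giving the column assigned to each row, minimizing total cost.
    This is the classic O(n^2 m) potential-based Hungarian algorithm.
    """
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)          # row potentials (1-indexed)
    v = [0.0] * (m + 1)          # column potentials (1-indexed)
    p = [0] * (m + 1)            # p[j] = row matched to column j, 0 = free
    way = [0] * (m + 1)          # back-pointers for the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def size_constrained_assign(points, centroids, sizes):
    """Assign points to fixed centroids so that cluster j gets exactly sizes[j] points.

    Minimizes total squared distance (an assumed objective, for illustration only).
    """
    if sum(sizes) != len(points):
        raise ValueError("cluster sizes must sum to the number of points")
    # Expand cluster j into sizes[j] identical slots -> square assignment problem.
    slot_cluster = [j for j, s in enumerate(sizes) for _ in range(s)]
    cost = [[sq_dist(pt, centroids[j]) for j in slot_cluster] for pt in points]
    slots = hungarian(cost)
    return [slot_cluster[s] for s in slots]
```

For instance, with one-dimensional points 0, 1, 2 and 10, centroids 0.5 and 10, and sizes [2, 2], `size_constrained_assign` returns [0, 0, 1, 1]: the size constraint forces one of the three left-hand points into the second cluster, and moving the point at 2 is the cheapest choice. The slot expansion keeps the sketch dependency-free. For large n, a dedicated LP or min-cost-flow solver would scale better.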

Keywords: Constrained clustering, Size constraints, Linear programming, Data mining, Background knowledge

Article history: Received 25 January 2010, revised 29 April 2010, accepted 13 June 2010, available online 13 July 2010.

Official paper URL: https://doi.org/10.1016/j.knosys.2010.06.003