Semi-supervised hybrid clustering by integrating Gaussian mixture model and distance metric learning

作者:Yihao Zhang, Junhao Wen, Xibin Wang, Zhuo Jiang

摘要

Semi-supervised clustering aim to aid and bias the unsupervised clustering by employing a small amount of supervised information. The supervised information is generally given as pairwise constraints, which was used to either modify the objective function or to learn the distance measure. Many previous work have shown that the cluster algorithm based on distance metric is significantly better than the cluster algorithm based on probability distribution in the some data set, there are a totally opposite result in another data set, so how to balance the two methods become a key problem. In this paper, we proposed a semi-supervised hybrid clustering algorithm that provides a principled framework integrating distance metric into Gaussian mixture model, which consider not only the intrinsic geometry information but also the probability distribution information of the data. In comparison to only using the pairwise constraints, the labeled data was used to initialize Gaussian distribution parameter and to construct the weight matrix of regularizer, and then we adopt Kullback-Leibler Divergence as the “distance” measurement to regularize the objective function. Experiments on several UCI data sets and the real world data sets of Chinese Word Sense Induction demonstrate the effectiveness of our semi-supervised cluster algorithm.

论文关键词:Semi-supervised clustering, Gaussian mixture model, Distance metric learning, Expectation maximization

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10844-013-0264-5