A nonnegative matrix factorization framework for semi-supervised document clustering with dual constraints

Authors: Huifang Ma, Weizhong Zhao, Zhongzhi Shi

Abstract

In this paper, we propose a new semi-supervised co-clustering algorithm, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF), for document clustering. In this approach, the clustering process incorporates two kinds of prior knowledge into the NMF co-clustering framework: domain knowledge about data points (documents), in the form of pair-wise constraints, and category knowledge about features (words). Under this framework, the clustering problem is formulated as finding a local minimizer of an objective function that takes the dual prior knowledge into account. The update rules are derived, and an iterative algorithm is designed for the co-clustering process. Theoretically, we prove the correctness and convergence of our algorithm and demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model achieves remarkable performance improvements when those constraints are used.
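
For orientation, the unconstrained building block behind this kind of framework can be sketched as follows. This is plain NMF with the standard Lee-Seung multiplicative updates for the squared Frobenius loss, not the OSS-NMF update rules, which the abstract does not state. The function names (`nmf`, `frobenius_loss`), the toy matrix, and the rows-as-documents layout are assumptions made for illustration only.

```python
import random


def matmul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_loss(X, W, H):
    """Squared Frobenius reconstruction error ||X - WH||^2."""
    R = matmul(W, H)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def scale(M, num, den, eps):
    """Element-wise multiplicative step: M * num / (den + eps)."""
    return [[m * a / (b + eps) for m, a, b in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Plain (unconstrained) NMF, X ~ W H, via multiplicative updates.

    Assumed layout: rows of X are documents, columns are words, so W holds
    document-cluster weights and H holds cluster-word weights.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    losses = [frobenius_loss(X, W, H)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        H = scale(H, matmul(Wt, X), matmul(matmul(Wt, W), H), eps)
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        W = scale(W, matmul(X, Ht), matmul(W, matmul(H, Ht)), eps)
        losses.append(frobenius_loss(X, W, H))
    return W, H, losses


# Toy document-word count matrix (hypothetical data): 4 documents, 4 words.
X = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]
W, H, losses = nmf(X, k=2)

# Assign each document to the cluster with the largest weight in its row of W.
labels = [max(range(2), key=lambda c: row[c]) for row in W]
```

Each multiplicative step keeps the factors nonnegative and does not increase the reconstruction error, which is the kind of convergence behavior the abstract refers to. A semi-supervised variant along the lines described above would add terms for the pair-wise and word-level constraints to the objective and derive correspondingly modified updates.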

Keywords: Nonnegative matrix factorization, Semi-supervised clustering, Dual constraints, Pair-wise constraints, Word-level constraints

Paper URL: https://doi.org/10.1007/s10115-012-0560-3