A semi-supervised fuzzy clustering algorithm applied to gene expression data

作者:

Highlights:

摘要

Over the last decade there has been an increasing interest in semi-supervised clustering. Several studies have suggested that even a small amount of supervised information can significantly improve the results of unsupervised learning. One popular method of incorporating partial supervised information is through pair-wise constraints indicating whether a certain pair of patterns should belong to the same (Must-link) or different (Dont-link) clusters. In this study we propose a novel semi-supervised fuzzy clustering algorithm (SSFCA). The supervised information is incorporated via a method quantifying Must-link and/or Dont-link constraints. Additionally, we present an extension of SSFCA that allows the algorithm to automatically detect the number of clusters in the data. We apply SSFCA to the intrinsic problem of gene expression profiles clustering. The advantageous properties of fuzzy logic, inherited to SSFCA, allow genes to belong to more than one group, revealing this way more profound information concerning their multiple functioning roles. Finally, we investigate the incorporation of prior biological knowledge arriving from Gene Ontology in the process of selecting pair-wise constraints. Simulations on artificial and real life datasets proved that the proposed SSFCA significantly outperformed other standard and semi-supervised clustering methods.

论文关键词:Semi-supervised clustering,Pair-wise constraints,Fuzzy logic,Gene expression

论文评审过程:Received 18 February 2010, Revised 27 February 2011, Accepted 16 May 2011, Available online 26 May 2011.

论文官网地址:https://doi.org/10.1016/j.patcog.2011.05.007