Stability-based validation of bicluster solutions

Abstract

Bicluster analysis is an unsupervised learning method that detects homogeneous or uniquely characterized two-way subsets of objects and attributes in a data set. It can find groups that traditional cluster analysis may miss, and it supports intuitive interpretation of those groups, especially for high-dimensional data sets. Because of these advantages, various biclustering algorithms have been developed over the last few years and applied in bioinformatics and text mining. However, research on the validation of bicluster solutions is rare. We propose a new procedure for validating bicluster solutions, based on a stability index that measures the reproducibility of a solution under variation in the input data set. We quantify the stability of a bicluster solution by generating random resampled data sets from the input data set, obtaining bicluster solutions from them, and evaluating their expected agreement with the bicluster solution for the original input data set. Experiments on three artificial data sets and two real gene expression data sets indicate that the proposed method is suitable for validating bicluster solutions.
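
The resample, re-cluster, and compare loop described in the abstract can be illustrated with a minimal Python sketch. It is not the stability index defined in the paper: the abstract does not give the formula, so the sketch assumes row subsampling without replacement and a Jaccard agreement over bicluster cells. The names `toy_bicluster`, `cells`, `jaccard`, and `stability`, and the parameters `n_resamples` and `frac`, are hypothetical, and `toy_bicluster` is only a placeholder for a real biclustering algorithm.

```python
import random
from statistics import mean


def toy_bicluster(data, row_ids):
    """Hypothetical stand-in for a real biclustering algorithm.

    Returns one bicluster (rows, cols): the columns whose mean over
    row_ids exceeds the overall mean, then the rows whose mean over
    those columns exceeds the overall mean.
    """
    n_cols = len(data[0])
    overall = mean(data[r][j] for r in row_ids for j in range(n_cols))
    cols = {j for j in range(n_cols)
            if mean(data[r][j] for r in row_ids) > overall}
    if not cols:
        return set(), set()
    rows = {r for r in row_ids
            if mean(data[r][j] for j in cols) > overall}
    return rows, cols


def cells(rows, cols):
    """Set of (row, column) cells covered by a bicluster."""
    return {(r, c) for r in rows for c in cols}


def jaccard(a, b):
    """Jaccard agreement of two cell sets (assumed agreement measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(data, n_resamples=50, frac=0.8, seed=0):
    """Average agreement between the original solution and the
    solutions obtained on random row subsamples."""
    rng = random.Random(seed)
    all_rows = list(range(len(data)))
    ref_rows, ref_cols = toy_bicluster(data, all_rows)
    scores = []
    for _ in range(n_resamples):
        sample = rng.sample(all_rows, int(frac * len(all_rows)))
        rows, cols = toy_bicluster(data, sample)
        # Compare only on rows present in the subsample.
        ref_cells = cells(ref_rows & set(sample), ref_cols)
        scores.append(jaccard(ref_cells, cells(rows, cols)))
    return mean(scores)


# Artificial data: a 5 x 3 block of high values planted in a 10 x 6 matrix.
data = [[5.0 if (i < 5 and j < 3) else 0.0 for j in range(6)]
        for i in range(10)]
score = stability(data)
print(f"stability index (sketch): {score:.2f}")
```

A score near 1 means the bicluster is recovered almost unchanged across subsamples, while a score near 0 means the solution depends heavily on which objects happen to be sampled. Other resampling schemes or agreement measures could be substituted without changing the overall loop.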

Keywords: Biclustering, Validation, Stability, Resampling

Article history: Received 24 December 2009, Revised 22 July 2010, Accepted 25 August 2010, Available online 30 August 2010.

Article URL: https://doi.org/10.1016/j.patcog.2010.08.029