Biclustering of gene expression data based on related genes and conditions extraction

Authors:

Highlights:

Abstract

Biclustering is an important tool for finding patterns in a microarray data matrix by simultaneously classifying genes and conditions in two dimensions. Most existing biclustering algorithms involve almost all genes and conditions in the clustering process, even those that contribute little to a bicluster. In contrast, we propose to perform the biclustering operation only on the genes and conditions related to a given bicluster type. In our algorithm, the gene expression matrix is first partitioned into stable and unstable submatrices in both the row and column directions by inspecting the similarity between each row (or column) vector and the all-ones vector. Next, the genes and conditions related to a given type of bicluster are extracted by inspecting the row or column pairs in the corresponding stable or unstable submatrices. Finally, the resulting biclusters of any type are obtained by performing clustering analysis on the extracted related genes and conditions. Additionally, a novel strategy for estimating the missing data in the gene expression matrix is presented, based on the James–Stein and kernel estimation principles, where the estimation matrix is obtained with the k-means algorithm. Experimental results show excellent performance of our algorithm in both missing data estimation and biclustering.
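The first step of the abstract, splitting rows and columns into stable and unstable groups by their similarity to the all-ones vector, can be illustrated with a minimal sketch. The abstract does not say which similarity measure or cutoff the authors use, so the cosine similarity, the `threshold` value of 0.95, the function names, and the toy matrix below are all assumptions made for illustration and should not be read as the paper's actual procedure.

```python
import math


def cosine_to_ones(vec):
    """Cosine similarity between vec and the all-ones vector of equal length.

    Assumption: cosine similarity stands in for the unspecified measure.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    # dot(vec, ones) = sum(vec); ||ones|| = sqrt(len(vec))
    return sum(vec) / (norm * math.sqrt(len(vec)))


def partition_rows(matrix, threshold=0.95):
    """Return (stable, unstable) row indices.

    A row counts as stable when it is nearly parallel to the all-ones
    vector, i.e. its entries are nearly constant. The threshold is a
    hypothetical parameter, not a value taken from the paper.
    """
    stable, unstable = [], []
    for i, row in enumerate(matrix):
        if cosine_to_ones(row) >= threshold:
            stable.append(i)
        else:
            unstable.append(i)
    return stable, unstable


def partition_columns(matrix, threshold=0.95):
    """Apply the same split in the column direction by transposing."""
    columns = [list(col) for col in zip(*matrix)]
    return partition_rows(columns, threshold)


# Toy expression matrix: rows are genes, columns are conditions.
expression = [
    [2.0, 2.1, 1.9, 2.0],  # nearly constant across conditions
    [0.5, 3.0, 0.2, 4.1],  # fluctuating across conditions
    [1.0, 1.0, 1.0, 1.0],  # exactly constant
]

stable_rows, unstable_rows = partition_rows(expression)
stable_cols, unstable_cols = partition_columns(expression)
print("stable rows:", stable_rows, "unstable rows:", unstable_rows)
print("stable cols:", stable_cols, "unstable cols:", unstable_cols)
```

In this sketch a row with nearly constant expression lands in the stable group and a strongly fluctuating row lands in the unstable group. The later steps (extracting related genes and conditions from row or column pairs, then clustering them) depend on details that the abstract does not give and are not sketched here.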

Keywords: Biclustering, Microarray, Gene expression data, Missing data estimation

Article history: Received 14 February 2012, Revised 30 August 2012, Accepted 30 September 2012, Available online 25 October 2012.

Paper URL: https://doi.org/10.1016/j.patcog.2012.09.028