Determination of cluster number in clustering microarray data

Authors:

Highlights:

Abstract

The general purpose of clustering microarray data is to organize the data into meaningful groups based on their closeness. Although various algorithms have been proposed for clustering microarray data, the main difficulty remains determining the optimal number of clusters. To complicate the problem further, neither meaningful groups nor closeness can be well defined because of the fuzzy nature of the data. This paper proposes a dynamic validity index to overcome this problem. In addition to its dynamic aspects, the proposed index accounts for both the intra-cluster and the inter-cluster distances. An algorithm based on the proposed dynamic validity index and the traditional K-means method was developed. To make the index more flexible, a modulating parameter γ is introduced. This parameter can be used to handle noisy data and to balance the importance of compactness and separation in the clusters. To illustrate the effectiveness of the approach, a numerical example using human serum data from the literature was solved, and the sensitivity and robustness of the approach were examined.
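
The abstract does not state the formula of the dynamic validity index, so the following is only a minimal sketch of the general idea: run K-means for several candidate values of k, score each result by a normalized compactness term plus γ times a normalized separation term, and keep the k with the lowest score. The particular form of `intra` and `inter`, the farthest-first initialisation, and all function names (`kmeans`, `choose_k`, and so on) are assumptions made for illustration, not the paper's definitions.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans(points, k, iters=100):
    # Assumption: deterministic farthest-first initialisation, chosen here
    # only so the example is reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else centers[c]
            )
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]


def intra(points, centers, labels):
    """Compactness: mean squared distance of each point to its own center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)


def inter(centers):
    """Separation penalty (assumed form): large when two centers nearly coincide.
    Assumes all centers are distinct, so no pairwise distance is zero."""
    pairs = [sq_dist(a, b) for a, b in combinations(centers, 2)]
    spread = sum(1.0 / sum(sq_dist(a, b) for b in centers) for a in centers)
    return max(pairs) / min(pairs) * spread


def choose_k(points, k_max, gamma=1.0):
    """Return (best k, scores) for k = 2..k_max; gamma weights separation."""
    ks = range(2, k_max + 1)
    fits = [kmeans(points, k) for k in ks]
    intras = [intra(points, c, lab) for c, lab in fits]
    inters = [inter(c) for c, _ in fits]
    scores = {
        k: i / max(intras) + gamma * e / max(inters)
        for k, i, e in zip(ks, intras, inters)
    }
    return min(scores, key=scores.get), scores


# Toy data: three well-separated groups of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]
best_k, scores = choose_k(points, k_max=5)
```

On this toy data the sketch selects k = 3 with `gamma=1.0`. Setting `gamma=0.0` removes the separation term, so the score reduces to compactness alone and the largest k tried is favoured, which illustrates why a parameter balancing the two terms is useful.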

Keywords: Clustering, Microarray data, K-means algorithm, Data mining, Validity index, Bio-informatics

Review history: Available online 11 January 2005.

Official paper URL: https://doi.org/10.1016/j.amc.2004.10.076