Towards improving fuzzy clustering using support vector machine: Application to gene expression data

Authors:

Abstract

Recent advances in microarray technology permit monitoring the expression levels of a large set of genes across a number of time points simultaneously. Extracting knowledge from such a large volume of microarray gene expression data requires computational analysis. Clustering is an important data mining tool for analyzing such data, grouping similar genes into clusters, and researchers have proposed a number of clustering algorithms for this purpose. In this article, an attempt is made to improve the performance of fuzzy clustering by combining it with a support vector machine (SVM) classifier. A recently proposed real-coded variable string length genetic algorithm based clustering technique and an iterated version of fuzzy C-means clustering are used for this purpose. The performance of the proposed clustering scheme is compared with that of some well-known existing clustering algorithms and their SVM-boosted versions on one simulated and six real-life gene expression data sets. A statistical significance test based on analysis of variance (ANOVA), followed by an a posteriori Tukey–Kramer multiple comparison test, has been conducted to establish the statistical significance of the superior performance of the proposed clustering scheme. Moreover, the biological significance of the clustering solutions has been established.
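The abstract does not spell out how the fuzzy clustering and the SVM are combined. One plausible reading, assumed here purely for illustration, is that points with a high fuzzy membership serve as labeled training data for a classifier, which then reassigns the ambiguous points. The sketch below covers only the first half of that idea: a plain fuzzy C-means in standard-library Python, followed by a split into high-confidence and ambiguous points. The function names, the 0.8 threshold, and the toy data are hypothetical and are not taken from the paper; the SVM step itself is not shown.

```python
import math


def memberships(points, centers, m=2.0):
    """Fuzzy C-means membership matrix: u[i][k] is the degree of point i in cluster k."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances away from zero so a point sitting on a center is safe.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d[k] / d[j]) ** power for j in range(len(d)))
                  for k in range(len(d))])
    return u


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            # Each center is the mean of all points, weighted by u^m.
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][dim] for i in range(len(points))) / total
                for dim in range(dims)))
        centers = new_centers
    return centers, memberships(points, centers, m)


def split_by_confidence(u, threshold=0.8):
    """Separate points into (index, label) pairs with high membership and ambiguous indices."""
    core, ambiguous = [], []
    for i, row in enumerate(u):
        best = max(range(len(row)), key=lambda k: row[k])
        if row[best] >= threshold:
            core.append((i, best))
        else:
            ambiguous.append(i)
    return core, ambiguous


# Toy data (hypothetical): two tight groups plus one point roughly midway between them.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (2.5, 2.6)]
centers, u = fuzzy_c_means(points, centers=[(0.0, 0.0), (5.0, 5.0)])
core, ambiguous = split_by_confidence(u)
print("core:", core)
print("ambiguous:", ambiguous)
```

Under this assumed scheme, the `core` pairs would form the training set for an SVM, and the indices in `ambiguous` would be relabeled by the trained classifier's predictions.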

Keywords: Microarray gene expression data, Fuzzy clustering, Cluster validity indices, Variable string length genetic algorithm, Support vector machines, Gene ontology

Article history: Received 17 April 2008, Revised 10 April 2009, Accepted 26 April 2009, Available online 7 May 2009.

Official URL: https://doi.org/10.1016/j.patcog.2009.04.018