Classification of gene-expression data: The manifold-based metric learning way

Authors:

Highlights:

Abstract

Classification of microarray gene-expression data can potentially help medical diagnosis and has become an important topic in bioinformatics. However, microarray data sets usually contain few samples relative to an overwhelming number of genes, which makes the classification problem fairly challenging. Instance-based learning (IBL) algorithms, such as k-nearest neighbor (k-NN), are usually the baseline algorithms because of their simplicity. However, practice shows that k-NN does not perform very well in this field. This paper introduces manifold-based metric learning to improve the performance of IBL methods. A novel metric learning algorithm is proposed that uses both local manifold structural information and local discriminant information. In addition, a random subspace extension is presented. We apply the proposed algorithm to the gene-classification problem in three ways: in the original feature space, in a reduced feature space, and via the random subspace extension. Statistical evaluation shows that the proposed algorithm achieves promising results and gains significant performance improvement over traditional IBL algorithms.
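
The abstract does not spell out the metric learning step, so the following is only a minimal sketch of the surrounding machinery, under stated assumptions: k-NN evaluated under a linear metric d(x, z) = ||L(x - z)||, plus a random subspace ensemble that votes over k-NN classifiers built on random gene subsets. The names `knn_predict`, `random_subspace_knn`, and the optional `fit_metric` callback are hypothetical, and the transform `L` is a placeholder for whatever a metric learner would return; none of this is taken from the paper itself.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3, L=None):
    """k-NN majority vote under d(x, z) = ||L (x - z)||.

    L is a placeholder for a learned linear transform (assumption);
    L=None falls back to the plain Euclidean metric.
    Labels must be non-negative integers (required by np.bincount).
    """
    if L is not None:
        X_train = X_train @ L.T
        X_test = X_test @ L.T
    # Pairwise squared distances, shape (n_test, n_train).
    diff = X_test[:, None, :] - X_train[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Indices of the k closest training samples for each test sample.
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])


def random_subspace_knn(X_train, y_train, X_test, n_subspaces=15,
                        subspace_dim=20, k=3, fit_metric=None, seed=0):
    """Majority vote over k-NN classifiers on random feature subsets.

    fit_metric is a hypothetical callback (X, y) -> L that would learn
    a transform inside each subspace; None means Euclidean k-NN.
    """
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    all_votes = []
    for _ in range(n_subspaces):
        # Sample a subset of genes (features) without replacement.
        idx = rng.choice(n_features, size=subspace_dim, replace=False)
        X_sub = X_train[:, idx]
        L = fit_metric(X_sub, y_train) if fit_metric is not None else None
        all_votes.append(knn_predict(X_sub, y_train, X_test[:, idx], k=k, L=L))
    # Shape (n_test, n_subspaces): one column of predictions per subspace.
    all_votes = np.stack(all_votes, axis=1)
    return np.array([np.bincount(v).argmax() for v in all_votes])


if __name__ == "__main__":
    # Synthetic stand-in for microarray data: few samples, many features.
    rng = np.random.default_rng(42)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 200)) + 2.0 * y[:, None]
    train = np.arange(40) % 2 == 0
    pred = random_subspace_knn(X[train], y[train], X[~train])
    print("accuracy:", (pred == y[~train]).mean())
```

Each subspace classifier sees only `subspace_dim` features, which is one common way to cope with having far more genes than samples; the ensemble size and subspace dimension used here are arbitrary illustrative values.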

Keywords: Gene expression, Metric learning, Manifold learning, Nearest neighbor

Article history: Received 29 June 2005, Revised 17 May 2006, Accepted 23 June 2006, Available online 20 July 2006.

Official URL: https://doi.org/10.1016/j.patcog.2006.05.026