A fuzzy intelligent approach to the classification problem in gene expression data analysis

Authors:

Highlights:

Abstract

Classification is an important data mining task that is widely used in many real-world applications. In microarray analysis, classification techniques are applied to discriminate between diseases or to predict outcomes based on gene expression patterns, and perhaps even to identify the best treatment for a given genetic signature. The most important challenge in gene expression data analysis is how to deal with its distinctive “high dimension, small sample” characteristic, which makes many traditional classification techniques inapplicable or inefficient; hence, more dedicated techniques are needed to approach this problem. Fuzzy logic has recently been shown to be a powerful and suitable soft computing tool for handling complex problems under incomplete data conditions. In this paper, a new hybrid model is proposed that combines the advantages of fuzzy logic with the classification power of artificial neural networks (ANNs) to construct an efficient and accurate hybrid classifier for situations in which little data is available. Because it uses fuzzy parameters instead of crisp parameters, the proposed model needs a smaller data set for training than traditional nonfuzzy neural networks or, with the same training sample, can learn better and hence yield more accurate results. In addition to the theoretical evidence for using fuzzy logic, empirical results of gene expression classification indicate that the proposed model achieves improved classification accuracy in comparison with traditional ANNs and with other well-known statistical and intelligent classification models such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighbor (KNN), and support vector machines (SVMs). Therefore, the proposed model can be applied as an appropriate alternative approach for solving problems with scant data, such as gene expression data classification, specifically when higher classification accuracy is needed.
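
The idea of replacing crisp network parameters with fuzzy ones can be illustrated with a small sketch. The abstract does not specify the form of the fuzzy parameters, so the code below assumes triangular fuzzy numbers for the weights and bias of a single sigmoid neuron with crisp inputs. The names `TriFuzzy` and `fuzzy_neuron` and all numeric values are hypothetical, and this is not the authors' model, only one common way such a neuron might be written.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TriFuzzy:
    """Triangular fuzzy number (assumed form): support [low, high], peak at mode."""
    low: float
    mode: float
    high: float

    def scale(self, x: float) -> "TriFuzzy":
        # Multiplying by a negative crisp value swaps the lower and upper bounds.
        if x >= 0:
            return TriFuzzy(x * self.low, x * self.mode, x * self.high)
        return TriFuzzy(x * self.high, x * self.mode, x * self.low)

    def __add__(self, other: "TriFuzzy") -> "TriFuzzy":
        # Triangular fuzzy numbers add component-wise.
        return TriFuzzy(self.low + other.low,
                        self.mode + other.mode,
                        self.high + other.high)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fuzzy_neuron(inputs, weights, bias):
    """Return (low, mode, high) of the neuron output for crisp inputs."""
    total = bias
    for x, w in zip(inputs, weights):
        total = total + w.scale(x)
    # The sigmoid is increasing, so the three points keep their order.
    return sigmoid(total.low), sigmoid(total.mode), sigmoid(total.high)


# Hypothetical fuzzy weights and bias for a two-input neuron.
weights = [TriFuzzy(0.4, 0.5, 0.6), TriFuzzy(-1.2, -1.0, -0.8)]
bias = TriFuzzy(-0.1, 0.0, 0.1)
low, mode, high = fuzzy_neuron([2.0, -1.0], weights, bias)
print(f"output lies in [{low:.3f}, {high:.3f}] with peak at {mode:.3f}")
```

The output is a range with a most plausible value rather than a single number, which is how a fuzzy-parameter network can express the uncertainty left by a small training sample.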

Keywords: Pattern recognition, Classification, Fuzzy logic, Artificial neural networks (ANNs), Discriminant analysis (DA), K-nearest neighbor (KNN), Support vector machines (SVMs), Gene expression

Review history: Received 24 May 2011, Revised 22 October 2011, Accepted 22 October 2011, Available online 29 October 2011.

Official paper URL: https://doi.org/10.1016/j.knosys.2011.10.012