An ensemble filter-based heuristic approach for cancerous gene expression classification
Authors:
Abstract
Gene expression data for cancer have very large feature sets, which makes classification challenging for existing methods. The data contain redundant, noisy, and irrelevant genes, so feature selection (or reduction) plays a crucial role in classifying such datasets. This work presents an ensemble of three filter methods, namely Symmetrical Uncertainty (SU), chi-square (χ²), and Relief, that reduces the feature dimension by eliminating redundant and noisy genes. It also designs a novel heuristic, Local Search-based Feature Selection (LSFS), that further reduces noise generated by the ensemble method. The selected features are then optimized using a genetic algorithm. Afterwards, the optimal feature set is classified using three models, Support Vector Machine (SVM), k-nearest neighbor (k-NN), and Random Forest (RF), to find cancer-relevant genes. Experiments are conducted on six benchmark datasets, and the results are compared with five state-of-the-art algorithms in terms of accuracy, sensitivity, specificity, F-measure, entropy, and precision. Additional experiments vary the SVM kernel used for the fitness value and use multiple distance measures and various values of k for k-NN. The prediction accuracy of the proposed system on the six benchmark datasets is 99%, 90%, 98%, 94%, 98%, and 99%, respectively. The experimental results indicate that the proposed approach improves the classification of cancerous gene expression data and can serve as a practical tool for gene expression analysis.
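The filter-ensemble stage described in the abstract can be illustrated with a minimal Python sketch. Several parts are assumptions, since the abstract does not give these details: the three scores are combined by averaging their ranks, SU is computed on equal-width discretized values, Relief is the basic two-class nearest-hit/nearest-miss variant, and the data are synthetic. The function names (`symmetrical_uncertainty`, `relief_scores`, `ensemble_rank`) are hypothetical, and the LSFS and genetic-algorithm stages are not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler


def entropy(values):
    """Shannon entropy (in bits) of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def symmetrical_uncertainty(x, y, bins=5):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)); x is discretized (assumption)."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    hx, hy = entropy(xd), entropy(y)
    # Encode each (xd, y) pair as one integer to get the joint entropy.
    joint = entropy(xd * (int(y.max()) + 1) + y)
    mutual_info = hx + hy - joint
    return 0.0 if hx + hy == 0 else 2.0 * mutual_info / (hx + hy)


def relief_scores(X, y):
    """Basic Relief: reward features that differ on the nearest miss
    and agree on the nearest hit (L1 distance, features scaled to [0, 1])."""
    n_samples = X.shape[0]
    weights = np.zeros(X.shape[1])
    for i in range(n_samples):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf  # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        weights += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return weights / n_samples


def ensemble_rank(X, y, top_k=10):
    """Average the ranks given by SU, chi-square, and Relief (assumed rule)."""
    Xs = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative input
    scores = [
        np.array([symmetrical_uncertainty(Xs[:, j], y) for j in range(Xs.shape[1])]),
        chi2(Xs, y)[0],
        relief_scores(Xs, y),
    ]
    # Double argsort turns scores into ranks; rank 0 is the best feature.
    ranks = [np.argsort(np.argsort(-s)) for s in scores]
    mean_rank = np.mean(ranks, axis=0)
    return np.argsort(mean_rank)[:top_k]


# Synthetic stand-in for a gene expression matrix: few samples, many features.
X, y = make_classification(n_samples=60, n_features=200, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
selected = ensemble_rank(X, y, top_k=10)
print("Selected feature indices:", selected)
```

Rank averaging is used here only because it makes scores on different scales comparable without tuning. In a full pipeline the selected columns would be passed on to the later search and classification stages, with selection repeated inside each cross-validation fold to avoid selection bias.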
Keywords: Cancerous gene, Feature selection, Classification, Ensemble method, Evolutionary algorithm
Review history: Received 4 February 2021, Revised 11 July 2021, Accepted 30 September 2021, Available online 6 October 2021, Version of Record 16 October 2021.
Official paper URL: https://doi.org/10.1016/j.knosys.2021.107560