Chaotic dragonfly algorithm: an improved metaheuristic algorithm for feature selection

Authors: Gehad Ismail Sayed, Alaa Tharwat, Aboul Ella Hassanien

Abstract

Selecting the most discriminative features is a challenging problem in many applications. Bio-inspired optimization algorithms have been widely applied to many optimization problems, including feature selection. In this paper, the most discriminating features are selected by a new Chaotic Dragonfly Algorithm (CDA), in which chaotic maps are embedded in the search iterations of the Dragonfly Algorithm (DA). Ten chaotic maps were employed to adjust the main parameters of the dragonflies' movements during the optimization process, in order to accelerate the convergence rate and improve the efficiency of DA. The proposed algorithm is used to select features from a dataset extracted from the DrugBank database, which contained 6712 drugs. Of these, the 553 drugs that are biotransformed in the liver are used. The data cover four toxic effects, namely irritant, mutagenic, reproductive, and tumorigenic effects, and each drug is represented by 31 chemical descriptors. The proposed model comprises three phases: data pre-processing, feature selection, and classification. In the data pre-processing phase, the Synthetic Minority Over-sampling Technique (SMOTE) was used to address the class imbalance in the dataset. In the feature selection phase, the most discriminating features were selected using CDA. Finally, in the classification phase, the features selected by CDA were fed to a Support Vector Machine (SVM) classifier. Experimental results demonstrated the capability of CDA to find the optimal feature subset, which maximizes classification performance and minimizes the number of selected features, compared with DA and other meta-heuristic optimization algorithms. Moreover, the experiments showed that the Gauss chaotic map was the most suitable map for significantly boosting the performance of DA. Additionally, the high values obtained for accuracy (81.82–96.08%), recall (80.84–96.11%), precision (81.45–96.08%), and F-score (81.14–96.1%) across all toxic effects demonstrated the robustness of the proposed model.
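
To make the idea of a chaotic map more concrete, the following is a minimal Python sketch of the Gauss (mouse) map in its commonly cited form, x_{k+1} = 0 if x_k = 0, otherwise (1 / x_k) mod 1. The function names `gauss_map` and `chaotic_sequence` and the seed value are illustrative assumptions, and the abstract does not state how the chaotic values enter the dragonfly update rules, so this is not the authors' implementation.

```python
import math


def gauss_map(x):
    """One step of the Gauss (mouse) map on [0, 1).

    Commonly cited form: 0 if x == 0, otherwise (1 / x) mod 1.
    """
    if x == 0.0:
        return 0.0
    return (1.0 / x) % 1.0


def chaotic_sequence(x0, n):
    """Return n successive Gauss-map iterates starting from seed x0.

    Hypothetical helper: a sequence like this could stand in for the
    uniform random numbers that scale a metaheuristic's parameters.
    """
    seq = []
    x = x0
    for _ in range(n):
        x = gauss_map(x)
        seq.append(x)
    return seq


# Illustrative seed (an assumption, not taken from the paper).
seq = chaotic_sequence(math.pi - 3.0, 50)
```

Each value in `seq` lies in [0, 1), so in principle it can replace a call to a uniform random number generator at the point where a movement parameter is scaled. The seed should avoid simple rational values, for which the exact map collapses to 0 after a few steps.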

Keywords: Toxic effects, Dragonfly algorithm, Feature selection, Optimization algorithm, Chaos theory

Paper URL: https://doi.org/10.1007/s10489-018-1261-8