C-CADZ: computational intelligence system for coronary artery disease detection using Z-Alizadeh Sani dataset

Authors: Ankur Gupta, Rahul Kumar, Harkirat Singh Arora, Balasubramanian Raman

Abstract

Coronary artery disease (CAD) is one of the most lethal diseases and a major cause of death worldwide, accounting for approximately 7 million deaths per year. Early detection, prognostication, and timely diagnosis can help reduce this mortality. Conventional CAD detection systems are cumbersome and expensive, and the scarcity or uneven geographical distribution of radiologists hinders early diagnosis. Researchers and doctors are therefore collaborating on computational intelligence systems in medical imaging for prognostication, identification, treatment, and diagnosis of disease. In support of this effort, a computational intelligence system for CAD diagnosis, C-CADZ, is proposed. The model is validated on the Z-Alizadeh Sani CAD dataset from the UCI repository. C-CADZ uses factor analysis of mixed data (FAMD) for feature extraction, which yields 96 features; nature-inspired algorithms are then used to retain the significant ones. Because machine learning (ML) predictive models are generally built for class-balanced datasets, C-CADZ applies the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance. The dataset is normalized with Z-score normalization. C-CADZ is trained with the ML classifiers Random Forest (RF) and Extra Trees (ET) and validated with a holdout scheme at a 3:1 ratio. Experimental results show that C-CADZ outperforms the state-of-the-art methods of the last decades in terms of accuracy: it improves on the state-of-the-art methods published in 2020 by 5.17% in accuracy, with performance metrics 〈Acc, Sens, Spec〉 ≡ 〈97.37, 98.15, 95.45〉. The performance analysis shows that the highest accuracy, together with the stable boxplot and ROC-AUC curve of RF-ET, makes it suitable for heart disease prediction.
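The preprocessing and evaluation steps named in the abstract (3:1 holdout, Z-score normalization, SMOTE, and the 〈Acc, Sens, Spec〉 metrics) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the function names, the toy data, the neighbour count `k`, and the order of operations (split first, then fit normalization on the training part only) are assumptions made for illustration, and FAMD, feature selection, and the RF/ET classifiers are left out.

```python
import math
import random


def holdout_split(rows, labels, train_fraction=0.75, seed=0):
    """Shuffle the indices and split them 3:1 into training and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def zscore_fit(rows):
    """Per-column mean and population standard deviation (1.0 if a column is constant)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(dims)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Z-score normalization: (x - mean) / std, column by column."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (q - b) for b, q in zip(base, other)])
    return synthetic


def acc_sens_spec(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives).
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy data: 5 majority rows (label 1), 3 minority rows (label 0).
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0],
         [0.0, 0.5], [0.5, 0.0], [0.2, 0.2]]
    y = [1, 1, 1, 1, 1, 0, 0, 0]
    X_tr, y_tr, X_te, y_te = holdout_split(X, y)
    means, stds = zscore_fit(X_tr)
    X_tr, X_te = zscore_apply(X_tr, means, stds), zscore_apply(X_te, means, stds)
    minority = [r for r, label in zip(X_tr, y_tr) if label == 0]
    if len(minority) >= 2:
        print(len(smote(minority, n_new=2, k=1)), "synthetic minority rows added")
```

In this sketch the normalization statistics are fitted on the training part and reused for the test part, which keeps test-set information out of training; the abstract does not state which order C-CADZ uses. In practice, library implementations such as imbalanced-learn's `SMOTE` and scikit-learn's `RandomForestClassifier` and `ExtraTreesClassifier` would replace these hand-written helpers.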

Keywords: Feature extraction, Feature selection, Factor analysis of mixed data (FAMD), Genetic algorithm (GA), SMOTE, Correlation, Eigenvalues, Eigenvectors

Official paper URL: https://doi.org/10.1007/s10489-021-02467-3