A data mining approach to discover genetic and environmental factors involved in multifactorial diseases
Authors:
Abstract
In this paper, we are interested in discovering the genetic and environmental factors involved in multifactorial diseases. Experiments have been carried out by the Biological Institute of Lille, and a large amount of data has been generated. To exploit these data, data mining tools are required, and we propose a two-phase optimisation approach using a specific genetic algorithm. In the first phase, this genetic algorithm selects significant features. In the second phase, we cluster affected individuals according to the features selected in the first phase. The paper describes the specificities of the genetic problem under study and presents in detail the genetic algorithm that we have developed to deal with this very large feature selection problem. Results on both artificial and real data are presented.
Keywords: Data mining, Clustering, Genetic algorithm, Feature selection, Multifactorial disease
Review history: Received 16 March 2001, Revised 22 April 2001, Accepted 31 May 2001, Available online 23 February 2002.
Official URL: https://doi.org/10.1016/S0950-7051(01)00145-9
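
The two-phase approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the fitness function, the genetic operators, or the clustering method, so everything below is an assumption made for illustration. The toy fitness (gap in feature means between affected and unaffected individuals, minus a size penalty), the one-point crossover, the bit-flip mutation, the use of k-means in the second phase, and the synthetic data are all hypothetical choices.

```python
import random

def fitness(mask, X, y, penalty=0.1):
    """Toy score (assumption): summed gap in feature means between affected
    and unaffected individuals, minus a penalty per selected feature."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return float("-inf")
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    gap = sum(abs(sum(r[j] for r in pos) / len(pos)
                  - sum(r[j] for r in neg) / len(neg)) for j in chosen)
    return gap - penalty * len(chosen)

def select_features(X, y, rng, pop_size=30, generations=40, p_mut=0.05):
    """Phase 1: a simple genetic algorithm over bit masks of features."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection, keeps the best masks
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, X, y))
    return [j for j, bit in enumerate(best) if bit]

def kmeans(points, k, rng, iters=20):
    """Phase 2: plain k-means; returns one cluster index per point."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return [min(range(k), key=lambda i: dist2(p, centers[i])) for p in points]

# Synthetic data (assumption): 12 binary features, of which features 2 and 7
# are more frequent among affected individuals (label 1).
rng = random.Random(0)
n_features, informative = 12, [2, 7]
X, y = [], []
for i in range(200):
    label = i % 2
    row = [rng.randint(0, 1) for _ in range(n_features)]
    if label:
        for j in informative:
            row[j] = 1 if rng.random() < 0.9 else 0
    X.append(row)
    y.append(label)

selected = select_features(X, y, rng)
# Only affected individuals are clustered, restricted to the selected features.
affected = [[row[j] for j in selected] for row, label in zip(X, y) if label == 1]
clusters = kmeans(affected, 2, rng)
print("selected features:", selected)
```

The sketch keeps the two phases separate so that either one can be swapped out: the feature mask returned by `select_features` is the only thing the clustering step depends on.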