Mining DNA sequences to predict sites which mutations cause genetic diseases

Abstract

Single nucleotide polymorphism (SNP) analysis has become a crossroads of bioinformatics and medicine. We have developed a data mining system, rSNP_Guide (http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/), to discover regulatory sites in DNA sequences whose mutations could cause genetic diseases. In the first step, we estimate the ability of each protein under consideration to bind to the genomic DNA whose mutational alterations are associated with the genetic disease under study. In the second step, we formalize the disease-associated experimental data on SNP-related alterations in the binding of an unknown protein to DNA. In the third step, we fuzzily cluster all the known proteins examined in order to identify the one whose specific binding site is altered by the mutations in a manner consistent with the site of the unknown protein experimentally associated with the genetic disease. In the fourth step, we predict the known protein whose binding site is (i) present in the DNA and (ii) altered by the mutations associated with the genetic disease. Finally, in the fifth step, we estimate the robustness of this prediction. rSNP_Guide has been tested on SNPs with known relationships between regulatory site alterations and genetic disease penetration. In addition, novel SNP-related regulatory sites associated with genetic disease penetration were discovered and subsequently confirmed experimentally.
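The core idea of the pipeline can be illustrated with a minimal Python sketch. It is not the rSNP_Guide implementation, and every name, site, and sequence in it is hypothetical. It assumes each known protein is described by a log-odds position weight matrix (PWM) built from a few aligned binding sites, with a uniform background, and scores the best site on the forward strand only of the wild-type and mutant alleles (roughly steps 1 and 4). It encodes the experimental data as +1, -1, or 0 for increased, decreased, or unchanged binding of the unknown protein (step 2). In place of the authors' fuzzy clustering, it swaps in a simpler sigmoid-based fuzzy agreement score to rank the known proteins (step 3). The robustness estimate of step 5 is not shown.

```python
import math

BASES = "ACGT"


def pwm_from_sites(sites, pseudocount=0.5):
    """Build a log2-odds position weight matrix from aligned sites,
    assuming a uniform (0.25) background base frequency."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return pwm


def best_site_score(pwm, seq):
    """Highest PWM score over every window of seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[j + i]] for i in range(width))
        for j in range(len(seq) - width + 1)
    )


def alteration(pwm, wild, mutant):
    """Predicted change in site strength caused by the SNP (mutant - wild)."""
    return best_site_score(pwm, mutant) - best_site_score(pwm, wild)


def consistency(predicted, observed):
    """Fuzzy agreement in (0, 1): a sigmoid membership per SNP that is near 1
    when predicted and observed changes share a sign, averaged over SNPs.
    This is a stand-in assumption, not the authors' fuzzy clustering."""
    memberships = [1.0 / (1.0 + math.exp(-p * o)) for p, o in zip(predicted, observed)]
    return sum(memberships) / len(memberships)


def rank_proteins(pwms, snps, observed):
    """Rank known proteins by how well their predicted site alterations
    match the observed binding changes of the unknown protein."""
    scores = {}
    for name, pwm in pwms.items():
        predicted = [alteration(pwm, wild, mutant) for wild, mutant in snps]
        scores[name] = consistency(predicted, observed)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: two known factors and one SNP (T -> G at index 6).
pwms = {
    "TF_A": pwm_from_sites(["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]),
    "TF_B": pwm_from_sites(["CCAAT", "CCAAT", "CCAAC"]),
}
snps = [("GGTGACTCAGG", "GGTGACGCAGG")]
observed = [-1]  # binding of the unknown protein decreased on the mutant allele

ranking = rank_proteins(pwms, snps, observed)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the SNP weakens the hypothetical TF_A site while binding of the unknown protein was observed to decrease, so TF_A ranks first; the best TF_B window is unaffected by the SNP, which yields a neutral agreement of 0.5.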

Keywords: Data mining, Single nucleotide polymorphism, Regulatory site, Mutation, Genetic disease

Article history: Received 15 March 2001, Revised 2 May 2001, Accepted 31 May 2001, Available online 23 February 2002.

Official URL: https://doi.org/10.1016/S0950-7051(01)00144-7