A novel locally guided genome reassembling technique using an artificial ant system

作者:Susobhan Baidya, Rajat Kumar De

摘要

DNA reassembling is an NP-hard problem (Brun, Theor Comput Sci 395:31–46, 2008; Medvedev et al 2007; Ma and Lombardi 2008). The present article presents a locally guided global learning system to solve the problem of genome reassembling. We have used a reference DNA sequence which is 99 % similar to an unknown DNA sequence. Two different sequences from the same organism generally have around 99 % similarity (Wei et al 2007). We have considered different DNA sequences from NCBI website (http://www.ncbi.nlm.nih.gov). Then we have simulated the tasks of cloning the sequence, followed by shearing the clones to a number of short reads. In our algorithm, we have introduced a new concept in the task of DNA reassembling using Ant Colony Optimization, where pheromone concentration is proportional to the score of assembled DNA fragments with some known reference sequences within the same organism. Unlike local overlapping, we have used here local alignment score of short reads with some known local reference region as the heuristic information. The result shows that our algorithm is capable of reassembling at par with the state-of-the-art. DNA reassembling techniques may need a massive parallel computation and huge memory space (Kurniawan et al 2008) because of size ~109bp of DNA sequences of mammals (Miller et al, Genomics 95:315–327, 2010; Blazewicz et al, Comput Biol Chem 33:224–230, 2009; Butler et al, Genome Res 18:810–820, 2008; Joshi et al 2011; Stupar et al, Arch Oncol 19:3–4, 2011; Quail et al, BMC Genomics 13:1471–2164, 2012), and ACO is inherently concurrent in nature (Dorigo and Stutzle 2004). Due to lack of appropriate computational resources, we had to confine ourselves to deal with the sequences of length up to ∼105 b p. We have considered 22 sequences of different organism, including Homo sapiens BRCA1 (127429bp) gene. For large sequences, we have applied hierarchical BAC-by-BAC sequencing (Fig. 2) (Myers, Comput Sci Eng 1:33–43, 1999), to stitch the individual segments to retrieve the original DNA sequence.

论文关键词:ACO, NP hard problem, DNA, Short reads, Hierarchical BAC-by-BAC sequencing

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-015-0650-5