Linking indexing data structures to de Bruijn graphs: Construction and update
作者:
Highlights:
•
摘要
DNA sequencing technologies have tremendously increased their throughput, and hence complicated DNA assembly. Numerous assembly programs use de Bruijn graphs (dBG) built from short reads to merge these into contigs, which represent putative DNA segments. In a dBG of order k, nodes are substrings of length k of reads (or k-mers), while arcs are their k+1-mers. As analysing reads often require to index all their substrings, it is interesting to exhibit algorithms that directly build a dBG from a pre-existing index, and especially a contracted dBG, where non-branching paths are condensed into single nodes. Here, we exhibit linear time algorithms for constructing the full or contracted dBGs from suffix trees, suffix arrays, and truncated suffix trees. With the latter the construction uses a space that is linear in the size of the dBG. Finally, we also provide algorithms to dynamically update the order of the graph without reconstructing it.
论文关键词:Index,Data structure,Suffix tree,Suffix array,Dynamic update,Overlap,Contracted de Bruijn graph,Assembly,Algorithms,Bioinformatics
论文评审过程:Received 25 June 2015, Revised 26 May 2016, Accepted 27 June 2016, Available online 19 July 2016, Version of Record 6 June 2019.
论文官网地址:https://doi.org/10.1016/j.jcss.2016.06.008