Summary: | This paper presents preliminary work comprising two contributions. The first is the design of a very efficient algorithm, based on an “Overlap-Layout-Consensus” (OLC) graph, for assembling the long reads produced by third-generation sequencing technologies. The second is an analysis of this graph using concepts from algebraic topology to determine, in advance, whether the genome assembly will be straightforward, i.e., whether it will yield a pseudo-Hamiltonian path or cycle, or whether the results will need closer scrutiny. In the latter case, it is necessary to look for “loops” in the OLC assembly graph caused by unresolved repeated genomic regions, and then to try to untie the “knots” that these regions create.
|