On the use of algebraic topology concepts to check the consistency of genome assembly

Bibliographic Details
Main Author: Jean-François Gibrat
Format: Article
Language: English
Published: The Biophysical Society of Japan 2019-11-01
Series: Biophysics and Physicobiology
Online Access: https://doi.org/10.2142/biophysico.16.0_444
Description
Summary: This paper presents preliminary work comprising two contributions. The first is the design of a very efficient algorithm, based on an “Overlap-Layout-Consensus” (OLC) graph, to assemble the long reads produced by third-generation sequencing technologies. The second is the analysis of this graph using algebraic topology concepts to determine, in advance, whether the assembly of the genome will be straightforward, i.e., whether it will lead to a pseudo-Hamiltonian path or cycle, or whether the results will need to be scrutinized. In the latter case, it will be necessary to look for “loops” in the OLC assembly graph caused by unresolved repeated genomic regions, and then try to untie the “knots” created by these regions.
ISSN: 2189-4779
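
Illustration: The summary does not spell out which topological quantity the paper computes, so the following is only a minimal sketch of one standard way to count independent "loops" in a graph: the first Betti number, b1 = E - V + C, where E is the number of edges, V the number of vertices, and C the number of connected components. It assumes (this is not stated in the record) that the overlap graph is treated as undirected and has already been simplified so that each remaining edge is a distinct read adjacency. The function name `first_betti_number` and the toy graphs are hypothetical.

```python
def first_betti_number(num_vertices, edges):
    """Return b1 = E - V + C for an undirected graph given as an edge list.

    Vertices are the integers 0 .. num_vertices - 1 (hypothetical read IDs).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Merging two components reduces the component count by one.
            parent[root_u] = root_v
            components -= 1

    return len(edges) - num_vertices + components


# Toy graphs (assumed, not taken from the paper).
linear_path = [(0, 1), (1, 2), (2, 3)]             # simple path of reads
single_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]    # one closed cycle
path_with_extra_edge = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]

print(first_betti_number(4, linear_path))
print(first_betti_number(4, single_cycle))
print(first_betti_number(5, path_with_extra_edge))
```

Under these assumptions, a simple path gives b1 = 0 and a single cycle gives b1 = 1; a larger value would indicate additional loops of the kind the summary attributes to unresolved repeats, which is the case where the assembly would need closer scrutiny.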