De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application

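Abstract

Background: Many organisms, in particular bacteria, contain repetitive DNA fragments called tandem repeats. DNA assemblers restore these structures by mapping paired-end tags to unitigs, estimating the distance between them, and filling the gap with the specified DNA motif, which may be repeated many times. However, some tandem repeats are longer than the distance between the paired-end tags, and such repeats cannot be resolved this way.

Results: We present a new algorithm for de novo DNA assembly that uses the relative frequency of reads to properly restore tandem repeats. Its main advantage is that it can correctly restore long tandem repeats, much longer than both the maximum read length and the insert size of paired-end tags. Moreover, it can also restore repetitive DNA regions covered only by single-read sequencing data. Other existing de novo DNA assemblers fail in such cases. The presented application consists of several steps: (i) building the de Bruijn graph, (ii) correcting the de Bruijn graph, (iii) normalizing edge weights, and (iv) generating the output set of DNA sequences. We tested our approach on real data sets of bacterial organisms.

Conclusions: We developed a software library, a console application, and a web application. The web application follows a client-server architecture: a web browser communicates with the end user, and the algorithms are implemented in C++ and Python. The presented approach enables proper reconstruction of tandem repeats that are longer than the insert size of paired-end tags. The application is freely available to all users under the GNU Library or Lesser General Public License version 3.0 (LGPLv3).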
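The four processing steps of the application (building the de Bruijn graph, correcting it, normalizing edge weights, and generating output sequences) can be illustrated with a minimal Python sketch. It is not dnaasm's code: the function names, the count threshold used for correction, the median-based estimate of single-copy coverage, and the Eulerian-circuit output step are all illustrative assumptions, and for brevity the correction and normalization are applied to the k-mer counts (the edge weights) before the graph is materialized. The toy circular genome carries a 15-bp tandem repeat (TACGA three times) that is longer than its 12-bp single-end reads; dividing each k-mer count by the single-copy coverage recovers how many times each repeat edge must be traversed, so the repeat is rebuilt with the right copy number.

```python
from collections import Counter, defaultdict
from statistics import median


def count_kmers(reads, k):
    """Step (i), first half: count every k-mer in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct(counts, min_count=2):
    """Step (ii), simplified: drop rare k-mers, assumed to come from sequencing errors."""
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


def normalize(counts):
    """Step (iii): turn raw edge weights into estimated copy numbers.

    Assumption: most distinct k-mers are single-copy, so the median count
    approximates the coverage of one copy of the genome."""
    unit = median(counts.values())
    return {kmer: max(1, round(c / unit)) for kmer, c in counts.items()}


def build_graph(copies):
    """Step (i), second half: de Bruijn multigraph with (k-1)-mer nodes and
    one parallel edge per estimated copy of each k-mer."""
    graph = defaultdict(list)
    for kmer, n in copies.items():
        graph[kmer[:-1]].extend([kmer[1:]] * n)
    return graph


def eulerian_circuit(graph, start):
    """Hierholzer's algorithm: a closed walk that uses every edge exactly once."""
    remaining = {node: list(succ) for node, succ in graph.items()}
    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]


def assemble_circular(reads, k):
    """Step (iv), hypothetical: spell one circular contig in which every
    edge is traversed as many times as its normalized weight says."""
    copies = normalize(correct(count_kmers(reads, k)))
    graph = build_graph(copies)
    walk = eulerian_circuit(graph, start=min(graph))
    seq = walk[0] + "".join(node[-1] for node in walk[1:])
    return seq[:-(k - 1)]  # the closing k-1 bases repeat the opening ones


if __name__ == "__main__":
    genome = "GCCTTAGGCA" + "TACGA" * 3   # toy circular genome with a 15-bp tandem repeat
    reads = [(genome * 2)[i:i + 12] for i in range(len(genome))]  # every 12-bp circular window
    reads.append("GCCTTCGGCATA")           # one read with a substitution error
    contig = assemble_circular(reads, k=5)
    print(contig in genome * 2 and len(contig) == len(genome))  # True
```

Because the toy genome is circular, the contig comes back as some rotation of the input, which is why the check uses `genome * 2`. With real reads, coverage is uneven and graphs branch in ways an Eulerian walk alone does not resolve, so rounding a median-normalized count captures only the core intuition behind frequency-based repeat resolution.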


Bibliographic Details
Main Authors: Wiktor Kuśmirek, Robert Nowak
Format: Article
Language: English
Published: BMC 2018-07-01
Series: BMC Bioinformatics
Subjects: De novo assembling; De Bruijn graph; Next generation sequencing; Tandem repeats
Online Access: http://link.springer.com/article/10.1186/s12859-018-2281-4
ISSN: 1471-2105
DOI: 10.1186/s12859-018-2281-4
Affiliation (both authors): Institute of Computer Science, Warsaw University of Technology