LRScaf: improving draft genomes using long noisy reads

Bibliographic Details
Main Authors: Mao Qin, Shigang Wu, Alun Li, Fengli Zhao, Hu Feng, Lulu Ding, Jue Ruan
Format: Article
Language: English
Published: BMC, 2019-12-01
Series: BMC Genomics
Subjects: LRScaf; Scaffolding algorithm; Third generation sequencing technologies; PacBio; Nanopore
Online Access: https://doi.org/10.1186/s12864-019-6337-2
Affiliation (all authors): Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences
ISSN: 1471-2164

Abstract

Background: The advent of third-generation sequencing (TGS) technologies opens the door to improved genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) data. To date, a few algorithms capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. However, hybrid assembly of large genomes remains challenging.

Results: We developed a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf, https://github.com/shingocat/lrscaf), that significantly boosts assembly contiguity using long reads. In this study, we present a comprehensive performance assessment of state-of-the-art scaffolders and LRScaf on seven organisms: E. coli, S. cerevisiae, A. thaliana, O. sativa, S. pennellii, Z. mays, and H. sapiens. LRScaf significantly improves the contiguity of draft assemblies; for example, it increases the NGA50 of CHM1 from 127.1 kbp to 9.4 Mbp using a 20-fold coverage PacBio dataset, and the NGA50 of NA12878 from 115.3 kbp to 12.9 Mbp using a 35-fold coverage Nanopore dataset. In addition, LRScaf achieves the best NGA50 on A. thaliana, S. pennellii, Z. mays, and H. sapiens. Moreover, LRScaf has the shortest run time of the scaffolders compared, and its peak RAM usage remains practical for large genomes (e.g., 20.3 GB on CHM1 and 62.6 GB on NA12878).

Conclusions: The new algorithm, LRScaf, yields the best or at least moderate scaffold contiguity and accuracy in the shortest run time compared with other scaffolding algorithms. Furthermore, LRScaf provides a cost-effective way to improve the contiguity of draft assemblies of large genomes.
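The contiguity results above are reported as NGA50: the length L such that aligned blocks of length at least L together cover at least half of the reference genome. The snippet below is a minimal sketch of that calculation, assuming the aligned block lengths are already available (for example, from aligning scaffolds to the reference and breaking them at misassemblies, as evaluation tools such as QUAST do). The function name `nga50` and the choice to return 0 when the metric is undefined are illustrative assumptions, not part of LRScaf.

```python
from typing import Iterable


def nga50(aligned_block_lengths: Iterable[int], reference_length: int) -> int:
    """Return the NGA50 of a set of aligned block lengths (hypothetical helper).

    Blocks are sorted longest first and their lengths summed until the
    running total reaches half of the reference genome length; the length
    of the block that crosses that threshold is the NGA50. Returns 0 if the
    blocks never cover half of the reference (the metric is then undefined).
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    half = reference_length / 2
    covered = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


if __name__ == "__main__":
    # Blocks of 50, 30, 20, 10 bp against a 200 bp reference:
    # 50 + 30 + 20 = 100 reaches half of 200, so NGA50 is 20.
    print(nga50([50, 30, 20, 10], 200))
```

Because the threshold is taken from the reference length rather than the assembly length, NGA50 (like NG50) can be compared across assemblies of the same genome regardless of their total size.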