Evaluating nanopore sequencing data processing pipelines for structural variation identification

Abstract. Background: Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurat...

Bibliographic Details
Main Authors: Anbo Zhou, Timothy Lin, Jinchuan Xing
Format: Article
Language: English
Published: BMC 2019-11-01
Series: Genome Biology
Subjects: Nanopore sequencing; Single-molecule sequencing; Structural variation; Pipeline evaluation
Online Access: http://link.springer.com/article/10.1186/s13059-019-1858-1
id doaj-8afd97ea1aa04f2099d65bb957d8a002
record_format Article
spelling doaj-8afd97ea1aa04f2099d65bb957d8a002; 2020-11-25T04:10:01Z; eng; BMC; Genome Biology; ISSN 1474-760X; 2019-11-01; vol. 20, no. 1, pp. 1-13; doi:10.1186/s13059-019-1858-1
Anbo Zhou, Timothy Lin, Jinchuan Xing (all: Department of Genetics, Rutgers, the State University of New Jersey)
collection DOAJ
language English
format Article
sources DOAJ
author Anbo Zhou
Timothy Lin
Jinchuan Xing
title Evaluating nanopore sequencing data processing pipelines for structural variation identification
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2019-11-01
description Abstract
Background: Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV identification. However, the tools for aligning long-read data and detecting SVs have not been thoroughly evaluated.
Results: Using four nanopore datasets, including both empirical and simulated reads, we evaluate four alignment tools and three SV detection tools. We also evaluate the impact of sequencing depth on SV detection. Finally, we develop a machine learning approach to integrate call sets from multiple pipelines. Overall, the performance of SV callers varies with SV type. For an initial data assessment, we recommend the aligner minimap2 in combination with the SV caller Sniffles because of their speed and relatively balanced performance. For detailed analysis, we recommend incorporating information from multiple call sets to improve SV calling performance.
Conclusions: We present a workflow for evaluating aligners and SV callers for nanopore sequencing data, together with approaches for integrating multiple call sets. Our results indicate that additional optimizations are needed to improve SV detection accuracy and sensitivity, and that an integrated call set can provide enhanced performance. Nanopore technology is improving, and the sequencing community is likely to grow accordingly. In turn, better benchmark call sets will become available to assess the performance of available tools more accurately and to facilitate further tool development.
topic Nanopore sequencing
Single-molecule sequencing
Structural variation
Pipeline evaluation
url http://link.springer.com/article/10.1186/s13059-019-1858-1