Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network
Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. During basecalling, context-dependent current measurements are translated into the base seq...
Main Authors: | Jingwen Zeng, Hongmin Cai, Hong Peng, Haiyan Wang, Yue Zhang, Tatsuya Akutsu |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2020-01-01 |
Series: | Frontiers in Genetics |
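Subjects: | nanopore sequencing; basecalling; deep neural network; temporal convolution; performance comparison; assembly |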
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2019.01332/full |
id |
doaj-06b08ce13aa94c53a90f56ff6b5704e4 |
---|---|
record_format |
Article |
spelling |
doaj-06b08ce13aa94c53a90f56ff6b5704e4; 2020-11-25T00:11:19Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2020-01-01; 10; 10.3389/fgene.2019.01332; 494367
Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network
Jingwen Zeng (School of Computer Science and Engineering, South China University of Technology, Guangzhou, China); Hongmin Cai (School of Computer Science and Engineering, South China University of Technology, Guangzhou, China); Hong Peng (School of Computer Science and Engineering, South China University of Technology, Guangzhou, China); Haiyan Wang (School of Computer Science and Engineering, South China University of Technology, Guangzhou, China); Yue Zhang (School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China); Tatsuya Akutsu (Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan)
Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. During basecalling, context-dependent current measurements are translated into the base sequence of the DNA/RNA strand. Accurate and fast basecalling is vital for downstream analyses such as genome assembly and detecting single-nucleotide polymorphisms and genomic structural variants. However, owing to the various changes in DNA/RNA molecules, noise during sequencing, and limitations of basecalling methods, accurate basecalling remains a challenge. In this paper, we propose Causalcall, which uses an end-to-end temporal convolution-based deep learning model for accurate and fast nanopore basecalling. Developed on a temporal convolutional network (TCN) and a connectionist temporal classification decoder, Causalcall directly identifies base sequences of varying lengths from current measurements in long time series. In contrast to the basecalling models using recurrent neural networks (RNNs), the convolution-based model of Causalcall can speed up basecalling by matrix computation. Experiments on multiple species have demonstrated the great potential of the TCN-based model to improve basecalling accuracy and speed when compared to an RNN-based model. Besides, experiments on genome assembly indicate the utility of Causalcall in reference-based genome assembly.
https://www.frontiersin.org/article/10.3389/fgene.2019.01332/full
nanopore sequencing; basecalling; deep neural network; temporal convolution; performance comparison; assembly |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jingwen Zeng Hongmin Cai Hong Peng Haiyan Wang Yue Zhang Tatsuya Akutsu |
spellingShingle |
Jingwen Zeng Hongmin Cai Hong Peng Haiyan Wang Yue Zhang Tatsuya Akutsu Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network Frontiers in Genetics nanopore sequencing basecalling deep neural network temporal convolution performance comparison assembly |
author_facet |
Jingwen Zeng Hongmin Cai Hong Peng Haiyan Wang Yue Zhang Tatsuya Akutsu |
author_sort |
Jingwen Zeng |
title |
Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network |
title_sort |
causalcall: nanopore basecalling using a temporal convolutional network |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Genetics |
issn |
1664-8021 |
publishDate |
2020-01-01 |
description |
Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. During basecalling, context-dependent current measurements are translated into the base sequence of the DNA/RNA strand. Accurate and fast basecalling is vital for downstream analyses such as genome assembly and detecting single-nucleotide polymorphisms and genomic structural variants. However, owing to the various changes in DNA/RNA molecules, noise during sequencing, and limitations of basecalling methods, accurate basecalling remains a challenge. In this paper, we propose Causalcall, which uses an end-to-end temporal convolution-based deep learning model for accurate and fast nanopore basecalling. Developed on a temporal convolutional network (TCN) and a connectionist temporal classification decoder, Causalcall directly identifies base sequences of varying lengths from current measurements in long time series. In contrast to the basecalling models using recurrent neural networks (RNNs), the convolution-based model of Causalcall can speed up basecalling by matrix computation. Experiments on multiple species have demonstrated the great potential of the TCN-based model to improve basecalling accuracy and speed when compared to an RNN-based model. Besides, experiments on genome assembly indicate the utility of Causalcall in reference-based genome assembly. |
topic |
nanopore sequencing basecalling deep neural network temporal convolution performance comparison assembly |
url |
https://www.frontiersin.org/article/10.3389/fgene.2019.01332/full |
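
The abstract in this record names two building blocks: a temporal convolutional network (TCN) and a connectionist temporal classification (CTC) decoder. The sketch below illustrates both ideas in general terms, assuming nothing about Causalcall's actual layers, weights, or code. The function names, the `ALPHABET` label order, and the choice of index 0 for the CTC blank are hypothetical and chosen only for illustration. The first function is a causal dilated 1D convolution, where each output depends only on the current and earlier samples. The second is greedy (best-path) CTC decoding, which turns per-frame label scores into a base sequence of variable length.

```python
from typing import List, Sequence

BLANK = 0            # assumed index of the CTC blank symbol
ALPHABET = "-ACGT"   # hypothetical label order: blank, then the four bases


def causal_dilated_conv1d(signal: Sequence[float],
                          kernel: Sequence[float],
                          dilation: int = 1) -> List[float]:
    """Causal 1D convolution: output[t] uses signal[t], signal[t - d], signal[t - 2d], ...

    Positions before the start of the signal are treated as zeros, so the
    output has the same length as the input.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation   # look back k * dilation steps
            if idx >= 0:             # implicit zero padding on the left
                acc += w * signal[idx]
        out.append(acc)
    return out


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    bases = []
    prev = BLANK
    for scores in frame_scores:
        label = max(range(len(scores)), key=scores.__getitem__)
        if label != prev and label != BLANK:
            bases.append(ALPHABET[label])
        prev = label
    return "".join(bases)


if __name__ == "__main__":
    # Each output sample sums the current sample and the one two steps earlier.
    print(causal_dilated_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))

    # Toy per-frame scores over (blank, A, C, G, T).
    frames = [
        [0.1, 0.8, 0.05, 0.03, 0.02],  # A
        [0.1, 0.8, 0.05, 0.03, 0.02],  # A again, merged with the previous frame
        [0.9, 0.05, 0.02, 0.02, 0.01],  # blank separates the two A calls
        [0.1, 0.7, 0.1, 0.05, 0.05],   # A
        [0.1, 0.1, 0.7, 0.05, 0.05],   # C
    ]
    print(ctc_greedy_decode(frames))
```

Because every output position of the convolution can be computed independently of the others, such layers can be evaluated in parallel, which is consistent with the abstract's remark that a convolution-based model can speed up basecalling relative to recurrent models. A real basecaller would stack many learned layers and typically use a more careful decoder than the greedy one shown here.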