Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network

Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. During basecalling, context-dependent current measurements are translated into the base seq...


Bibliographic Details
Main Authors: Jingwen Zeng, Hongmin Cai, Hong Peng, Haiyan Wang, Yue Zhang, Tatsuya Akutsu
Format: Article
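Language: English
Published: Frontiers Media S.A. 2020-01-01
Series: Frontiers in Genetics
Subjects: nanopore sequencing, basecalling, deep neural network, temporal convolution, performance comparison, assembly
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2019.01332/full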
id doaj-06b08ce13aa94c53a90f56ff6b5704e4
record_format Article
spelling doaj-06b08ce13aa94c53a90f56ff6b5704e4
2020-11-25T00:11:19Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2020-01-01
10
10.3389/fgene.2019.01332
494367
Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network
Jingwen Zeng, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Hongmin Cai, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Hong Peng, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Haiyan Wang, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Yue Zhang, School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
Tatsuya Akutsu, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
collection DOAJ
language English
format Article
sources DOAJ
author Jingwen Zeng
Hongmin Cai
Hong Peng
Haiyan Wang
Yue Zhang
Tatsuya Akutsu
title Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2020-01-01
description Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. During basecalling, context-dependent current measurements are translated into the base sequence of the DNA/RNA strand. Accurate and fast basecalling is vital for downstream analyses such as genome assembly and detecting single-nucleotide polymorphisms and genomic structural variants. However, owing to the various changes in DNA/RNA molecules, noise during sequencing, and limitations of basecalling methods, accurate basecalling remains a challenge. In this paper, we propose Causalcall, which uses an end-to-end temporal convolution-based deep learning model for accurate and fast nanopore basecalling. Developed on a temporal convolutional network (TCN) and a connectionist temporal classification decoder, Causalcall directly identifies base sequences of varying lengths from current measurements in long time series. In contrast to the basecalling models using recurrent neural networks (RNNs), the convolution-based model of Causalcall can speed up basecalling by matrix computation. Experiments on multiple species have demonstrated the great potential of the TCN-based model to improve basecalling accuracy and speed when compared to an RNN-based model. Besides, experiments on genome assembly indicate the utility of Causalcall in reference-based genome assembly.
topic nanopore sequencing
basecalling
deep neural network
temporal convolution
performance comparison
assembly
url https://www.frontiersin.org/article/10.3389/fgene.2019.01332/full
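
The description above names two building blocks: a temporal convolutional network and a connectionist temporal classification (CTC) decoder. The following is a minimal illustrative sketch of both ideas in plain Python, not the Causalcall implementation. It assumes that the TCN layers are causal dilated convolutions, as is typical for TCNs, and it uses greedy decoding as a stand-in for whatever decoder the authors use. The function names, the toy kernel, and the label order in `ALPHABET` are hypothetical.

```python
from typing import List, Sequence

BLANK = 0           # assumed index of the CTC blank label
ALPHABET = "-ACGT"  # hypothetical label order: blank, then the four bases


def causal_dilated_conv(signal: Sequence[float],
                        kernel: Sequence[float],
                        dilation: int = 1) -> List[float]:
    """Causal 1-D convolution with dilation.

    output[t] uses only signal[t], signal[t - dilation], signal[t - 2*dilation], ...
    Samples before the start of the signal count as zero (left padding),
    so the output has the same length as the input.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * signal[idx]
        out.append(acc)
    return out


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    bases = []
    previous = BLANK
    for scores in frame_scores:
        label = max(range(len(scores)), key=lambda i: scores[i])
        if label != previous and label != BLANK:
            bases.append(ALPHABET[label])
        previous = label
    return "".join(bases)
```

Because every output sample of a convolution can be computed independently of the others, a whole layer can be evaluated as one batched matrix operation, whereas a recurrent network has to step through the time series in order. This is the property the description refers to when it says the convolution-based model speeds up basecalling by matrix computation. The CTC step is what lets a fixed-rate stream of per-frame scores collapse into a base sequence of varying length.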