Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture

The study of transcriptional regulation remains difficult yet fundamental in molecular biology research. Recent research has shown that the double-helix structure of DNA plays an important role in improving the accuracy and interpretability of transcription factor binding site (TFBS) prediction. Alt...

Bibliographic Details
Main Authors: Siguo Wang, Qinhu Zhang, Zhen Shen, Ying He, Zhen-Heng Chen, Jianqiang Li, De-Shuang Huang
Format: Article
Language: English
Published: Elsevier 2021-06-01
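Series: Molecular Therapy: Nucleic Acids
Subjects: transcription factor binding sites; DNA sequence; DNA shape features; hybrid convolutional neural network; recurrent neural network
Online Access: http://www.sciencedirect.com/science/article/pii/S2162253121000494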
id doaj-5613196c730b47b7815fd2001efb451c
record_format Article
spelling doaj-5613196c730b47b7815fd2001efb451c
2021-06-05T06:08:13Z
eng
Elsevier
Molecular Therapy: Nucleic Acids
2162-2531
2021-06-01
24
154
163
Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture
Siguo Wang 0
Qinhu Zhang 1
Zhen Shen 2
Ying He 3
Zhen-Heng Chen 4
Jianqiang Li 5
De-Shuang Huang 6
The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China; Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Siping Road 1239, Shanghai 200092, China; Corresponding author: Qinhu Zhang, The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China.
School of Computer and Software, Nanyang Institute of Technology, Changjiang Road 80, Nanyang, Henan 473004, China
The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China; Corresponding author: De-Shuang Huang, The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China.
The study of transcriptional regulation remains difficult yet fundamental in molecular biology research. Recent research has shown that the double-helix structure of DNA plays an important role in improving the accuracy and interpretability of transcription factor binding site (TFBS) prediction. Although several computational methods have been designed to consider DNA sequence and DNA shape features simultaneously, designing an efficient model remains an open problem. In this paper, we propose a hybrid convolutional recurrent neural network (CNN/RNN) architecture, CRPTS, to predict TFBSs by combining DNA sequence and DNA shape features. The novelty of the method rests on three aspects: (1) a shared hybrid CNN and RNN efficiently extracts features from large-scale genomic sequences obtained by high-throughput technology; (2) common patterns are found in DNA sequences and their corresponding DNA shape features; (3) CRPTS can capture local structural information of DNA sequences without relying completely on DNA shape data. A series of comprehensive experiments on 66 in vitro datasets derived from universal protein binding microarrays (uPBMs) shows that CRPTS clearly outperforms the state-of-the-art methods.
http://www.sciencedirect.com/science/article/pii/S2162253121000494
transcription factor binding sites
DNA sequence
DNA shape features
hybrid convolutional neural network
recurrent neural network
collection DOAJ
language English
format Article
sources DOAJ
author Siguo Wang
Qinhu Zhang
Zhen Shen
Ying He
Zhen-Heng Chen
Jianqiang Li
De-Shuang Huang
title Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture
publisher Elsevier
series Molecular Therapy: Nucleic Acids
issn 2162-2531
publishDate 2021-06-01
description The study of transcriptional regulation remains difficult yet fundamental in molecular biology research. Recent research has shown that the double-helix structure of DNA plays an important role in improving the accuracy and interpretability of transcription factor binding site (TFBS) prediction. Although several computational methods have been designed to consider DNA sequence and DNA shape features simultaneously, designing an efficient model remains an open problem. In this paper, we propose a hybrid convolutional recurrent neural network (CNN/RNN) architecture, CRPTS, to predict TFBSs by combining DNA sequence and DNA shape features. The novelty of the method rests on three aspects: (1) a shared hybrid CNN and RNN efficiently extracts features from large-scale genomic sequences obtained by high-throughput technology; (2) common patterns are found in DNA sequences and their corresponding DNA shape features; (3) CRPTS can capture local structural information of DNA sequences without relying completely on DNA shape data. A series of comprehensive experiments on 66 in vitro datasets derived from universal protein binding microarrays (uPBMs) shows that CRPTS clearly outperforms the state-of-the-art methods.
topic transcription factor binding sites
DNA sequence
DNA shape features
hybrid convolutional neural network
recurrent neural network
url http://www.sciencedirect.com/science/article/pii/S2162253121000494
_version_ 1721396710637305856
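
The description field in this record outlines a shared hybrid CNN/RNN that reads both a DNA sequence and its shape features. The record gives no layer sizes, shape-feature names, or training details, so the following is only a minimal pure-Python sketch of the general idea and not the CRPTS implementation. Everything specific in it is an assumption made for illustration: the function names, the random weights, the choice of four shape values per position (picked so that the one-hot sequence and the shape input have the same channel count), and the reading of "shared" as applying the same convolution and recurrent weights to both inputs.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as 4-channel one-hot columns (unknown bases -> all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_relu(x, kernels):
    """Valid 1D convolution followed by ReLU.
    x: [length][channels]; kernels: [n_filters][width][channels]."""
    width = len(kernels[0])
    out = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        out.append([
            max(0.0, sum(k[i][c] * window[i][c]
                         for i in range(width)
                         for c in range(len(window[i]))))
            for k in kernels
        ])
    return out

def rnn_last_state(x, w_in, w_rec):
    """Plain tanh RNN over the positions of x; returns the final hidden state.
    w_in: [hidden][features]; w_rec: [hidden][hidden]."""
    hidden = [0.0] * len(w_rec)
    for step in x:
        hidden = [
            math.tanh(sum(w * v for w, v in zip(w_in[h], step)) +
                      sum(w * v for w, v in zip(w_rec[h], hidden)))
            for h in range(len(hidden))
        ]
    return hidden

def random_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(seed=0, n_filters=3, width=5, channels=4, hidden=2):
    """Hypothetical, untrained parameters with illustrative sizes."""
    rng = random.Random(seed)
    return {
        "kernels": [random_matrix(rng, width, channels) for _ in range(n_filters)],
        "w_in": random_matrix(rng, hidden, n_filters),
        "w_rec": random_matrix(rng, hidden, hidden),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)],
    }

def predict(seq, shape, params):
    """Return a scalar score; the same conv and RNN weights process both inputs."""
    features = []
    for x in (one_hot(seq), shape):
        conv = conv1d_relu(x, params["kernels"])
        features += rnn_last_state(conv, params["w_in"], params["w_rec"])
    return sum(w * f for w, f in zip(params["w_out"], features))

params = init_params()
seq = "ACGTTGCAAGGCTA"
shape_rng = random.Random(1)
# Placeholder numbers only: real shape features would come from a DNA shape tool.
shape = [[shape_rng.random() for _ in range(4)] for _ in seq]
score = predict(seq, shape, params)
```

Because the weights are random and untrained, `score` carries no biological meaning; the sketch only shows how one set of convolutional and recurrent weights can be reused across the sequence and shape inputs before the two feature vectors are combined.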