A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome
Abstract: DNA N6-methylation (6mA) of the adenine nucleotide is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops a convolutional neur...
Main Authors: | Chowdhury Rafeed Rahman, Ruhul Amin, Swakkhar Shatabda, Md. Sadrul Islam Toaha |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Publishing Group, 2021-05-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-021-89850-9 |
id |
doaj-8b9fbe0d9f8d4fb8bbcb8712e9e6303b |
---|---|
record_format |
Article |
spelling |
doaj-8b9fbe0d9f8d4fb8bbcb8712e9e6303b; 2021-05-16T11:23:43Z; eng; Nature Publishing Group; Scientific Reports; 2045-2322; 2021-05-01; 111113; 10.1038/s41598-021-89850-9; A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome; Chowdhury Rafeed Rahman, Ruhul Amin, Swakkhar Shatabda, Md. Sadrul Islam Toaha; United International University (listed once for each of the four authors); https://doi.org/10.1038/s41598-021-89850-9 |
collection |
DOAJ |
sources |
DOAJ |
author |
Chowdhury Rafeed Rahman Ruhul Amin Swakkhar Shatabda Md. Sadrul Islam Toaha |
title |
A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome |
issn |
2045-2322 |
description |
DNA N6-methylation (6mA) of the adenine nucleotide is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops a convolutional neural network (CNN) based tool, i6mA-CNN, capable of identifying 6mA sites in the rice genome. Our model combines multiple types of features: a customized feature vector inspired by PseAAC (Pseudo Amino Acid Composition), multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an auROC (area under the Receiver Operating Characteristic curve) score of 0.98 with an overall accuracy of 93.97% using fivefold cross-validation on the benchmark dataset. Finally, we evaluate our model on three other plant genome 6mA site identification test datasets. The results suggest that the proposed tool generalizes its 6mA site identification ability across plant genomes irrespective of plant species. An algorithm for potential motif extraction and a feature importance analysis procedure are two by-products of this research. The web tool for this research can be found at: https://cutt.ly/dgp3QTR . |
_version_ |
1721439496518500352 |
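
The abstract lists one-hot representations of the DNA sequence among the model's input features. As a rough illustration only, the sketch below shows one common way to one-hot encode a nucleotide sequence. The function name `one_hot_encode`, the column order `ACGT`, and the all-zero row for unknown symbols are assumptions made for this example; the record does not describe the encoding that i6mA-CNN actually uses.

```python
from typing import List

# Assumed column order for this sketch; the record does not state the paper's ordering.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return one row per base, with a 1 in the column of that base.

    Symbols outside A/C/G/T (for example 'N') map to an all-zero row,
    which is an assumption of this sketch rather than a documented choice.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows
```

Calling `one_hot_encode` on a fixed-length window around a candidate adenine would give a matrix with one row per position and four columns, which is the kind of input a one-dimensional convolution can scan.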