6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism

DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulatory processes. Accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine...

Bibliographic Details
Main Authors: Rao Zeng, Minghong Liao
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Applied Sciences
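Subjects: DNA N6-methyladenine; deep learning; site prediction; depthwise separable convolution; inverted residual structure; attention mechanism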
Online Access: https://www.mdpi.com/2076-3417/11/16/7731
id doaj-faf799d689054d73b184bc57185936f9
record_format Article
spelling doaj-faf799d689054d73b184bc57185936f9
2021-08-26T13:31:10Z
eng
MDPI AG
Applied Sciences
2076-3417
2021-08-01
11
7731
7731
10.3390/app11167731
6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism
Rao Zeng (Department of Software Engineering, School of Informatics, Xiamen University, Xiamen 361005, China)
Minghong Liao (Department of Software Engineering, School of Informatics, Xiamen University, Xiamen 361005, China)
https://www.mdpi.com/2076-3417/11/16/7731
collection DOAJ
language English
format Article
sources DOAJ
author Rao Zeng
Minghong Liao
author_facet Rao Zeng
Minghong Liao
author_sort Rao Zeng
title 6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism
title_sort 6mapred-msff: a deep learning model for predicting dna n6-methyladenine sites across species based on a multi-scale feature fusion mechanism
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2021-08-01
description DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulatory processes. Accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness restrict their usability in epigenetic studies, which points to a clear need for new computational methods. In this paper, we develop a novel computational predictor, 6mAPred-MSFF, a deep learning framework based on a multi-scale feature fusion mechanism that identifies 6mA sites across different species. In the predictor, we integrate the inverted residual block and a multi-scale attention mechanism to build lightweight, deep neural networks. Compared with existing predictors that use traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features and better captures the characteristics of 6mA sites. In benchmarking comparisons, our deep learning method outperforms the state-of-the-art methods under 5-fold cross-validation on seven datasets from six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, 6mAPred-MSFF achieves a sensitivity of 97.88% and a specificity of 94.64% under 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained on the rice data also predicts well the 6mA sites of five other species (Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster), with reported prediction accuracies of 98.51%, 93.02%, and 91.53%. Moreover, through experimental comparison, we explore how different encoding schemes and feature descriptors affect performance when training and testing the proposed model.
topic DNA N6-methyladenine
deep learning
site prediction
depthwise separable convolution
inverted residual structure
attention mechanism
url https://www.mdpi.com/2076-3417/11/16/7731