Identification of DNA–protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information

DNA–protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism, and identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art investigations...


Bibliographic Details
Main Authors: Cong Shen, Yijie Ding, Jijun Tang, Jian Song, Fei Guo
Format: Article
Language: English
Published: MDPI AG 2017-11-01
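Series: Molecules
Subjects: DNA–protein binding sites; ensemble classifier; feature extraction; random sub-sampling; sparse representation model
Online Access: https://www.mdpi.com/1420-3049/22/12/2079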
id doaj-69c044120101417e973e851d83e30676
record_format Article
spelling doaj-69c044120101417e973e851d83e30676 2020-11-25T00:17:04Z eng
MDPI AG, Molecules, ISSN 1420-3049, 2017-11-01, vol. 22, no. 12, art. 2079, doi:10.3390/molecules22122079 (molecules22122079)
Cong Shen, School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Yijie Ding, School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Jijun Tang, School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Jian Song, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
Fei Guo, School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
collection DOAJ
language English
format Article
sources DOAJ
author Cong Shen
Yijie Ding
Jijun Tang
Jian Song
Fei Guo
title Identification of DNA–protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information
publisher MDPI AG
series Molecules
issn 1420-3049
publishDate 2017-11-01
description DNA–protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism, and identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art methods have been developed to improve the accuracy of DNA–protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and predictive performance can still be improved. In this paper, we propose a competitive method called the Multi-scale Local Average Blocks (MLAB) algorithm to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences but also uses predicted solvent accessibility. Moreover, the predictors of DNA–protein binding sites are built with an ensemble weighted sparse representation model with random under-sampling. To evaluate the performance of MLAB, we conduct comprehensive experiments on DNA–protein binding site prediction. MLAB achieves an MCC of 0.392, 0.315, 0.439 and 0.245 on the PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. These results show that MLAB compares favorably with other leading methods: the MCC of our method is higher by at least 0.053, 0.015 and 0.064 on the PDNA-543, PDNA-41 and PDNA-316 datasets, respectively.
topic DNA–protein binding sites
ensemble classifier
feature extraction
random sub-sampling
sparse representation model
url https://www.mdpi.com/1420-3049/22/12/2079
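
The abstract reports all results as MCC, which is presumably the Matthews correlation coefficient, a metric commonly used for imbalanced binary tasks such as binding-site prediction. The record itself does not define it, so the following is only a minimal sketch of the standard definition computed from confusion-matrix counts. The function name `mcc` and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives (here: per-residue predictions
    of "binding" vs "non-binding").
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts for illustration only (not results from the paper).
print(round(mcc(tp=40, tn=900, fp=60, fn=50), 3))
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect prediction). Unlike plain accuracy, it stays informative when non-binding residues far outnumber binding ones.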