Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

Hotspot residues are important in determining protein-protein interactions, and they always perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are always costly and time c...

Bibliographic Details
Main Authors: Jinjian Jiang, Nian Wang, Peng Chen, Chunhou Zheng, Bing Wang
Format: Article
Language: English
Published: MDPI AG 2017-07-01
Series: International Journal of Molecular Sciences
Subjects: IBk
Online Access: https://www.mdpi.com/1422-0067/18/7/1543
id doaj-a5b367e9b519472a9ed393a5b8ac370f
record_format Article
spelling doaj-a5b367e9b519472a9ed393a5b8ac370f
2020-11-25T00:16:18Z
eng
MDPI AG
International Journal of Molecular Sciences
ISSN 1422-0067
2017-07-01
Volume 18, Issue 7, Article 1543
DOI 10.3390/ijms18071543
ijms18071543
Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System
Jinjian Jiang (School of Electronics and Information Engineering, Anhui University, Hefei 230601, China)
Nian Wang (School of Electronics and Information Engineering, Anhui University, Hefei 230601, China)
Peng Chen (Institute of Health Sciences, Anhui University, Hefei 230601, China)
Chunhou Zheng (School of Electronic Engineering & Automation, Anhui University, Hefei 230601, China)
Bing Wang (School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan 243032, China)
https://www.mdpi.com/1422-0067/18/7/1543
random projection
hot spots
IBk
ensemble system
collection DOAJ
language English
format Article
sources DOAJ
author Jinjian Jiang
Nian Wang
Peng Chen
Chunhou Zheng
Bing Wang
title Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System
publisher MDPI AG
series International Journal of Molecular Sciences
issn 1422-0067
publishDate 2017-07-01
description Hotspot residues are important in determining protein-protein interactions, and they always perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are always costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use the information of solved protein structures. However, the number of solved protein structures is far smaller than the number of sequences. Moreover, almost all of the predictors identify hotspots from the interfaces of protein complexes, and seldom from whole protein sequences. Therefore, a method that determines hotspots from whole protein sequences using sequence information alone is urgently needed. To address hotspot prediction from whole protein sequences, we propose an ensemble system with random projections that uses statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Next, the random projection technique is adopted to project the encoded instances into a reduced space. Then, several of the better random projections are selected by training an IBk classifier on the training dataset and are applied to the test dataset, yielding an ensemble of random projection classifiers. Experimental results show that, although the performance of our method is not yet good enough for real applications, it is promising for determining hotspot residues from whole sequences.
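The pipeline outlined in the description (random projection, selection of the better projections with a nearest-neighbour classifier, then an ensemble) might be sketched roughly as follows. This is only an illustration under several assumptions that the record does not state: the projection matrix is Gaussian, projections are ranked by leave-one-out accuracy on the training set, the ensemble combines its members by majority vote, and a plain k-nearest-neighbour function stands in for Weka's IBk. The function names, the parameter values, and the synthetic `toy_data` are all hypothetical; the real features would come from sequence profiles and AAindex1 properties.

```python
import random
from collections import Counter


def make_projection(n_in, n_out, rng):
    """Gaussian random matrix (n_out x n_in), scaled by 1/sqrt(n_out). Assumed form."""
    scale = n_out ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def project(matrix, x):
    """Map one feature vector into the reduced space."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour vote, used here as a stand-in for IBk."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), label)
        for t, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy on the training set (assumed selection criterion)."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def fit_ensemble(X, y, n_out=4, n_candidates=20, n_keep=5, k=3, seed=0):
    """Draw candidate projections and keep the n_keep best-scoring ones."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        m = make_projection(len(X[0]), n_out, rng)
        PX = [project(m, x) for x in X]
        scored.append((loo_accuracy(PX, y, k), m, PX))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(m, PX) for _, m, PX in scored[:n_keep]]


def predict_ensemble(ensemble, y, x, k=3):
    """Majority vote over the k-NN predictions made in each kept projection."""
    votes = Counter(knn_predict(PX, y, project(m, x), k) for m, PX in ensemble)
    return votes.most_common(1)[0][0]


def toy_data(n_per_class=10, n_features=10, seed=1):
    """Synthetic two-class data for illustration only; not hotspot data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([centre + rng.gauss(0.0, 0.1) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = toy_data()
ensemble = fit_ensemble(X, y)
print(predict_ensemble(ensemble, y, [5.0] * 10))
```

Keeping an odd number of projections and an odd `k` avoids tied votes for a two-class problem such as hotspot versus non-hotspot; both values are arbitrary choices for this sketch.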
topic random projection
hot spots
IBk
ensemble system
url https://www.mdpi.com/1422-0067/18/7/1543