A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction

Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been incre...

Bibliographic Details
Main Authors: Xiuquan Du, Jiaxing Cheng, Tingting Zheng, Zheng Duan, Fulan Qian
Format: Article
Language: English
Published: MDPI AG 2014-07-01
Series: International Journal of Molecular Sciences
Subjects: protein–protein interaction; random forest; ensemble coding; DX score
Online Access: http://www.mdpi.com/1422-0067/15/7/12731
id doaj-e4b76a7c5cdf41fb80d9c4af2af1e32c
record_format Article
spelling doaj-e4b76a7c5cdf41fb80d9c4af2af1e32c
2020-11-25T00:08:02Z
eng
MDPI AG
International Journal of Molecular Sciences
1422-0067
2014-07-01
Volume 15, Issue 7, Pages 12731–12749
10.3390/ijms150712731
ijms150712731
A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction
Xiuquan Du (Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China)
Jiaxing Cheng (Institute of Information Engineering, Anhui Xinhua University, Hefei 230088, China)
Tingting Zheng (School of Mathematical Science, Anhui University, Hefei 230601, China)
Zheng Duan (School of Computer Science and Technology, Anhui University, Hefei 230601, China)
Fulan Qian (School of Computer Science and Technology, Anhui University, Hefei 230601, China)
collection DOAJ
language English
format Article
sources DOAJ
author Xiuquan Du
Jiaxing Cheng
Tingting Zheng
Zheng Duan
Fulan Qian
title A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction
publisher MDPI AG
series International Journal of Molecular Sciences
issn 1422-0067
publishDate 2014-07-01
description Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, it achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy of the DXEC-RF method reaches 80%. We extended the experiment to bigger and more realistic datasets, on which the method maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.
topic protein–protein interaction
random forest
ensemble coding
DX score
url http://www.mdpi.com/1422-0067/15/7/12731
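
The description in this record reports precision, recall, and accuracy for each dataset. As a reading aid, the sketch below shows how these three measures are conventionally computed from the outcome counts of a binary interaction classifier. It assumes the standard confusion-matrix definitions; the record does not state the exact formulas the authors used. The names `ConfusionCounts`, `precision`, `recall`, and `accuracy` are hypothetical, and the counts are invented for illustration, not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary PPI classifier (hypothetical helper)."""
    tp: int  # interacting pairs predicted as interacting
    fp: int  # non-interacting pairs predicted as interacting
    fn: int  # interacting pairs predicted as non-interacting
    tn: int  # non-interacting pairs predicted as non-interacting


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    # Fraction of predicted interactions that are true interactions.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Fraction of true interactions that the classifier recovers.
    return _ratio(c.tp, c.tp + c.fn)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all protein pairs that are classified correctly.
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


# Illustrative counts only (an assumption, not data from the article).
counts = ConfusionCounts(tp=80, fp=40, fn=20, tn=60)
print(f"precision={precision(counts):.4f}")
print(f"recall={recall(counts):.4f}")
print(f"accuracy={accuracy(counts):.4f}")
```

With these made-up counts, recall is higher than precision, the same qualitative pattern as the Gold Yeast figures quoted in the description: the classifier recovers most true interactions at the cost of more false positives.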