A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction
Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been incre...
Main Authors: | Xiuquan Du, Jiaxing Cheng, Tingting Zheng, Zheng Duan, Fulan Qian |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2014-07-01 |
Series: | International Journal of Molecular Sciences |
Subjects: | protein–protein interaction; random forest; ensemble coding; DX score |
Online Access: | http://www.mdpi.com/1422-0067/15/7/12731 |
id |
doaj-e4b76a7c5cdf41fb80d9c4af2af1e32c |
---|---|
record_format |
Article |
spelling |
doaj-e4b76a7c5cdf41fb80d9c4af2af1e32c; 2020-11-25T00:08:02Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1422-0067; 2014-07-01; vol. 15, no. 7, pp. 12731–12749; doi:10.3390/ijms150712731; ijms150712731.
Title: A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction.
Authors: Xiuquan Du [0]; Jiaxing Cheng [1]; Tingting Zheng [2]; Zheng Duan [3]; Fulan Qian [4].
Affiliations: [0] Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China; [1] Institute of Information Engineering, Anhui Xinhua University, Hefei 230088, China; [2] School of Mathematical Science, Anhui University, Hefei 230601, China; [3] School of Computer Science and Technology, Anhui University, Hefei 230601, China; [4] School of Computer Science and Technology, Anhui University, Hefei 230601, China.
Abstract: Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We also extended the experiment to larger, more realistic datasets: the method maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.
URL: http://www.mdpi.com/1422-0067/15/7/12731.
Keywords: protein–protein interaction; random forest; ensemble coding; DX score |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiuquan Du, Jiaxing Cheng, Tingting Zheng, Zheng Duan, Fulan Qian |
author_sort |
Xiuquan Du |
title |
A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction |
publisher |
MDPI AG |
series |
International Journal of Molecular Sciences |
issn |
1422-0067 |
publishDate |
2014-07-01 |
description |
Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We also extended the experiment to larger, more realistic datasets: the method maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp. |
topic |
protein–protein interaction; random forest; ensemble coding; DX score |
url |
http://www.mdpi.com/1422-0067/15/7/12731 |
_version_ |
1725417130734649344 |
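
The description field reports precision, recall, and accuracy for a binary classifier (interacting vs. non-interacting protein pairs). As a reading aid, here is a minimal sketch of how those three metrics are computed from true and predicted labels. The names `ConfusionCounts` and `count_outcomes` and the toy labels are hypothetical illustrations; none of this comes from the paper or its DXEC-RF service, and the toy numbers are not the reported results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (1 = interacting, 0 = not)."""
    tp: int  # interacting pairs predicted as interacting
    fp: int  # non-interacting pairs predicted as interacting
    tn: int  # non-interacting pairs predicted as non-interacting
    fn: int  # interacting pairs predicted as non-interacting


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally true/false positives and negatives from 0/1 label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth not in (0, 1) or guess not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if guess == 1 and truth == 1:
            tp += 1
        elif guess == 1 and truth == 0:
            fp += 1
        elif guess == 0 and truth == 0:
            tn += 1
        else:  # guess == 0 and truth == 1
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted interactions that are real: TP / (TP + FP)."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of real interactions that were found: TP / (TP + FN)."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all pairs classified correctly: (TP + TN) / total."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    # Toy labels (hypothetical): 10 interacting pairs, then 10 non-interacting.
    y_true = [1] * 10 + [0] * 10
    y_pred = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6
    counts = count_outcomes(y_true, y_pred)
    print(f"precision={precision(counts):.2%}")
    print(f"recall={recall(counts):.2%}")
    print(f"accuracy={accuracy(counts):.2%}")
```

In this toy case, recall exceeds precision, the same qualitative pattern as the Gold Yeast figures: such a classifier finds most true interactions at the cost of more false positives.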