PClass: Protein Quaternary Structure Classification by Using Bootstrapping Strategy as Model Selection

A protein quaternary structure complex, also known as a multimer, plays an important role in the cell. The dimer structure of transcription factors is involved in gene regulation, whereas the trimer structure of virus-infection-associated glycoproteins is related to the human immunodeficiency virus....

Bibliographic Details
Main Authors: Chi-Chou Huang, Chi-Chang Chang, Chi-Wei Chen, Shao-yu Ho, Hsung-Pin Chang, Yen-Wei Chu
Format: Article
Language: English
Published: MDPI AG 2018-02-01
Series: Genes
Subjects: protein quaternary structure; bootstrap strategy; model selection; classification
Online Access: http://www.mdpi.com/2073-4425/9/2/91
id doaj-a7e17443c716481992f6b8604cbb409b
record_format Article
spelling doaj-a7e17443c716481992f6b8604cbb409b
2020-11-24T23:41:36Z
eng
MDPI AG
Genes
2073-4425
2018-02-01
volume 9, issue 2, article 91
10.3390/genes9020091
genes9020091
PClass: Protein Quaternary Structure Classification by Using Bootstrapping Strategy as Model Selection
Chi-Chou Huang: School of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
Chi-Chang Chang: School of Medical Informatics, Chung-Shan Medical University, Taichung 40201, Taiwan
Chi-Wei Chen: Institute of Genomics and Bioinformatics, National Chung Hsing University, Kuo Kuang Rd., Taichung 402, Taiwan
Shao-yu Ho: Institute of Genomics and Bioinformatics, National Chung Hsing University, Kuo Kuang Rd., Taichung 402, Taiwan
Hsung-Pin Chang: Department of Computer Science and Engineering, National Chung-Hsing University, Kuo Kuang Rd., Taichung 402, Taiwan
Yen-Wei Chu: Institute of Genomics and Bioinformatics, National Chung Hsing University, Kuo Kuang Rd., Taichung 402, Taiwan
http://www.mdpi.com/2073-4425/9/2/91
protein quaternary structure
bootstrap strategy
model selection
classification
collection DOAJ
language English
format Article
sources DOAJ
author Chi-Chou Huang
Chi-Chang Chang
Chi-Wei Chen
Shao-yu Ho
Hsung-Pin Chang
Yen-Wei Chu
title PClass: Protein Quaternary Structure Classification by Using Bootstrapping Strategy as Model Selection
publisher MDPI AG
series Genes
issn 2073-4425
publishDate 2018-02-01
description A protein quaternary structure complex, also known as a multimer, plays an important role in the cell. The dimer structure of transcription factors is involved in gene regulation, whereas the trimer structure of virus-infection-associated glycoproteins is related to the human immunodeficiency virus. Classifying protein quaternary structure complexes will be of great help to proteomics research in the post-genome era, yet classification systems for protein quaternary structures have not been widely developed. Therefore, in this study we designed a two-layer machine learning architecture and developed the classification system PClass. Protein quaternary structure complexes are divided into five categories: monomer, dimer, trimer, tetramer, and other subunit classes. Within the framework of the bootstrap method with a support vector machine, we propose a new model selection method. In the first layer, each type of complex is classified based on sequences, entropy, and accessible surface area, thereby generating multiple feature modules; the best-performing model is then selected as the feature module for each type of complex. At this stage, the best performance reaches a Matthews correlation coefficient (MCC) of up to 70%. The second layer combines the first-layer modules through an integration mechanism and uses six machine learning methods to improve prediction performance; with this layer, the system improves by more than 10% in MCC. Finally, we analyzed the performance of the classification system on transcription factors in dimer structure and virus-infection-associated glycoproteins in trimer structure. PClass is available via a web interface at http://predictor.nchu.edu.tw/PClass/.
topic protein quaternary structure
bootstrap strategy
model selection
classification
url http://www.mdpi.com/2073-4425/9/2/91