Predicting hub proteins using protein sequence, protein structure and physicochemical properties

Master's thesis === National Yang-Ming University (國立陽明大學) === Institute of Biomedical Informatics (生物醫學資訊研究所) === 105 === Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancers. This thesis describes a hub protein prediction tool built with machine learning te...

Bibliographic Details
Main Authors: Yen-Hong Chen, 陳彥宏
Other Authors: Kuo-Bin Li
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/6xbrh8
id ndltd-TW-105YM005114050
record_format oai_dc
spelling ndltd-TW-105YM005114050 2019-05-15T23:39:47Z http://ndltd.ncl.edu.tw/handle/6xbrh8 Predicting hub proteins using protein sequence, protein structure and physicochemical properties 使用蛋白質序列、結構與物理化學性質預測 Hub 蛋白質 Yen-Hong Chen 陳彥宏 Kuo-Bin Li 李國彬 2017 學位論文 ; thesis 60 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Yang-Ming University (國立陽明大學) === Institute of Biomedical Informatics (生物醫學資訊研究所) === 105 === Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancers. This thesis describes a hub protein prediction tool built with machine learning techniques, specifically the random forest as implemented in the ‘caret’ R package. The training protein sequences were collected from the Human Protein Reference Database (HPRD). Proteins with ten or more interaction partners are labeled as hub proteins, and those with exactly one interaction partner are labeled as end proteins. Three types of features were used in this study: (i) structure-based features supported by earlier studies, for example intrinsically disordered regions and protein functional domains; (ii) sequence-based features, including amino acid composition, dipeptide composition and pseudo-amino acid composition; (iii) physicochemical features, in which the 20 amino acid composition values of a protein are replaced by a single numerical value: the sum of a specific amino acid physicochemical property (taken from the AAindex database) over the 20 amino acids, weighted by their composition values. The Random Forest Recursive Feature Elimination (RF-RFE) technique was used to select the optimal features from the combination of the three feature types. The final predictor achieves areas under the receiver operating characteristic (ROC) curve of 0.77 and 0.76 in a 10-fold cross-validation and an independent test, respectively. Furthermore, we demonstrate that the proposed hub protein predictor and the selected features suggest new insights into hub and end protein classification. Our prediction tool is freely accessible at http://bsaltools.ym.edu.tw/predHub.
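The labeling rule and the composition-based features described in the abstract can be sketched in a few lines. This is only an illustrative Python sketch, not the thesis's implementation (which used R and the ‘caret’ package). The function names are hypothetical, and the Kyte-Doolittle hydropathy scale is used purely as an example property; the abstract does not say which AAindex entries were selected.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Example property only (Kyte-Doolittle hydropathy); an assumption,
# not necessarily one of the AAindex entries used in the thesis.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def label_protein(n_partners):
    """Label by interaction-partner count, following the abstract's rule."""
    if n_partners >= 10:
        return "hub"
    if n_partners == 1:
        return "end"
    return None  # proteins with 0 or 2..9 partners are not labeled


def amino_acid_composition(seq):
    """Fraction of each standard residue; non-standard symbols are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(valid)
    return {a + b: counts[a + b] / len(valid)
            for a, b in product(AMINO_ACIDS, repeat=2)}


def property_weighted_feature(seq, scale):
    """Collapse the 20 composition values into one composition-weighted sum."""
    comp = amino_acid_composition(seq)
    return sum(comp[a] * scale[a] for a in AMINO_ACIDS)


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # made-up sequence for illustration
    print(label_protein(12))
    print(property_weighted_feature(toy_sequence, KYTE_DOOLITTLE))
```

Each AAindex property would yield one such scalar per protein, so a set of properties gives a compact feature vector that can be combined with the sequence-based and structure-based features before feature selection.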
author2 Kuo-Bin Li
author_facet Kuo-Bin Li
Yen-Hong Chen
陳彥宏
author Yen-Hong Chen
陳彥宏
title Predicting hub proteins using protein sequence, protein structure and physicochemical properties
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/6xbrh8