Predicting hub proteins using protein sequence, protein structure and physicochemical properties
Master's === National Yang-Ming University === Institute of Biomedical Informatics === 105 === Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancer. This thesis describes a hub protein prediction tool using machine learning te...
Main Authors: | Yen-Hong Chen, 陳彥宏 |
---|---|
Other Authors: | Kuo-Bin Li |
Format: | Others |
Language: | zh-TW |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/6xbrh8 |
id |
ndltd-TW-105YM005114050 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105YM005114050 2019-05-15T23:39:47Z http://ndltd.ncl.edu.tw/handle/6xbrh8 Predicting hub proteins using protein sequence, protein structure and physicochemical properties 使用蛋白質序列、結構與物理化學性質預測 Hub 蛋白質 Yen-Hong Chen 陳彥宏 Master's National Yang-Ming University Institute of Biomedical Informatics 105 Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancer. This thesis describes a hub protein prediction tool built with machine learning techniques, specifically the random forest as implemented in the ‘caret’ R package. The training protein sequences were collected from the Human Protein Reference Database (HPRD). Proteins with ten or more interaction partners are labeled as hub proteins, and those with exactly one interaction partner are labeled as end proteins. Three types of feature sets were used in this study: (i) structure-based features supported by earlier studies, for example, intrinsically disordered regions and protein functional domains; (ii) sequence-based features, including amino acid composition, dipeptide composition and pseudo-amino acid composition; (iii) physicochemical features, in which the 20 amino acid composition values of a protein are replaced by a single numerical value: the sum of a specific amino acid physicochemical property (taken from the AAindex database) over the 20 amino acids, weighted by the corresponding composition values. The Random Forest Recursive Feature Elimination (RF-RFE) technique was used to select the optimal features from the combination of the three feature types. The final predictor achieves areas under the receiver operating characteristic (ROC) curve of 0.77 and 0.76 in 10-fold cross-validation and an independent test, respectively. Furthermore, we demonstrate that the proposed hub protein predictor and the selected features suggest new insights into hub and end protein classification. Our prediction tool is freely accessible at http://bsaltools.ym.edu.tw/predHub. Kuo-Bin Li 李國彬 2017 Degree thesis ; thesis 60 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Yang-Ming University === Institute of Biomedical Informatics === 105 === Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancer. This thesis describes a hub protein prediction tool built with machine learning techniques, specifically the random forest as implemented in the ‘caret’ R package. The training protein sequences were collected from the Human Protein Reference Database (HPRD). Proteins with ten or more interaction partners are labeled as hub proteins, and those with exactly one interaction partner are labeled as end proteins. Three types of feature sets were used in this study: (i) structure-based features supported by earlier studies, for example, intrinsically disordered regions and protein functional domains; (ii) sequence-based features, including amino acid composition, dipeptide composition and pseudo-amino acid composition; (iii) physicochemical features, in which the 20 amino acid composition values of a protein are replaced by a single numerical value: the sum of a specific amino acid physicochemical property (taken from the AAindex database) over the 20 amino acids, weighted by the corresponding composition values. The Random Forest Recursive Feature Elimination (RF-RFE) technique was used to select the optimal features from the combination of the three feature types. The final predictor achieves areas under the receiver operating characteristic (ROC) curve of 0.77 and 0.76 in 10-fold cross-validation and an independent test, respectively. Furthermore, we demonstrate that the proposed hub protein predictor and the selected features suggest new insights into hub and end protein classification. Our prediction tool is freely accessible at http://bsaltools.ym.edu.tw/predHub. |
author2 |
Kuo-Bin Li |
author_facet |
Kuo-Bin Li Yen-Hong Chen 陳彥宏 |
author |
Yen-Hong Chen 陳彥宏 |
spellingShingle |
Yen-Hong Chen 陳彥宏 Predicting hub proteins using protein sequence, protein structure and physicochemical properties |
author_sort |
Yen-Hong Chen |
title |
Predicting hub proteins using protein sequence, protein structure and physicochemical properties |
title_short |
Predicting hub proteins using protein sequence, protein structure and physicochemical properties |
title_full |
Predicting hub proteins using protein sequence, protein structure and physicochemical properties |
title_fullStr |
Predicting hub proteins using protein sequence, protein structure and physicochemical properties |
title_full_unstemmed |
Predicting hub proteins using protein sequence, protein structure and physicochemical properties |
title_sort |
predicting hub proteins using protein sequence, protein structure and physicochemical properties |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/6xbrh8 |
work_keys_str_mv |
AT yenhongchen predictinghubproteinsusingproteinsequenceproteinstructureandphysicochemicalproperties AT chényànhóng predictinghubproteinsusingproteinsequenceproteinstructureandphysicochemicalproperties AT yenhongchen shǐyòngdànbáizhìxùlièjiégòuyǔwùlǐhuàxuéxìngzhìyùcèhubdànbáizhì AT chényànhóng shǐyòngdànbáizhìxùlièjiégòuyǔwùlǐhuàxuéxìngzhìyùcèhubdànbáizhì |
_version_ |
1719152471784292352 |
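
The labeling rule and the sequence-derived features described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis code, which used the ‘caret’ R package. The function names are hypothetical, and the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) is only an assumed example property; the record does not say which AAindex properties were actually used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (AAindex KYTJ820101).
# Assumption: picked purely for illustration, not named in the record.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def label_by_degree(degree):
    """Label a protein by its number of interaction partners.

    >= 10 partners -> "hub", exactly 1 -> "end", anything else -> None
    (such proteins fall outside the two classes in the abstract).
    """
    if degree >= 10:
        return "hub"
    if degree == 1:
        return "end"
    return None


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered amino acid pairs in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in HYDROPATHY and p[1] in HYDROPATHY]
    if not valid:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(valid)
    return {a + b: counts[a + b] / len(valid)
            for a in AMINO_ACIDS for b in AMINO_ACIDS}


def property_weighted_sum(seq, prop):
    """Collapse the 20 composition values into one number:
    the sum of a per-residue property weighted by composition."""
    comp = amino_acid_composition(seq)
    return sum(comp[a] * prop[a] for a in AMINO_ACIDS)


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence, not from HPRD
    print(label_by_degree(12), label_by_degree(1), label_by_degree(4))
    print(property_weighted_sum(example, HYDROPATHY))
```

Each AAindex property yields one such scalar feature per protein, so the feature count stays small next to the 400-dimensional dipeptide composition. Structure-based features, pseudo-amino acid composition and the RF-RFE selection step are not covered by this sketch.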