Predicting hub proteins using protein sequence, protein structure and physicochemical properties

Master's thesis === National Yang-Ming University === Institute of Biomedical Informatics === 105 === Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancer. This thesis describes a hub protein prediction tool built using machine learning te...
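
The record's full abstract states the labeling rule: proteins with ten or more interaction partners are hubs, and proteins with exactly one partner are end proteins. A minimal Python sketch of that rule follows. It is not the thesis's code, which used R and the 'caret' package. The function name `label_proteins`, the edge-list input format and the choice to ignore self-interactions are assumptions. Parsing of actual HPRD release files is not shown.

```python
from collections import defaultdict

def label_proteins(interactions, hub_min=10):
    """Label proteins as 'hub' or 'end' from undirected interaction pairs.

    interactions: iterable of (protein_a, protein_b) tuples (hypothetical format).
    hub_min: partner count at or above which a protein is a hub (10 per the abstract).
    """
    partners = defaultdict(set)  # protein -> set of distinct partners
    for a, b in interactions:
        if a == b:
            continue  # assumption: self-interactions do not count as partners
        partners[a].add(b)
        partners[b].add(a)

    labels = {}
    for protein, ps in partners.items():
        if len(ps) >= hub_min:
            labels[protein] = "hub"
        elif len(ps) == 1:
            labels[protein] = "end"
        # proteins with 2..hub_min-1 partners get no label and are excluded
    return labels
```

Proteins with two to nine partners fall into neither class and are omitted. This matches the abstract, which defines only the two extremes. Partners are stored as sets, so duplicate interaction records are counted once.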


Bibliographic Details
Main Authors:	Yen-Hong Chen, 陳彥宏
Other Authors:	Kuo-Bin Li
Format:	Others
Language:	zh-TW
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/6xbrh8

id	ndltd-TW-105YM005114050
record_format	oai_dc
spelling	ndltd-TW-105YM005114050 2019-05-15T23:39:47Z http://ndltd.ncl.edu.tw/handle/6xbrh8 Predicting hub proteins using protein sequence, protein structure and physicochemical properties 使用蛋白質序列、結構與物理化學性質預測 Hub 蛋白質 Yen-Hong Chen 陳彥宏 Master's, National Yang-Ming University, Institute of Biomedical Informatics, 105 Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancer. This thesis describes a hub protein prediction tool built with machine learning, specifically the random forest as implemented in the ‘caret’ R package. The training protein sequences were collected from the Human Protein Reference Database (HPRD). Proteins with ten or more interaction partners are labeled as hub proteins, and those with exactly one interaction partner are labeled as end proteins. Three types of features were used in this study: (i) structure-based features supported by earlier studies, such as intrinsically disordered regions and protein functional domains; (ii) sequence-based features, including amino acid composition, dipeptide composition and pseudo-amino acid composition; (iii) physicochemical features: to incorporate the physicochemical properties of amino acids, the 20 amino acid composition values of a protein are replaced by a single number, the sum over the 20 amino acids of a given physicochemical property (taken from the AAindex database), weighted by the corresponding composition values. Random Forest Recursive Feature Elimination (RF-RFE) was used to select the optimal features from the combination of the three feature types. The final predictor achieves an area under the receiver operating characteristic (ROC) curve of 0.77 in 10-fold cross-validation and 0.76 in an independent test.
Furthermore, we demonstrate that the proposed hub protein predictor and the selected features offer new insights into the classification of hub and end proteins. Our prediction tool is freely accessible at http://bsaltools.ym.edu.tw/predHub. Kuo-Bin Li 李國彬 2017 degree thesis ; thesis 60 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Yang-Ming University === Institute of Biomedical Informatics === 105 === Hub proteins are proteins with a large number of partners in a protein-protein interaction network. They are often regarded as potential drug targets for diseases such as cancer. This thesis describes a hub protein prediction tool built with machine learning, specifically the random forest as implemented in the ‘caret’ R package. The training protein sequences were collected from the Human Protein Reference Database (HPRD). Proteins with ten or more interaction partners are labeled as hub proteins, and those with exactly one interaction partner are labeled as end proteins. Three types of features were used in this study: (i) structure-based features supported by earlier studies, such as intrinsically disordered regions and protein functional domains; (ii) sequence-based features, including amino acid composition, dipeptide composition and pseudo-amino acid composition; (iii) physicochemical features: to incorporate the physicochemical properties of amino acids, the 20 amino acid composition values of a protein are replaced by a single number, the sum over the 20 amino acids of a given physicochemical property (taken from the AAindex database), weighted by the corresponding composition values. Random Forest Recursive Feature Elimination (RF-RFE) was used to select the optimal features from the combination of the three feature types. The final predictor achieves an area under the receiver operating characteristic (ROC) curve of 0.77 in 10-fold cross-validation and 0.76 in an independent test. Furthermore, we demonstrate that the proposed hub protein predictor and the selected features offer new insights into the classification of hub and end proteins. Our prediction tool is freely accessible at http://bsaltools.ym.edu.tw/predHub.
author2	Kuo-Bin Li
author	Yen-Hong Chen 陳彥宏
title	Predicting hub proteins using protein sequence, protein structure and physicochemical properties
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/6xbrh8
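
The abstract names two of the sequence-based feature types: amino acid composition (20 values) and dipeptide composition (400 values). The Python sketch below computes both, assuming the common definition of each as relative frequencies. The helper names and the choice to skip non-standard residue letters are hypothetical. Pseudo-amino acid composition is not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def amino_acid_composition(seq):
    """Relative frequency of each standard amino acid (20 values)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]  # skip X, B, Z, ... (assumption)
    n = len(residues)
    return {aa: (residues.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Relative frequency of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing a non-standard residue are skipped
            counts[pair] += 1
            total += 1
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}
```

For example, `AACD` has composition A = 0.5, C = 0.25, D = 0.25. Its three adjacent pairs (`AA`, `AC`, `CD`) each get a dipeptide frequency of 1/3.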

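The third feature type in the abstract collapses a protein's 20 composition values into one number per physicochemical property. Each amino acid's property value is weighted by that amino acid's composition, and the results are summed over all 20. A minimal sketch follows, assuming the property scale is supplied as a Python dict holding one AAindex entry. Reading the AAindex flat files is not shown, and the function name `weighted_property` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def weighted_property(seq, scale):
    """Sum over the 20 amino acids of composition(aa) * scale[aa].

    scale: dict mapping each of the 20 amino acids to a property value,
    e.g. one AAindex entry (assumed input format).
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]  # skip non-standard letters (assumption)
    n = len(residues)
    if n == 0:
        return 0.0
    # composition of aa = residues.count(aa) / n; weight the property by it
    return sum(residues.count(aa) / n * scale[aa] for aa in AMINO_ACIDS)
```

Because the composition values sum to one, this feature equals the average property value per residue. Repeating it for many AAindex entries gives one feature per property instead of 20 composition features. Consider a made-up scale that assigns 1.0 to A, 3.0 to C and 0.0 to everything else. With it, the sequence `AACD` gives 0.5 × 1.0 + 0.25 × 3.0 = 1.25.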

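The abstract reports performance as areas under the ROC curve: 0.77 in 10-fold cross-validation and 0.76 on an independent test. The sketch below shows what that number measures. It computes AUC directly as the probability that a randomly chosen hub protein scores higher than a randomly chosen end protein (the Mann-Whitney formulation), counting ties as one half. It is an illustration with a hypothetical function name, not the evaluation code used in the thesis. Treating hub as the positive class is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ordered correctly.

    labels: 1 for hub (positive), 0 for end (negative).
    scores: predicted hub scores, e.g. random forest class probabilities.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # hub ranked above end: correct ordering
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(P·N) and favors clarity. Library implementations compute the same quantity from sorted scores. For example, labels `[1, 1, 0, 0]` with scores `[0.9, 0.4, 0.6, 0.1]` order 3 of 4 pairs correctly, giving an AUC of 0.75.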
Similar Items