Predicting Protein Subcellular Localization Using Integrative System
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 97 === The prediction of protein subcellular localization (PSL) has become a popular field in recent years because it supports protein function prediction and genome annotation, and thus aids drug design. However, experimental methods for analyzing PSL are often...
Main Authors: | Wei-Jyun Li, 李瑋峻 |
---|---|
Other Authors: | Eric Y. T. Juan |
Format: | Others |
Language: | zh-TW |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/46151991856883970059 |
id |
ndltd-TW-097NTOU5392029 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097NTOU5392029 2016-04-27T04:11:50Z http://ndltd.ncl.edu.tw/handle/46151991856883970059 Predicting Protein Subcellular Localization Using Integrative System 利用整合式系統預測蛋白質細胞內定位 Wei-Jyun Li 李瑋峻 Master's National Taiwan Ocean University Department of Computer Science and Engineering 97 The prediction of protein subcellular localization (PSL) has become a popular field in recent years because it supports protein function prediction and genome annotation, and thus aids drug design. However, experimental methods for analyzing PSL are often expensive and time-consuming. Therefore, computational prediction of PSL using information in databases has become a vibrant field of study. Nevertheless, extracting suitable features from proteins for accurate PSL prediction remains difficult because of the complex structures of proteins. Consequently, to improve prediction performance, several modern PSL prediction systems apply multi-feature protein descriptors and adopt complex hybrid prediction systems to classify and predict PSL. Although these systems achieve outstanding prediction performance, few of them provide protein characteristics and the bases of classification for further analysis. Therefore, this thesis proposes a PSL prediction system, PSL-PR-CPR (Protein Subcellular Localization PredictoR and Characteristic ProvideR), which aims to provide more protein characteristics for analysis. In the PSL-PR-CPR system, proteins are encoded into feature vectors by a protein descriptor, AAwindow, which uses the Amino Acid Index (AAI) to describe proteins in a simple and easily understood way. To derive a prediction model with high prediction performance, PSL-PR-CPR employs MG-PSO-DS, an evolutionary computation algorithm, to select feature sets that are suitable for the C4.5 classifier to classify and predict PSL. MG-PSO-DS is also applied to optimize C4.5 prediction performance by tuning C4.5 parameters.
After constructing the prediction model, PSL-PR-CPR displays the C4.5 decision rules and provides protein features that assist protein analysis. In addition, PSL-PR-CPR shows the characteristics of important features within the amino acid sequence, drawing on the easily understood nature of AAwindow, to provide more information for analysis. To validate prediction performance, two datasets were used to compare PSL-PR-CPR with Mycobacterial PSL predictor, Gpos-PLoc, CELLO and LocateP. The two datasets are 852 mycobacterial proteins from the study of Mycobacterial PSL predictor and 452 Gram-positive bacterial proteins from the study of Gpos-PLoc. 5-fold cross-validation and 10-fold cross-validation are used to validate PSL-PR-CPR performance on the 852 mycobacterial proteins and the 452 Gram-positive bacterial proteins, respectively. PSL-PR-CPR also provides samples of C4.5 decision rules, important features and characteristics within the amino acid sequence. Eric Y. T. Juan 阮議聰 2009 thesis 140 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 97 === The prediction of protein subcellular localization (PSL) has become a popular field in recent years because it supports protein function prediction and genome annotation, and thus aids drug design. However, experimental methods for analyzing PSL are often expensive and time-consuming. Therefore, computational prediction of PSL using information in databases has become a vibrant field of study. Nevertheless, extracting suitable features from proteins for accurate PSL prediction remains difficult because of the complex structures of proteins. Consequently, to improve prediction performance, several modern PSL prediction systems apply multi-feature protein descriptors and adopt complex hybrid prediction systems to classify and predict PSL. Although these systems achieve outstanding prediction performance, few of them provide protein characteristics and the bases of classification for further analysis. Therefore, this thesis proposes a PSL prediction system, PSL-PR-CPR (Protein Subcellular Localization PredictoR and Characteristic ProvideR), which aims to provide more protein characteristics for analysis.
In the PSL-PR-CPR system, proteins are encoded into feature vectors by a protein descriptor, AAwindow, which uses the Amino Acid Index (AAI) to describe proteins in a simple and easily understood way. To derive a prediction model with high prediction performance, PSL-PR-CPR employs MG-PSO-DS, an evolutionary computation algorithm, to select feature sets that are suitable for the C4.5 classifier to classify and predict PSL. MG-PSO-DS is also applied to optimize C4.5 prediction performance by tuning C4.5 parameters. After constructing the prediction model, PSL-PR-CPR displays the C4.5 decision rules and provides protein features that assist protein analysis. In addition, PSL-PR-CPR shows the characteristics of important features within the amino acid sequence, drawing on the easily understood nature of AAwindow, to provide more information for analysis. To validate prediction performance, two datasets were used to compare PSL-PR-CPR with Mycobacterial PSL predictor, Gpos-PLoc, CELLO and LocateP. The two datasets are 852 mycobacterial proteins from the study of Mycobacterial PSL predictor and 452 Gram-positive bacterial proteins from the study of Gpos-PLoc. 5-fold cross-validation and 10-fold cross-validation are used to validate PSL-PR-CPR performance on the 852 mycobacterial proteins and the 452 Gram-positive bacterial proteins, respectively. PSL-PR-CPR also provides samples of C4.5 decision rules, important features and characteristics within the amino acid sequence.
|
author2 |
Eric Y. T. Juan |
author_facet |
Eric Y. T. Juan Wei-Jyun Li 李瑋峻 |
author |
Wei-Jyun Li 李瑋峻 |
spellingShingle |
Wei-Jyun Li 李瑋峻 Predicting Protein Subcellular Localization Using Integrative System |
author_sort |
Wei-Jyun Li |
title |
Predicting Protein Subcellular Localization Using Integrative System |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/46151991856883970059 |
work_keys_str_mv |
AT weijyunli predictingproteinsubcellularlocalizationusingintegrativesystem AT lǐwěijùn predictingproteinsubcellularlocalizationusingintegrativesystem AT weijyunli lìyòngzhěnghéshìxìtǒngyùcèdànbáizhìxìbāonèidìngwèi AT lǐwěijùn lìyòngzhěnghéshìxìtǒngyùcèdànbáizhìxìbāonèidìngwèi |
_version_ |
1718249938568085504 |
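
The abstract says that AAwindow encodes a protein as a feature vector using Amino Acid Index (AAI) values, but this record does not define the descriptor itself. The sketch below shows one plausible reading, not the thesis's method: split the sequence into a fixed number of windows and average a single AAI property over each window. The function name `window_features`, the `n_windows` parameter, and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for illustration.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale, one commonly used amino acid index.
# Choosing this particular index is an assumption; the record does not say
# which AAI entries the thesis uses.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(
    sequence: str,
    n_windows: int,
    index: Dict[str, float] = KYTE_DOOLITTLE,
) -> List[float]:
    """Average an amino acid index over n_windows near-equal segments.

    Hypothetical helper: residues missing from the index (for example "X")
    are skipped before the sequence is split.
    """
    if n_windows <= 0:
        raise ValueError("n_windows must be positive")
    values = [index[aa] for aa in sequence.upper() if aa in index]
    if len(values) < n_windows:
        raise ValueError("sequence is shorter than the number of windows")

    features: List[float] = []
    for w in range(n_windows):
        # Integer arithmetic keeps the segments contiguous and non-empty.
        start = w * len(values) // n_windows
        end = (w + 1) * len(values) // n_windows
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features
```

Each protein then maps to a vector of length `n_windows` regardless of sequence length, which is the kind of fixed-size input a decision-tree classifier such as C4.5 expects. Because every feature corresponds to a named property over a known region of the sequence, a rule that tests it can be read directly, which fits the interpretability goal stated in the abstract.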