Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches
Ph.D. === National Yang-Ming University === Institute of Biomedical Informatics === 100 === B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce production of the corresponding a...
Main Authors: | Chun-Hung Su, 蘇俊泓 |
---|---|
Other Authors: | I-Fang Chung |
Format: | Others |
Language: | en_US |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/13713044477546031025 |
id |
ndltd-TW-100YM005114051 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100YM005114051 2015-10-13T21:22:40Z http://ndltd.ncl.edu.tw/handle/13713044477546031025 Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches 以機器學習法尋找有助於預測線形B細胞抗原決定位之特徵 Chun-Hung Su 蘇俊泓 Ph.D. National Yang-Ming University Institute of Biomedical Informatics 100 B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce production of the corresponding antibodies. Over the past three decades, many studies have used amino acid propensities to investigate how they correlate with the locations of linear B-cell epitopes. However, these studies achieved neither satisfactory prediction performance nor large-scale analyses to support their results. Although many machine-learning approaches have been applied to linear B-cell epitope prediction since linear B-cell epitope databases emerged, no existing method treats a group of related information as a single entity, selects useful propensities related to linear B-cell epitopes, and uses them to predict epitopes. To address these problems, we first applied a novel algorithm, the Group Feature Selecting Multilayered Perceptron (GFSMLP), to eight widely used amino acid propensities in four data sets. We used GFSMLP to rank propensities by the frequency with which they were selected. We then applied k-means clustering to the selected optimal amino acid propensity and used the resulting clusters to form amino acid triplets. We calculated the difference in occurrence frequency of each triplet between the positive and negative data sets and then combined these values with amino acid pair values from a modified version of Chen’s AAP approach to encode the epitope sequences.
We used both Support Vector Machine (SVM) and Random Forests classifiers for classification, with a two-level 5-fold cross-validation to find the optimal classifier parameters and obtain an unbiased performance estimate. The GFSMLP results indicate that the selected propensities are good features: they perform stably across the different data sets and enhance the discriminating power for predicting linear B-cell epitopes. Compared with previously published work, our modified encoding approaches achieve the best prediction performance to date. The accuracy for predicting linear B-cell epitopes (77.01%) is an improvement of about 6%. A graphical-user-interface version of GFSMLP is available at: http://bio.classcloud.org/GFSMLP/. I-Fang Chung Ueng-Cheng Yang Nikhil R. Pal 鍾翊方 楊永正 Nikhil R. Pal 2012 thesis 72 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Ph.D. === National Yang-Ming University === Institute of Biomedical Informatics === 100 === B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce production of the corresponding antibodies. Over the past three decades, many studies have used amino acid propensities to investigate how they correlate with the locations of linear B-cell epitopes. However, these studies achieved neither satisfactory prediction performance nor large-scale analyses to support their results. Although many machine-learning approaches have been applied to linear B-cell epitope prediction since linear B-cell epitope databases emerged, no existing method treats a group of related information as a single entity, selects useful propensities related to linear B-cell epitopes, and uses them to predict epitopes. To address these problems, we first applied a novel algorithm, the Group Feature Selecting Multilayered Perceptron (GFSMLP), to eight widely used amino acid propensities in four data sets. We used GFSMLP to rank propensities by the frequency with which they were selected. We then applied k-means clustering to the selected optimal amino acid propensity and used the resulting clusters to form amino acid triplets. We calculated the difference in occurrence frequency of each triplet between the positive and negative data sets and then combined these values with amino acid pair values from a modified version of Chen’s AAP approach to encode the epitope sequences. We used both Support Vector Machine (SVM) and Random Forests classifiers for classification, with a two-level 5-fold cross-validation to find the optimal classifier parameters and obtain an unbiased performance estimate.
The GFSMLP results indicate that the selected propensities are good features: they perform stably across the different data sets and enhance the discriminating power for predicting linear B-cell epitopes. Compared with previously published work, our modified encoding approaches achieve the best prediction performance to date. The accuracy for predicting linear B-cell epitopes (77.01%) is an improvement of about 6%. A graphical-user-interface version of GFSMLP is available at: http://bio.classcloud.org/GFSMLP/.
|
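The triplet encoding summarized in the description can be illustrated with a minimal sketch. It is not the thesis implementation. The propensity values in `TOY_SCALE` are invented for illustration, the choice of three clusters is arbitrary, and the way the per-triplet scores are turned into a feature vector in `encode` is an assumption. The amino acid pair (AAP) component is omitted.

```python
from collections import Counter
from itertools import product

# Hypothetical propensity values, invented for illustration (not a published scale).
TOY_SCALE = {"A": 0.1, "G": 0.2, "S": 0.3, "K": 1.0, "R": 1.1,
             "D": 1.2, "L": 2.0, "V": 2.1, "I": 2.2}


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on scalars; centres start at evenly spaced sorted values."""
    if k < 2:
        raise ValueError("k must be at least 2")
    pts = sorted(values)
    centres = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        new = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres


def residue_classes(scale, k):
    """Map each residue to the index of its nearest cluster centre."""
    centres = kmeans_1d(list(scale.values()), k)
    return {aa: min(range(k), key=lambda c: abs(v - centres[c]))
            for aa, v in scale.items()}


def triplet_freqs(seqs, classes):
    """Relative frequency of each class triplet over all sliding windows of length 3."""
    counts = Counter()
    for s in seqs:
        labels = [classes[aa] for aa in s]
        counts.update(zip(labels, labels[1:], labels[2:]))
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def triplet_scores(pos, neg, classes, k):
    """Frequency difference (positive minus negative) for every possible class triplet."""
    fp, fn = triplet_freqs(pos, classes), triplet_freqs(neg, classes)
    return {t: fp.get(t, 0.0) - fn.get(t, 0.0) for t in product(range(k), repeat=3)}


def encode(seq, classes, scores):
    """Assumed encoding: each triplet's score weighted by its frequency in the peptide."""
    local = triplet_freqs([seq], classes)
    return [scores[t] * local.get(t, 0.0) for t in sorted(scores)]
```

With `k` clusters the vector has `k ** 3` entries, one per class triplet, so grouping residues keeps the feature space far smaller than the 8000 triplets of the raw 20-letter alphabet.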
author2 |
I-Fang Chung |
author_facet |
I-Fang Chung Chun-Hung Su 蘇俊泓 |
author |
Chun-Hung Su 蘇俊泓 |
spellingShingle |
Chun-Hung Su 蘇俊泓 Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches |
author_sort |
Chun-Hung Su |
title |
Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches |
title_short |
Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches |
title_full |
Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches |
title_fullStr |
Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches |
title_full_unstemmed |
Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches |
title_sort |
identification of useful features for predicting linear b-cell epitopes by machine-learning approaches |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/13713044477546031025 |
work_keys_str_mv |
AT chunhungsu identificationofusefulfeaturesforpredictinglinearbcellepitopesbymachinelearningapproaches AT sūjùnhóng identificationofusefulfeaturesforpredictinglinearbcellepitopesbymachinelearningapproaches AT chunhungsu yǐjīqìxuéxífǎxúnzhǎoyǒuzhùyúyùcèxiànxíngbxìbāokàngyuánjuédìngwèizhītèzhēng AT sūjùnhóng yǐjīqìxuéxífǎxúnzhǎoyǒuzhùyúyùcèxiànxíngbxìbāokàngyuánjuédìngwèizhītèzhēng |
_version_ |
1718062914113372160 |
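The two-level 5-fold cross-validation mentioned in the description can be sketched as a nested loop: the inner level picks a parameter using only the training part of each outer fold, and the outer level scores that choice on data the selection never saw. The sketch below is an illustration under stated assumptions, not the thesis code. It swaps the SVM and Random Forests classifiers for a toy one-dimensional k-nearest-neighbour classifier so that it needs only the standard library, and every function name is hypothetical.

```python
import random


def k_folds(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """train is a list of (feature, label) pairs; majority vote among the k nearest."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def split(data, fold):
    """Return (training part, held-out part) for one fold of indices."""
    held = set(fold)
    train = [d for i, d in enumerate(data) if i not in held]
    test = [data[i] for i in fold]
    return train, test


def cv_accuracy(data, k_param, n_folds=5, seed=0):
    """Inner level: mean accuracy of one parameter value over n_folds folds."""
    scores = []
    for fold in k_folds(len(data), n_folds, seed):
        train, test = split(data, fold)
        scores.append(accuracy(train, test, k_param))
    return sum(scores) / len(scores)


def nested_cv(data, grid, n_folds=5, seed=0):
    """Outer level: select the parameter on the training part only, then score it."""
    outer_scores = []
    for fold in k_folds(len(data), n_folds, seed):
        train, test = split(data, fold)
        best = max(grid, key=lambda k: cv_accuracy(train, k, n_folds, seed + 1))
        outer_scores.append(accuracy(train, test, best))
    return sum(outer_scores) / len(outer_scores)
```

Because the parameter is never tuned on the outer test fold, the averaged outer score is not inflated by the selection step, which is the sense in which such an estimate is described as unbiased.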