Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches

Doctoral === National Yang-Ming University (國立陽明大學) === Institute of Biomedical Informatics (生物醫學資訊研究所) === 100 === B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce the production of corresponding a...

Bibliographic Details
Main Authors: Chun-Hung Su, 蘇俊泓
Other Authors: I-Fang Chung
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/13713044477546031025
id ndltd-TW-100YM005114051
record_format oai_dc
spelling ndltd-TW-100YM005114051 2015-10-13T21:22:40Z http://ndltd.ncl.edu.tw/handle/13713044477546031025
Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches (以機器學習法尋找有助於預測線形B細胞抗原決定位之特徵)
Chun-Hung Su (蘇俊泓)
Doctoral, National Yang-Ming University (國立陽明大學), Institute of Biomedical Informatics (生物醫學資訊研究所), 100
I-Fang Chung (鍾翊方), Ueng-Cheng Yang (楊永正), Nikhil R. Pal
2012, thesis (學位論文), 72, en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral === National Yang-Ming University (國立陽明大學) === Institute of Biomedical Informatics (生物醫學資訊研究所) === 100 === B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce production of the corresponding antibodies. Over the past three decades, many studies have used amino acid propensities to investigate how they correlate with the locations of linear B-cell epitopes. However, these studies achieved neither satisfactory prediction performance nor large-scale analyses to support their results. Although many machine-learning approaches have been applied to linear B-cell epitope prediction since linear B-cell epitope databases emerged, no existing method treats a group of related information as a single entity, selects the propensities that are useful for linear B-cell epitopes, and uses them to predict epitopes. To address these problems, we first applied a novel algorithm, the Group Feature Selecting Multilayered Perceptron (GFSMLP), to eight widely used amino acid propensities on four data sets. We used GFSMLP to rank the propensities by the frequency with which they were selected. We then applied k-means clustering to the selected optimal amino acid propensity and used the resulting clusters to form amino acid triplets. We calculated the difference in occurrence frequency of each triplet between the positive and negative data sets and combined these values with the amino acid pair values from a modified version of Chen’s AAP approach to encode the epitope sequences. We used both Support Vector Machine (SVM) and Random Forests classifiers for classification, with a two-level 5-fold cross-validation to find the optimal classifier parameters and obtain unbiased performance estimates.
The GFSMLP results show that the selected propensities are good features whose performance is stable across the different data sets, enhancing the discriminating power for predicting linear B-cell epitopes. To date, our modified encoding approaches achieve the best prediction performance compared with published studies. The accuracy (77.01%) is an improvement of about 6% in the prediction of linear B-cell epitopes. A graphical-user-interface version of GFSMLP is available at: http://bio.classcloud.org/GFSMLP/.
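The triplet step in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on several assumptions: the Kyte-Doolittle hydropathy scale stands in for the "selected optimal propensity" (the abstract does not name it), K = 3 clusters is arbitrary, the peptides are toy strings rather than epitope data, and the final encoding (triplet score times triplet count) is one plausible reading of "encode the epitope sequences". The AAP pair values and the classifiers are left out.

```python
from collections import Counter
from itertools import product

# Kyte-Doolittle hydropathy values, used only as a stand-in propensity scale
# (assumption: the abstract does not say which propensity GFSMLP selected).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 3  # hypothetical number of clusters


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars with deterministic initial centers."""
    ordered = sorted(values)
    # Spread the initial centers evenly across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            members[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(m) / len(m) if m else centers[i] for i, m in enumerate(members)]
        if updated == centers:
            break
        centers = updated
    return centers


def cluster_amino_acids(scale, k):
    """Map each amino acid to the index of its nearest cluster center."""
    centers = kmeans_1d(list(scale.values()), k)
    return {aa: min(range(k), key=lambda c: abs(v - centers[c])) for aa, v in scale.items()}


def triplet_frequencies(peptides, groups):
    """Relative frequency of each cluster-label triplet over a set of peptides."""
    counts = Counter()
    for pep in peptides:
        # Residues missing from the scale are skipped, so their neighbours become adjacent.
        labels = [groups[aa] for aa in pep if aa in groups]
        for i in range(len(labels) - 2):
            counts[tuple(labels[i:i + 3])] += 1
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def triplet_scores(positives, negatives, groups, k):
    """Positive-minus-negative occurrence frequency for every possible triplet."""
    pos = triplet_frequencies(positives, groups)
    neg = triplet_frequencies(negatives, groups)
    return {t: pos.get(t, 0.0) - neg.get(t, 0.0) for t in product(range(k), repeat=3)}


def encode(peptide, groups, scores):
    """Assumed encoding: one entry per triplet, score times its count in the peptide."""
    labels = [groups[aa] for aa in peptide if aa in groups]
    seen = Counter(tuple(labels[i:i + 3]) for i in range(len(labels) - 2))
    return [scores[t] * seen[t] for t in sorted(scores)]


if __name__ == "__main__":
    groups = cluster_amino_acids(HYDROPATHY, K)
    # Toy peptides for illustration only, not taken from any epitope database.
    scores = triplet_scores(["AILVK", "MFLIV"], ["RKDEN", "QHRKE"], groups, K)
    print(encode("ILVAK", groups, scores))
```

Clustering the 20 amino acids into K groups shrinks the triplet space from 8000 to K^3 entries, which keeps the frequency estimates usable on small data sets. In the workflow the abstract describes, such a vector would be concatenated with the AAP pair features before training the SVM or Random Forests classifier.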
author2 I-Fang Chung
author_facet I-Fang Chung
Chun-Hung Su
蘇俊泓
author Chun-Hung Su
蘇俊泓
author_sort Chun-Hung Su
title Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/13713044477546031025