Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches

Doctoral thesis === National Yang-Ming University === Institute of Biomedical Informatics === 100 === B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce the production of corresponding a...

Bibliographic Details
Main Authors:	Chun-Hung Su, 蘇俊泓
Other Authors:	I-Fang Chung
Format:	Others
Language:	en_US
Published:	2012
Online Access:	http://ndltd.ncl.edu.tw/handle/13713044477546031025

id	ndltd-TW-100YM005114051
record_format	oai_dc
spelling	ndltd-TW-100YM005114051 2015-10-13T21:22:40Z http://ndltd.ncl.edu.tw/handle/13713044477546031025 Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches 以機器學習法尋找有助於預測線形B細胞抗原決定位之特徵 Chun-Hung Su 蘇俊泓 Doctoral thesis, National Yang-Ming University, Institute of Biomedical Informatics, 100 I-Fang Chung, Ueng-Cheng Yang, Nikhil R. Pal (鍾翊方, 楊永正, Nikhil R. Pal) 2012 thesis 72 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Doctoral thesis === National Yang-Ming University === Institute of Biomedical Informatics === 100 === B-cell epitopes are antigenic determinants that are recognized and bound by B-cell receptors or antibodies. Synthetic peptides of linear B-cell epitopes can aid the development of peptide vaccines or be used to induce the production of the corresponding antibodies. Over the past three decades, many studies have used amino acid propensities to investigate how they correlate with the location of linear B-cell epitopes. However, these studies achieved neither satisfactory prediction performance nor large-scale analyses to support their results. Although many machine-learning approaches have been applied to linear B-cell epitope prediction since the emergence of linear B-cell epitope databases, no existing method treats a group of related information as a single entity, selects useful propensities related to linear B-cell epitopes, and uses them to predict epitopes. To address these problems, we first applied a novel algorithm, the Group Feature Selecting Multilayered Perceptron (GFSMLP), with eight widely used amino acid propensities on four data sets. We used GFSMLP to rank the propensities by the frequency with which they were selected. We then used k-means clustering to cluster the selected optimal amino acid propensity and used the resulting clusters to form amino acid triplets. We calculated the difference in occurrence frequency of each triplet between the positive and negative data sets and then combined these values with the amino acid pair values from a modified version of Chen's AAP approach to encode the epitope sequences. We used both Support Vector Machine (SVM) and Random Forests classifiers for classification, with a two-level 5-fold cross-validation to find the optimal classifier parameters and obtain an unbiased performance estimate.
Based on the GFSMLP results, the selected propensities are indeed good features: they perform stably across the different data sets and enhance the discriminating power for predicting linear B-cell epitopes. To date, our modified encoding approaches achieve the best prediction performance compared with published studies. The accuracy (77.01%) represents an improvement of about 6% in the prediction of linear B-cell epitopes. A graphical-user-interface version of GFSMLP is available at: http://bio.classcloud.org/GFSMLP/.
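The triplet step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: the Kyte-Doolittle hydropathy scale stands in for the unnamed "selected optimal propensity", k = 3 clusters is an arbitrary choice, the peptides are invented toy strings rather than real epitope data, and the helper names (`kmeans_1d`, `residue_groups`, `triplet_difference`, `encode`) are hypothetical. The AAP pair values and the SVM / Random Forests classifiers are left out.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values. ASSUMPTION: used here only as a stand-in
# for the propensity scale that the thesis selects with GFSMLP.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on scalars with a deterministic start."""
    ordered = sorted(values)
    # Spread the initial centers evenly over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def residue_groups(scale, k):
    """Map each amino acid to the index of its nearest cluster center."""
    centers = kmeans_1d(list(scale.values()), k)
    return {aa: min(range(k), key=lambda c: abs(v - centers[c]))
            for aa, v in scale.items()}


def triplet_frequencies(peptides, groups):
    """Relative frequency of each cluster-label triplet over all peptides."""
    counts = Counter()
    for pep in peptides:
        labels = [groups[aa] for aa in pep]
        for i in range(len(labels) - 2):
            counts[tuple(labels[i:i + 3])] += 1
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def triplet_difference(positives, negatives, groups):
    """Frequency of each triplet in positives minus its frequency in negatives."""
    pos = triplet_frequencies(positives, groups)
    neg = triplet_frequencies(negatives, groups)
    return {t: pos.get(t, 0.0) - neg.get(t, 0.0) for t in set(pos) | set(neg)}


# Toy peptides invented for this example; they are NOT real epitope data.
POSITIVES = ["DKTEKRSNQ", "KEDNRQSTG"]
NEGATIVES = ["LIVFAMCLV", "AVILMFCWI"]

GROUPS = residue_groups(KD, k=3)  # k=3 is an arbitrary illustrative choice
DIFF = triplet_difference(POSITIVES, NEGATIVES, GROUPS)


def encode(peptide, groups=GROUPS, diff=DIFF):
    """Score each sliding triplet of a peptide with its frequency difference."""
    labels = [groups[aa] for aa in peptide]
    return [diff.get(tuple(labels[i:i + 3]), 0.0) for i in range(len(labels) - 2)]


print(encode("KEDLIVFA"))
```

Grouping the 20 residues into a few clusters before forming triplets keeps the table at k³ entries instead of 20³ = 8000, which makes frequency estimates from a limited epitope set less sparse. Whether this matches the motivation in the thesis is an assumption.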
author2	I-Fang Chung
author_facet	I-Fang Chung Chun-Hung Su 蘇俊泓
author	Chun-Hung Su 蘇俊泓
author_sort	Chun-Hung Su
title	Identification of useful features for predicting linear B-cell epitopes by machine-learning approaches
publishDate	2012
url	http://ndltd.ncl.edu.tw/handle/13713044477546031025

Similar Items