Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs
Abstract. Background: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences becomes increasing...
Main Authors: | Sheng Zhi-Ya, Tang Yu-Rong, Chen Yong-Zi, Zhang Ziding |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2008-02-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/9/101 |
id |
doaj-8d0bacbfd1d44ddbb1c6f2504d11e3a9 |
---|---|
record_format |
Article |
spelling |
doaj-8d0bacbfd1d44ddbb1c6f2504d11e3a9; 2020-11-24T21:43:11Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-02-01; 9; 1; 101; 10.1186/1471-2105-9-101; http://www.biomedcentral.com/1471-2105/9/101 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sheng Zhi-Ya, Tang Yu-Rong, Chen Yong-Zi, Zhang Ziding |
author_sort |
Sheng Zhi-Ya |
title |
Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2008-02-01 |
description |
Abstract
Background: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences is becoming increasingly important in the post-genomic era. A new encoding scheme was employed to improve the prediction of mucin-type O-glycosylation sites in mammalian proteins.
Results: A new protein bioinformatics tool, CKSAAP_OGlySite, was developed to predict mucin-type O-glycosylation serine/threonine (S/T) sites in mammalian proteins. Using an encoding scheme based on the composition of k-spaced amino acid pairs (CKSAAP), the proposed method was trained and tested on a new and stringent O-glycosylation dataset with a Support Vector Machine (SVM). When the ratio of O-glycosylation to non-glycosylation sites in the training datasets was set to 1:1, 10-fold cross-validation tests showed that the proposed method achieved accuracies of 83.1% and 81.4% in predicting O-glycosylated S and T sites, respectively. On the same datasets, CKSAAP_OGlySite was about 5.0% more accurate than the conventional binary-encoding-based method. When trained and tested on 1:5 datasets, the CKSAAP encoding showed an even larger improvement over the binary encoding. The training datasets of S and T sites were also merged, and the prediction of S and T sites was integrated into a single predictor (the S+T predictor). On both the 1:1 and 1:5 datasets, the S+T predictor always performed slightly better than the predictors in which S and T sites were predicted independently. This suggests that the molecular recognition of O-glycosylated S and T sites is similar, and that the S+T predictor's higher accuracy may result from the expanded training datasets. Moreover, CKSAAP_OGlySite performed better when benchmarked against two existing predictors.
Conclusion: Because the CKSAAP encoding can reflect characteristics of the sequences surrounding mucin-type O-glycosylation sites, CKSAAP_OGlySite proved more powerful than the conventional binary-encoding-based method. This suggests that it can serve the biological community as a competitive mucin-type O-glycosylation site predictor. CKSAAP_OGlySite is available at http://bioinformatics.cau.edu.cn/zzd_lab/CKSAAP_OGlySite/. |
url |
http://www.biomedcentral.com/1471-2105/9/101 |
_version_ |
1725915095442128896 |
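
The description in this record names the CKSAAP encoding without spelling it out. A minimal sketch of the general idea follows: for each gap size k, count every residue pair separated by k positions within a sequence window around a candidate S/T site, then normalise by the number of such pairs in the window. The alphabet (20 residues plus `-` as padding), the window size, the `k_max` value, and the function names `cksaap` and `site_window` are assumptions made for illustration; they are not taken from the record and may differ from the settings used by CKSAAP_OGlySite.

```python
from itertools import product

# Assumption: the 20 standard residues plus '-' as a padding symbol for
# windows that run past a sequence end; the paper's alphabet may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 21 * 21 = 441 pairs


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Return pair frequencies for k = 0..k_max, concatenated in that order."""
    features = []
    for k in range(k_max + 1):
        # Number of residue pairs separated by exactly k positions.
        n_pairs = len(window) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs with symbols outside ALPHABET are skipped
                counts[pair] += 1
        features.extend(
            counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS
        )
    return features


def site_window(sequence: str, position: int, flank: int = 3) -> str:
    """Cut 2*flank+1 residues centred on a 0-based position, padded with '-'."""
    padded = "-" * flank + sequence + "-" * flank
    return padded[position : position + 2 * flank + 1]


# Hypothetical toy sequence; the serine at index 3 is the candidate site.
window = site_window("MKTSPA", 3)
features = cksaap(window, k_max=2)
print(window, len(features))
```

With `k_max=2` the vector has 3 × 441 = 1323 entries, one block of 441 pair frequencies per gap size. A vector of this kind could then be passed to any classifier, such as the SVM mentioned in the description.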