Protein Tyrosine Sulfation Sites Prediction: Based on Support Vector Machine and Pairwise Position Weighted Matrix of Amino Acid Sequence
Master's thesis === National Chiao Tung University === Institute of Biomedical Engineering === 101 === Protein tyrosine sulfation is a common post-translational modification. Identifying tyrosine sulfation sites is important for biologists who want to predict biochemical interactions. However, the determinant features of tyrosine sulfation sites are unknown. M...
Main Authors: | Huang, Po-Tsun 黃柏淳 |
---|---|
Other Authors: | Ching, Yu-Tai 荊宇泰 |
Format: | Others |
Language: | en_US |
Published: | 2013 |
Online Access: | http://ndltd.ncl.edu.tw/handle/30096393732610573630 |
id |
ndltd-TW-101NCTU5810114 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-101NCTU5810114 2016-07-02T04:20:16Z http://ndltd.ncl.edu.tw/handle/30096393732610573630 Protein Tyrosine Sulfation Sites Prediction: Based on Support Vector Machine and Pairwise Position Weighted Matrix of Amino Acid Sequence 利用支援向量機及胺基酸成對位置權重矩陣特徵預測蛋白質中酪胺酸硫酸化位置 Huang, Po-Tsun 黃柏淳 Master's thesis National Chiao Tung University Institute of Biomedical Engineering 101 Ching, Yu-Tai 荊宇泰 2013 thesis 33 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Chiao Tung University === Institute of Biomedical Engineering === 101 === Protein tyrosine sulfation is a common post-translational modification. Identifying tyrosine sulfation sites is important for biologists who want to predict biochemical interactions. However, the determinant features of tyrosine sulfation sites are unknown. Moreover, few sulfotyrosine sites have been verified experimentally, and the number of non-sulfotyrosine sites is 26 times the number of sulfotyrosine sites. This thesis presents a method for predicting tyrosine sulfation sites based on a support vector machine (SVM), with amino acid sequences encoded by a pairwise position weighted matrix (PPWM). Because sulfotyrosine sites are far fewer than non-sulfotyrosine sites, we resample the training data to build multiple SVM models. The final prediction is made by a voting mechanism over those models. A single SVM model achieves an average accuracy of 99.2% under five-fold cross-validation. The proposed method achieves an accuracy of 98.3% with voting when tested on all known tyrosine sites. In addition, by analyzing the PPWM, we discovered that certain patterns occur more frequently within sulfotyrosine subsequences, such as acidic amino acids on each side of the tyrosine residue and tryptophan (W) paired with an acidic amino acid. The results may help biologists discover tyrosine sulfation sites. |
author2 |
Ching, Yu-Tai |
author_facet |
Ching, Yu-Tai Huang, Po-Tsun 黃柏淳 |
author |
Huang, Po-Tsun 黃柏淳 |
spellingShingle |
Huang, Po-Tsun 黃柏淳 Protein Tyrosine Sulfation Sites Prediction: Based on Support Vector Machine and Pairwise Position Weighted Matrix of Amino Acid Sequence |
author_sort |
Huang, Po-Tsun |
title |
Protein Tyrosine Sulfation Sites Prediction: Based on Support Vector Machine and Pairwise Position Weighted Matrix of Amino Acid Sequence |
publishDate |
2013 |
url |
http://ndltd.ncl.edu.tw/handle/30096393732610573630 |
work_keys_str_mv |
AT huangpotsun proteintyrosinesulfationsitespredcitionbasedonsupportvectormachineandpairwisepositionweightedmatrixofaminoacidsequence AT huángbǎichún proteintyrosinesulfationsitespredcitionbasedonsupportvectormachineandpairwisepositionweightedmatrixofaminoacidsequence AT huangpotsun lìyòngzhīyuánxiàngliàngjījíànjīsuānchéngduìwèizhìquánzhòngjǔzhèntèzhēngyùcèdànbáizhìzhōnglàoànsuānliúsuānhuàwèizhì AT huángbǎichún lìyòngzhīyuánxiàngliàngjījíànjīsuānchéngduìwèizhìquánzhòngjǔzhèntèzhēngyùcèdànbáizhìzhōnglàoànsuānliúsuānhuàwèizhì |
_version_ |
1718331604118536192 |
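
The abstract describes resampling an imbalanced training set to build several models and combining them by voting. The record does not give the details, so the following is only a minimal sketch under stated assumptions: negatives are undersampled to match the number of positives, five models are trained, and a simple majority decides. The base learner here is a nearest-centroid stand-in, not an SVM, and the feature vectors are toy 2-D points rather than PPWM encodings. All function names (`balanced_subsets`, `train_ensemble`, `vote`, `train_centroid`) are hypothetical.

```python
import random


def balanced_subsets(data, n_models, rng):
    """Yield n_models training sets: all positives plus an equally sized
    random draw of negatives (assumes negatives outnumber positives)."""
    pos = [s for s in data if s[1] == 1]
    neg = [s for s in data if s[1] == 0]
    for _ in range(n_models):
        yield pos + rng.sample(neg, len(pos))


def train_ensemble(data, train, n_models=5, seed=0):
    """Train one model per balanced subset; `train` returns a predictor."""
    rng = random.Random(seed)
    return [train(subset) for subset in balanced_subsets(data, n_models, rng)]


def vote(models, x):
    """Strict majority vote; a tie goes to the negative class."""
    positives = sum(m(x) for m in models)
    return 1 if positives * 2 > len(models) else 0


def train_centroid(subset):
    """Stand-in base learner (NOT an SVM): nearest class centroid."""
    def centroid(label):
        rows = [x for x, y in subset if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1 if d1 < d0 else 0

    return predict


# Toy data with a 26:1 negative-to-positive ratio, as in the abstract.
positives = [((1.0, 1.0), 1), ((0.9, 1.1), 1)]
negatives = [((0.005 * i, 0.005 * (i % 7)), 0) for i in range(52)]
models = train_ensemble(positives + negatives, train_centroid, n_models=5)
print(vote(models, (1.0, 1.0)), vote(models, (0.0, 0.0)))
```

In a fuller version, `train_centroid` would be replaced by an SVM trainer and the 2-D points by encoded sequence windows around each tyrosine; the resampling and voting logic would stay the same.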