Improving SIM-based annotation method of protein sequence using support vector machine
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 93 === The gap between protein sequences and reliable function annotations in public databases is growing. Traditional manual annotation by literature curation cannot keep up with the rapid growth of new protein sequences. Thus, the automatic annotation meth...
Main Authors: | Cheng-Kang Liu (劉承剛) |
---|---|
Other Authors: | Hahn-Ming Lee (李漢銘) |
Format: | Others |
Language: | en_US |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/05841332477652714545 |
id | ndltd-TW-093NTUST392037 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-093NTUST392037 2015-12-25T04:10:27Z http://ndltd.ncl.edu.tw/handle/05841332477652714545 Improving SIM-based annotation method of protein sequence using support vector machine 以序列比對為基礎並應用分類器技術擷取並整合生物資訊來源之蛋白質序列註解系統 (A protein sequence annotation system that builds on sequence alignment and applies classifier techniques to extract and integrate biological information sources) Cheng-Kang Liu 劉承剛 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 93 Hahn-Ming Lee 李漢銘 2005 thesis 78 en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 93 === The gap between protein sequences and reliable function annotations in public databases is growing. Traditional manual annotation by literature curation cannot keep up with the rapid growth of new protein sequences, so automatic annotation methods for protein sequences are in great demand. Sequence similarity (SIM) methods, such as BLAST, are the most commonly used; they search for homologies and evolutionary relationships between protein sequences. However, similar protein sequences show a considerable number of functional inconsistencies. A method that automatically eliminates erroneous annotations is therefore needed to improve SIM-based methods. In addition, biological data are distributed across different databases, each with its own data types, which makes it difficult for users to obtain the data they need. Integrating the various types of biological data into a single environment for function annotation of protein sequences is also an important issue.
In this paper, we present a protein sequence annotation method named MAPS (Multiple Annotation for Protein Sequences), which provides a mechanism to extract multiple annotations from various types of biological data and automatically eliminates erroneous annotations with a pre-trained SVM classifier. It assigns an annotation to the input protein sequence by taking into account all hit proteins that carry this annotation, not just a single hit protein. This can reduce erroneous annotations inferred from weak sequence similarity or from sequence identity in non-functional segments. The experimental results show that erroneous annotations can be eliminated effectively while maintaining high accuracy across different types of annotations.
|
author2 | Hahn-Ming Lee |
author_facet | Hahn-Ming Lee Cheng-Kang Liu 劉承剛 |
author | Cheng-Kang Liu 劉承剛 |
spellingShingle | Cheng-Kang Liu 劉承剛 Improving SIM-based annotation method of protein sequence using support vector machine |
author_sort | Cheng-Kang Liu |
title | Improving SIM-based annotation method of protein sequence using support vector machine |
publishDate | 2005 |
url | http://ndltd.ncl.edu.tw/handle/05841332477652714545 |
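The abstract states that MAPS judges an annotation using all hit proteins that carry it, rather than a single hit, and then filters candidates with a pre-trained SVM. The record gives no implementation details, so the following is only a minimal sketch of the aggregation idea under stated assumptions: the `Hit` structure, the four features, and the `toy_accept` rule are all hypothetical and are not taken from the thesis. The filter is passed in as a plain callable; a trained SVM's decision function could be placed there, and `toy_accept` is a simple threshold stand-in, not an SVM.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit (hypothetical structure, not from the thesis)."""
    protein_id: str
    identity: float            # fraction of identical residues, 0.0 to 1.0
    bit_score: float
    annotations: Sequence[str]


def aggregate_by_annotation(hits: List[Hit]) -> Dict[str, List[float]]:
    """Build one feature vector per annotation from ALL hits that carry it.

    Illustrative features (an assumption):
    [hit count, fraction of all hits, max bit score, mean identity].
    """
    grouped: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        for term in set(hit.annotations):   # count each hit once per term
            grouped[term].append(hit)

    features: Dict[str, List[float]] = {}
    for term, group in grouped.items():
        features[term] = [
            float(len(group)),
            len(group) / len(hits),
            max(h.bit_score for h in group),
            sum(h.identity for h in group) / len(group),
        ]
    return features


def annotate(hits: List[Hit],
             accept: Callable[[List[float]], bool]) -> List[str]:
    """Return the annotations whose aggregated evidence passes the filter."""
    features = aggregate_by_annotation(hits)
    return sorted(term for term, vec in features.items() if accept(vec))


def toy_accept(vec: List[float]) -> bool:
    """Threshold stand-in for a trained classifier (NOT an SVM)."""
    _count, fraction, _max_score, mean_identity = vec
    return fraction >= 0.5 and mean_identity >= 0.4


if __name__ == "__main__":
    example_hits = [
        Hit("P1", 0.82, 310.0, ["kinase", "ATP binding"]),
        Hit("P2", 0.75, 280.0, ["kinase"]),
        Hit("P3", 0.31, 55.0, ["transporter"]),
    ]
    print(annotate(example_hits, toy_accept))
```

Keeping the filter as a parameter separates the evidence aggregation, which the abstract describes, from the classifier, whose features and training data the record does not specify.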