Improving SIM-based annotation method of protein sequence using support vector machine

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 93 === The gap between the protein sequences and the reliable function annotation in public databases is growing. Traditional manual annotation by literature curation cannot catch up with the rapid growth of new protein sequences. Thus, the automatic annotation meth...

Bibliographic Details
Main Authors: Cheng-Kang Liu, 劉承剛
Other Authors: Hahn-Ming Lee
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/05841332477652714545
id ndltd-TW-093NTUST392037
record_format oai_dc
spelling ndltd-TW-093NTUST392037 2015-12-25T04:10:27Z http://ndltd.ncl.edu.tw/handle/05841332477652714545 Improving SIM-based annotation method of protein sequence using support vector machine 以序列比對為基礎並應用分類器技術擷取並整合生物資訊來源之蛋白質序列註解系統 (A protein sequence annotation system based on sequence alignment that applies classifier techniques to extract and integrate bioinformatics sources) Cheng-Kang Liu 劉承剛 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 93 Hahn-Ming Lee 李漢銘 2005 degree thesis ; thesis 78 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 93 === The gap between the number of protein sequences and the reliable functional annotations in public databases is growing. Traditional manual annotation by literature curation cannot keep up with the rapid growth of new protein sequences, so automatic annotation methods for protein sequences are in great demand. Sequence similarity (SIM) methods, such as BLAST, are the most commonly used; they search for homologies and evolutionary relationships between protein sequences. However, there are a considerable number of functional inconsistencies among similar protein sequences. Thus, a method that automatically eliminates erroneous annotations is needed to improve SIM-based methods. In addition, biological data are distributed across different databases, each with its own data types, which makes it difficult for users to obtain the data they need. Integrating the various types of biological data into a single environment for functional annotation of protein sequences is therefore also an important issue. In this paper, we present a protein sequence annotation method named MAPS (Multiple Annotation for Protein Sequences), which provides a mechanism to extract multiple annotations from various types of biological data and automatically eliminates erroneous annotations using a pre-trained SVM classifier. It assigns an annotation to the input protein sequence by considering all hit proteins that carry this annotation together, not just a single hit protein. This can reduce the erroneous annotations inferred from weak sequence similarity and from sequence identity in non-functional segments. The experimental results show that erroneous annotations can be eliminated effectively while keeping high accuracy across different types of annotations.
author2 Hahn-Ming Lee
author_facet Hahn-Ming Lee
Cheng-Kang Liu
劉承剛
author Cheng-Kang Liu
劉承剛
title Improving SIM-based annotation method of protein sequence using support vector machine
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/05841332477652714545
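
The description says that MAPS assigns an annotation by considering all hit proteins that carry it, and filters erroneous annotations with a pre-trained SVM classifier. The record does not give the feature set or the trained model, so the following is only a minimal sketch of that idea under stated assumptions: the `Hit` record layout, the three group-level features, and the `WEIGHTS` and `BIAS` values are all hypothetical, and a hand-written linear decision function stands in for the pre-trained SVM.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One similarity-search hit (hypothetical record layout)."""
    protein_id: str
    identity: float      # fraction of identical residues, 0..1
    bit_score: float
    annotations: tuple   # annotation terms attached to the hit protein


def group_features(hits):
    """Aggregate all hits that share an annotation into one feature vector."""
    groups = defaultdict(list)
    for hit in hits:
        for term in hit.annotations:
            groups[term].append(hit)
    total = len(hits)
    features = {}
    for term, members in groups.items():
        features[term] = (
            len(members) / total,                              # share of hits carrying the term
            max(h.identity for h in members),                  # best identity in the group
            sum(h.bit_score for h in members) / len(members),  # mean bit score of the group
        )
    return features


# Hypothetical weights and bias standing in for a pre-trained linear SVM.
WEIGHTS = (2.0, 1.5, 0.01)
BIAS = -2.5


def keep_annotation(x, weights=WEIGHTS, bias=BIAS):
    """Linear decision function: keep the annotation when w.x + b > 0."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0


def annotate(hits):
    """Return the annotation terms that pass the classifier, sorted by name."""
    return sorted(t for t, x in group_features(hits).items() if keep_annotation(x))


if __name__ == "__main__":
    example_hits = [
        Hit("P1", 0.90, 120.0, ("kinase", "ATP-binding")),
        Hit("P2", 0.85, 110.0, ("kinase",)),
        Hit("P3", 0.30, 40.0, ("transporter",)),
    ]
    print(annotate(example_hits))  # prints ['ATP-binding', 'kinase']
```

Scoring each annotation over its whole group of hits, rather than copying terms from the single best hit, is what lets a weakly supported term such as "transporter" in the example be dropped. In practice the decision function would come from a classifier trained on labeled annotation groups, not from fixed numbers like these.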