Improving SIM-based annotation method of protein sequence using support vector machine

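Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === academic year 93 === The gap between the number of protein sequences in public databases and the number with reliable functional annotation is growing. Traditional manual annotation through literature curation cannot keep up with the rapid growth of new protein sequences. Thus, the automatic annotation meth...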


Bibliographic Details
Main Authors:	Cheng-Kang Liu, 劉承剛
Other Authors:	Hahn-Ming Lee
Format:	Others
Language:	en_US
Published:	2005
Online Access:	http://ndltd.ncl.edu.tw/handle/05841332477652714545

id	ndltd-TW-093NTUST392037
record_format	oai_dc
spelling	ndltd-TW-093NTUST392037 2015-12-25T04:10:27Z http://ndltd.ncl.edu.tw/handle/05841332477652714545 Improving SIM-based annotation method of protein sequence using support vector machine 以序列比對為基礎並應用分類器技術擷取並整合生物資訊來源之蛋白質序列註解系統 [A protein sequence annotation system based on sequence alignment that applies classifier techniques to extract and integrate biological information sources] Cheng-Kang Liu 劉承剛 Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 93. The gap between the number of protein sequences in public databases and the number with reliable functional annotation is growing. Traditional manual annotation through literature curation cannot keep up with the rapid growth of new protein sequences, so automatic annotation methods for protein sequences are in great demand. Sequence similarity (SIM) methods, such as BLAST, are the most commonly used methods; they search for homology and evolutionary relationships between protein sequences. However, similar protein sequences show a considerable number of functional inconsistencies, so a method that automatically eliminates erroneous annotations is needed to improve SIM-based methods. In addition, biological data are distributed across different databases, each with its own data types, which makes it difficult for users to obtain the data they need from this distributed environment. Integrating the various types of biological data into a single environment for protein function annotation is therefore also an important issue. In this thesis, we present a protein sequence annotation method named MAPS (Multiple Annotation for Protein Sequences), which extracts multiple annotations from various types of biological data and automatically eliminates erroneous annotations with a pre-trained SVM classifier. MAPS assigns an annotation to an input protein sequence by taking into account all hit proteins that carry that annotation, not just a single hit protein. This reduces erroneous annotations inferred from weak sequence similarity or from sequence identity in non-functional segments. The experimental results show that erroneous annotations are eliminated effectively while high accuracy is maintained across different types of annotations. Hahn-Ming Lee 李漢銘 2005 thesis 78 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === academic year 93 === The gap between the number of protein sequences in public databases and the number with reliable functional annotation is growing. Traditional manual annotation through literature curation cannot keep up with the rapid growth of new protein sequences, so automatic annotation methods for protein sequences are in great demand. Sequence similarity (SIM) methods, such as BLAST, are the most commonly used methods; they search for homology and evolutionary relationships between protein sequences. However, similar protein sequences show a considerable number of functional inconsistencies, so a method that automatically eliminates erroneous annotations is needed to improve SIM-based methods. In addition, biological data are distributed across different databases, each with its own data types, which makes it difficult for users to obtain the data they need from this distributed environment. Integrating the various types of biological data into a single environment for protein function annotation is therefore also an important issue. In this thesis, we present a protein sequence annotation method named MAPS (Multiple Annotation for Protein Sequences), which extracts multiple annotations from various types of biological data and automatically eliminates erroneous annotations with a pre-trained SVM classifier. MAPS assigns an annotation to an input protein sequence by taking into account all hit proteins that carry that annotation, not just a single hit protein. This reduces erroneous annotations inferred from weak sequence similarity or from sequence identity in non-functional segments. The experimental results show that erroneous annotations are eliminated effectively while high accuracy is maintained across different types of annotations.
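As a rough illustration of the idea in the abstract (score each candidate annotation using all hit proteins that carry it, then let a pre-trained SVM reject likely errors), here is a minimal Python sketch. The abstract does not specify the features, kernel, or training procedure used by MAPS, so everything below is an assumption: the hypothetical `Hit` record, the three pooled features, the Pegasos-style linear SVM (a swapped-in, dependency-free trainer), and the toy training vectors are invented for illustration and are not the author's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical similarity-search hit (not a BLAST library type)."""
    protein_id: str
    bit_score: float
    identity: float          # fraction of identical residues, 0..1
    annotations: frozenset   # annotation labels carried by the hit protein


def annotation_features(hits):
    """Pool ALL hits that carry each annotation into one feature vector.

    Hypothetical features: [share of hits carrying the annotation,
    best bit score among them / best bit score overall,
    mean identity of those hits].
    """
    if not hits:
        return {}
    best = max(h.bit_score for h in hits)
    grouped = defaultdict(list)
    for h in hits:
        for label in h.annotations:
            grouped[label].append(h)
    return {
        label: [
            len(group) / len(hits),
            max(h.bit_score for h in group) / best,
            sum(h.identity for h in group) / len(group),
        ]
        for label, group in grouped.items()
    }


def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.1, epochs=200):
    """Linear SVM via Pegasos-style subgradient steps on the regularised hinge loss.

    X: feature vectors; y: labels in {-1, +1}. The last weight is the bias.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            xa = _augment(xi)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xa))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 term
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xa)]
    return w


def decision(w, x):
    """Signed SVM score; > 0 means 'keep the annotation'."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))


def accepted_annotations(hits, w):
    """Keep only the annotations the classifier scores on the positive side."""
    feats = annotation_features(hits)
    return sorted(label for label, x in feats.items() if decision(w, x) > 0)


# Toy training vectors, invented for illustration:
# +1 = annotation later confirmed, -1 = erroneous annotation.
X_train = [[0.9, 0.95, 0.9], [0.05, 0.1, 0.2], [0.8, 0.9, 0.85], [0.1, 0.05, 0.15]]
y_train = [1, -1, 1, -1]
w = train_linear_svm(X_train, y_train)

hits = [
    Hit("P1", 200.0, 0.9, frozenset({"kinase"})),
    Hit("P2", 150.0, 0.8, frozenset({"kinase", "ATP-binding"})),
    Hit("P3", 50.0, 0.3, frozenset({"transporter"})),
    Hit("P4", 100.0, 0.6, frozenset({"kinase"})),
]
print(accepted_annotations(hits, w))
```

Pooling per annotation means a label is judged on the combined evidence of every hit that carries it, so a label supported only by one weak hit gets a low hit share and a low normalised bit score. A real system would more likely use an established SVM library and features chosen by validation on curated annotations.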
author2	Hahn-Ming Lee
author_facet	Hahn-Ming Lee Cheng-Kang Liu 劉承剛
author	Cheng-Kang Liu 劉承剛
author_sort	Cheng-Kang Liu
title	Improving SIM-based annotation method of protein sequence using support vector machine
publishDate	2005
url	http://ndltd.ncl.edu.tw/handle/05841332477652714545


Similar Items