An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques

Bibliographic Details
Main Authors: Guo-Hsing Lee, 李果興
Other Authors: Shian-Hua Lin
Format: Others
Language: en_US
Published: 2005
Online Access:http://ndltd.ncl.edu.tw/handle/76995321504438438278
id ndltd-TW-093NCNU0114002
record_format oai_dc
spelling ndltd-TW-093NCNU0114002 2016-06-08T04:13:34Z http://ndltd.ncl.edu.tw/handle/76995321504438438278 An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques 以搜尋引擎技術為基礎之高效能蛋白質序列檢索系統 Guo-Hsing Lee 李果興 碩士 (Master's) 國立暨南國際大學 (National Chi Nan University) 生物醫學科技研究所 (Graduate Institute of Biomedical Science and Technology) 93 Shian-Hua Lin 林宣華 2005 學位論文 ; thesis 61 en_US
collection NDLTD
sources NDLTD
description Master's thesis, National Chi Nan University, Graduate Institute of Biomedical Science and Technology, academic year 93. Protein databases are widely used by biologists for homology search. To keep pace with the growth of these databases, fast, accurate, and scalable search techniques are urgently needed. The index-based techniques used in Web search engines have proven their performance and scalability in serving billions of users. Building on this experience, we apply information retrieval and search engine methods to build an index-based homology search system, the Protein Sequence Search Engine (PSSE). By proposing novel term-extraction and term-weighting approaches, we make protein sequence retrieval both efficient and effective. Experiments show that, when searching protein sequences, PSSE is slightly more accurate and 3 times faster than BLAST with its default settings. Compared with BLAST in its most sensitive configuration, PSSE is over 36 times faster while losing no more than 1% in accuracy.
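The record does not describe PSSE's actual term-extraction or term-weighting schemes, which it calls novel. As a rough illustration of the general idea of index-based sequence retrieval only, here is a minimal Python sketch under stated assumptions: overlapping k-mers (k = 3) serve as index terms, an inverted index maps each k-mer to the sequences containing it, and candidates are ranked by a plain TF-IDF dot product. The `KmerIndex` class, the `kmers` helper, the choice of k, and the scoring formula are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def kmers(seq: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mers, used here as index terms."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerIndex:
    """Hypothetical toy inverted index mapping each k-mer to {sequence id: term frequency}."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.n_seqs = 0

    def add(self, seq_id: str, seq: str) -> None:
        """Index one database sequence."""
        self.n_seqs += 1
        for term, tf in Counter(kmers(seq, self.k)).items():
            self.postings[term][seq_id] = tf

    def idf(self, term: str) -> float:
        """Smoothed inverse document frequency; rarer k-mers get larger weights."""
        df = len(self.postings.get(term, {}))
        return math.log((self.n_seqs + 1) / (df + 1)) + 1.0

    def search(self, query: str, top_n: int = 5) -> list[tuple[str, float]]:
        """Rank indexed sequences by a TF-IDF dot product with the query's k-mers."""
        scores: dict[str, float] = defaultdict(float)
        for term, q_tf in Counter(kmers(query, self.k)).items():
            postings = self.postings.get(term)
            if not postings:
                continue  # k-mer never seen in the database
            w = self.idf(term)
            for seq_id, d_tf in postings.items():
                scores[seq_id] += (q_tf * w) * (d_tf * w)
        # Highest score first; ties broken by sequence id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    index = KmerIndex(k=3)
    index.add("P1", "MKTAYIAKQRQISFVKSHFSRQ")
    index.add("P2", "MKTAYIAKQRQISFVKAHFSRQ")  # one substitution relative to P1
    index.add("P3", "GSHMLEDPVDAFQ")           # unrelated sequence
    print(index.search("MKTAYIAKQR"))
```

In this toy run, P1 and P2 share every query 3-mer and tie, while P3 shares none and is never scored. Unlike BLAST, this scoring only rewards exact k-mer matches and ignores substitutions between similar amino acids, which is one reason practical systems need more careful term extraction and weighting.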