An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques
Master's === 國立暨南國際大學 === 生物醫學科技研究所 === 93 === Protein databases are widely used by biologists for homology search. To keep up with the growth of protein databases, fast, accurate, and scalable search techniques are urgently needed. Index-based techniques used in Web search engines have been successf...
Main Authors: | Guo-Hsing Lee, 李果興 |
---|---|
Other Authors: | Shian-Hua Lin |
Format: | Others |
Language: | en_US |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/76995321504438438278 |
id |
ndltd-TW-093NCNU0114002 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093NCNU0114002 2016-06-08T04:13:34Z http://ndltd.ncl.edu.tw/handle/76995321504438438278 An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques 以搜尋引擎技術為基礎之高效能蛋白質序列檢索系統 Guo-Hsing Lee 李果興 Master's 國立暨南國際大學 生物醫學科技研究所 93 Protein databases are widely used by biologists for homology search. To keep up with the growth of protein databases, fast, accurate, and scalable search techniques are urgently needed. Index-based techniques used in Web search engines have been proven by billions of users in terms of performance and scalability. Building on this experience, we apply information retrieval and search engine methods to build an index-based homology search system, the Protein Sequence Search Engine (PSSE). By proposing novel term-extraction and term-weighting approaches, we make the retrieval of protein sequences efficient and effective. Experiments show that PSSE is slightly more accurate and 3 times faster than BLAST with its default settings when searching protein sequences. Compared with the most sensitive BLAST setting, PSSE is over 36 times faster while losing no more than 1% accuracy. Shian-Hua Lin 林宣華 2005 學位論文 ; thesis 61 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others
|
sources |
NDLTD |
description |
Master's === 國立暨南國際大學 === 生物醫學科技研究所 === 93 === Protein databases are widely used by biologists for homology search. To keep up with the growth of protein databases, fast, accurate, and scalable search techniques are urgently needed. Index-based techniques used in Web search engines have been proven by billions of users in terms of performance and scalability. Building on this experience, we apply information retrieval and search engine methods to build an index-based homology search system, the Protein Sequence Search Engine (PSSE). By proposing novel term-extraction and term-weighting approaches, we make the retrieval of protein sequences efficient and effective. Experiments show that PSSE is slightly more accurate and 3 times faster than BLAST with its default settings when searching protein sequences. Compared with the most sensitive BLAST setting, PSSE is over 36 times faster while losing no more than 1% accuracy.
|
author2 |
Shian-Hua Lin |
author_facet |
Shian-Hua Lin Guo-Hsing Lee 李果興 |
author |
Guo-Hsing Lee 李果興 |
spellingShingle |
Guo-Hsing Lee 李果興 An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques |
author_sort |
Guo-Hsing Lee |
title |
An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques |
title_short |
An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques |
title_full |
An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques |
title_fullStr |
An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques |
title_full_unstemmed |
An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques |
title_sort |
efficient protein sequence retrieval system based on search engine techniques |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/76995321504438438278 |
work_keys_str_mv |
AT guohsinglee anefficientproteinsequenceretrievalsystembasedonsearchenginetechniques AT lǐguǒxìng anefficientproteinsequenceretrievalsystembasedonsearchenginetechniques AT guohsinglee yǐsōuxúnyǐnqíngjìshùwèijīchǔzhīgāoxiàonéngdànbáizhìxùlièjiǎnsuǒxìtǒng AT lǐguǒxìng yǐsōuxúnyǐnqíngjìshùwèijīchǔzhīgāoxiàonéngdànbáizhìxùlièjiǎnsuǒxìtǒng AT guohsinglee efficientproteinsequenceretrievalsystembasedonsearchenginetechniques AT lǐguǒxìng efficientproteinsequenceretrievalsystembasedonsearchenginetechniques |
_version_ |
1718297713148166144 |