An Efficient Protein Sequence Retrieval System Based on Search Engine Techniques

Bibliographic Details
Main Authors: Guo-Hsing Lee, 李果興
Other Authors: Shian-Hua Lin
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/76995321504438438278
Description
Summary: Master's thesis, National Chi Nan University (國立暨南國際大學), Graduate Institute of Biomedical Science and Technology, academic year 93 (ROC calendar). Protein databases are widely used by biologists for homology search. To keep pace with the growth of these databases, fast, accurate, and scalable search techniques are urgently needed. The index-based techniques used in Web search engines have proven their performance and scalability while serving billions of users. Building on this experience, we apply information retrieval and search engine methods to build an index-based homology search system, the Protein Sequence Search Engine (PSSE). We propose novel term-extraction and term-weighting approaches that make protein sequence retrieval both efficient and effective. Experiments show that PSSE is slightly more accurate than BLAST with its default settings and 3 times faster. Compared with BLAST at its most sensitive setting, PSSE is over 36 times faster while losing no more than 1% accuracy.
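The abstract does not spell out PSSE's term-extraction or term-weighting schemes. As a rough illustration of the general search-engine idea it builds on, the Python sketch below treats overlapping k-mers of a protein sequence as "terms", stores them in an inverted index, and ranks database sequences by a TF-IDF dot product with the query. The k-mer length `K = 3`, the smoothed IDF formula, the sample sequences, and the names `extract_terms` and `KmerIndex` are hypothetical choices for this example; this is a generic technique, not the thesis's method.

```python
import math
from collections import Counter, defaultdict

K = 3  # hypothetical k-mer length; PSSE's actual term extraction is not given in the abstract


def extract_terms(seq: str, k: int = K) -> Counter:
    """Split a protein sequence into overlapping k-mers and count them (the 'terms')."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerIndex:
    """Minimal inverted index over protein sequences with generic TF-IDF scoring."""

    def __init__(self, sequences: dict, k: int = K):
        self.k = k
        # term -> {sequence id -> term frequency in that sequence}
        self.postings = defaultdict(dict)
        for seq_id, seq in sequences.items():
            for term, tf in extract_terms(seq, k).items():
                self.postings[term][seq_id] = tf
        self.n_docs = len(sequences)

    def idf(self, term: str) -> float:
        """Smoothed inverse document frequency (an assumed weighting, not PSSE's)."""
        df = len(self.postings.get(term, {}))
        return math.log((self.n_docs + 1) / (df + 1)) + 1.0

    def search(self, query: str, top_n: int = 5) -> list:
        """Score only sequences that share at least one k-mer with the query."""
        scores = defaultdict(float)
        for term, q_tf in extract_terms(query, self.k).items():
            postings = self.postings.get(term)
            if not postings:
                continue  # k-mer absent from the database: nothing to look up
            w = self.idf(term)
            for seq_id, d_tf in postings.items():
                scores[seq_id] += (q_tf * w) * (d_tf * w)
        # Highest score first; ties broken by sequence id for a stable order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:top_n]


if __name__ == "__main__":
    db = {
        "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
        "seqB": "MKTAYIAKQRQ",
        "seqC": "GGSGGSGGSGGS",
    }
    index = KmerIndex(db)
    print([seq_id for seq_id, _ in index.search("MKTAYIAKQRQISF")])  # ['seqA', 'seqB']
```

Because a query only touches the posting lists of the k-mers it contains, lookup cost depends on those lists rather than on comparing the query against every sequence in the database, which is the kind of scaling advantage the abstract attributes to index-based search.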