Scalable Index Techniques for Biological String Databases

Master's === National Tsing Hua University === Department of Computer Science === 94 === With advances in bioinformatics and computer technology, knowledge development in this joint field has entered a new era. More often than not, the laboratory discoveries of biologists substantially help refine the computing models used by computer scientist...


Bibliographic Details
Main Authors:	Chien-Wen Lin, 林健紋
Other Authors:	Chaur-Chin Chen
Format:	Others
Language:	en_US
Published:	2006
Online Access:	http://ndltd.ncl.edu.tw/handle/26897867296726728308

id	ndltd-TW-094NTHU5392022
record_format	oai_dc
spelling	ndltd-TW-094NTHU5392022 2016-06-01T04:14:41Z http://ndltd.ncl.edu.tw/handle/26897867296726728308 Scalable Index Techniques for Biological String Databases 生物字串資料庫上高效能索引技術之研究 Chien-Wen Lin 林健紋 Master's, National Tsing Hua University, Department of Computer Science, 94 Chaur-Chin Chen 陳朝欽 2006 thesis 32 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Tsing Hua University === Department of Computer Science === 94 === With advances in bioinformatics and computer technology, knowledge development in this joint field has entered a new era. More often than not, the laboratory discoveries of biologists substantially help refine the computing models used by computer scientists. In turn, the predictive results from computing help narrow the scope of subsequent verification. As the volume of biological strings grows at an increasing rate, the efficiency of the computing model is vital to this knowledge discovery process. The Human Genome Project and its ongoing work place a high demand on computing. This thesis addresses that issue. Our research focuses on improving the search efficiency of the semi-global alignment problem. In particular, we propose a scalable solution that leverages state-of-the-art index designs. Despite their wide acceptance for prediction accuracy, Smith-Waterman (SW)-based alignment algorithms are too expensive to scale. The OASIS scheme ("An Online and Accurate Technique for Local-alignment Searches on Biological Sequences") embeds SW in a suffix tree index to improve search scalability. However, its space overhead is considerable. To address these drawbacks, we develop a space-efficient, easy-to-update index design that rapidly identifies potential matches to a query string before the subsequent costly SW verification. As a result, our approach avoids most futile computations using only about 10% index space overhead. This preliminary filtration framework is generic in nature: it serves as a safe lower bound on the distance functions for a variety of SW cost matrices. At the same time, the use of suffix trees in its design offers favorable scalability for the problem. Extensive experimental results substantiate these advantages of our technique.
author2	Chaur-Chin Chen
author_facet	Chaur-Chin Chen Chien-Wen Lin 林健紋
author	Chien-Wen Lin 林健紋
spellingShingle	Chien-Wen Lin 林健紋 Scalable Index Techniques for Biological String Databases
author_sort	Chien-Wen Lin
title	Scalable Index Techniques for Biological String Databases
title_short	Scalable Index Techniques for Biological String Databases
title_full	Scalable Index Techniques for Biological String Databases
title_fullStr	Scalable Index Techniques for Biological String Databases
title_full_unstemmed	Scalable Index Techniques for Biological String Databases
title_sort	scalable index techniques for biological string databases
publishDate	2006
url	http://ndltd.ncl.edu.tw/handle/26897867296726728308
work_keys_str_mv	AT chienwenlin scalableindextechniquesforbiologicalstringdatabases AT línjiànwén scalableindextechniquesforbiologicalstringdatabases AT chienwenlin shēngwùzìchuànzīliàokùshànggāoxiàonéngsuǒyǐnjìshùzhīyánjiū AT línjiànwén shēngwùzìchuànzīliàokùshànggāoxiàonéngsuǒyǐnjìshùzhīyánjiū
_version_	1718287350583263232

