Scalable Index Techniques for Biological String Databases
Master's === National Tsing Hua University === Department of Computer Science === 94 === With advances in bioinformatics and computer technologies, knowledge development in this joint field has entered a new era. More often than not, the laboratory discoveries of biologists substantially help refine the computing models used by computer scientist...
Main Authors: | Chien-Wen Lin, 林健紋 |
---|---|
Other Authors: | Chaur-Chin Chen |
Format: | Others |
Language: | en_US |
Published: | 2006 |
Online Access: | http://ndltd.ncl.edu.tw/handle/26897867296726728308 |
id |
ndltd-TW-094NTHU5392022 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-094NTHU5392022 2016-06-01T04:14:41Z http://ndltd.ncl.edu.tw/handle/26897867296726728308 Scalable Index Techniques for Biological String Databases 生物字串資料庫上高效能索引技術之研究 Chien-Wen Lin 林健紋 Master's National Tsing Hua University Department of Computer Science 94 With advances in bioinformatics and computer technologies, knowledge development in this joint field has entered a new era. More often than not, the laboratory discoveries of biologists substantially help refine the computing models used by computer scientists. In turn, the predictive results from computing help confine the scope of subsequent verification. As the volume of biological strings grows at an increasing rate, the efficiency of the computing model is vital to this knowledge discovery process. The Human Genome Project and its ongoing work place a high demand on computing. This thesis addresses this issue. Our research focuses on improving the search efficiency of the semi-global alignment problem. In particular, we propose a scalable solution that leverages state-of-the-art index designs. Despite their wide acceptance for prediction accuracy, Smith-Waterman (SW)-based alignment algorithms are too expensive to scale. The “An Online and Accurate Technique for Local-alignment Searches on Biological Sequences” (OASIS) scheme embeds SW in a suffix tree index to improve search scalability. However, its space overhead is considerable. To address these drawbacks, we develop a space-efficient, easy-to-update index design that rapidly identifies the potential matches to a query string before the costly SW verification. As a result, our approach avoids most futile computations while using only about 10% index space overhead. This preliminary filtration framework is generic in nature: it serves as a safe lower bound to the distance functions for a variety of SW cost matrices. Yet the use of suffix trees in its design offers favorable scalability for the problem.
Extensive experimental results substantiate these advantages of our technique. Chaur-Chin Chen 陳朝欽 2006 thesis 32 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Tsing Hua University === Department of Computer Science === 94 === With advances in bioinformatics and computer technologies, knowledge development in this joint field has entered a new era. More often than not, the laboratory discoveries of biologists substantially help refine the computing models used by computer scientists. In turn, the predictive results from computing help confine the scope of subsequent verification. As the volume of biological strings grows at an increasing rate, the efficiency of the computing model is vital to this knowledge discovery process. The Human Genome Project and its ongoing work place a high demand on computing. This thesis addresses this issue.
Our research focuses on improving the search efficiency of the semi-global alignment problem. In particular, we propose a scalable solution that leverages state-of-the-art index designs. Despite their wide acceptance for prediction accuracy, Smith-Waterman (SW)-based alignment algorithms are too expensive to scale. The “An Online and Accurate Technique for Local-alignment Searches on Biological Sequences” (OASIS) scheme embeds SW in a suffix tree index to improve search scalability. However, its space overhead is considerable. To address these drawbacks, we develop a space-efficient, easy-to-update index design that rapidly identifies the potential matches to a query string before the costly SW verification. As a result, our approach avoids most futile computations while using only about 10% index space overhead. This preliminary filtration framework is generic in nature: it serves as a safe lower bound to the distance functions for a variety of SW cost matrices. Yet the use of suffix trees in its design offers favorable scalability for the problem. Extensive experimental results substantiate these advantages of our technique.
|
author2 |
Chaur-Chin Chen |
author_facet |
Chaur-Chin Chen Chien-Wen Lin 林健紋 |
author |
Chien-Wen Lin 林健紋 |
spellingShingle |
Chien-Wen Lin 林健紋 Scalable Index Techniques for Biological String Databases |
author_sort |
Chien-Wen Lin |
title |
Scalable Index Techniques for Biological String Databases |
publishDate |
2006 |
url |
http://ndltd.ncl.edu.tw/handle/26897867296726728308 |
_version_ |
1718287350583263232 |