DNA Sequence Assembly

Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 92 === In this thesis, we consider two problems, the sequence reassembly problem and the 2-matching double digest problem, both of which are popular research topics in computational biology. For the first problem, we use a shotgun approach to break a long DNA sequence into small fragme...

Bibliographic Details
Main Authors: Jui Peng Lu, 盧瑞鵬
Other Authors: R. C. T. Lee
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/31298571837144051308
id ndltd-TW-092NCNU0392030
record_format oai_dc
spelling ndltd-TW-092NCNU0392030 2016-06-17T04:16:59Z http://ndltd.ncl.edu.tw/handle/31298571837144051308 DNA Sequence Assembly DNA序列重組 Jui Peng Lu 盧瑞鵬 Master's National Chi Nan University Department of Computer Science and Information Engineering 92 R. C. T. Lee 李家同 2004 thesis 59 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 92 === In this thesis, we consider two problems, the sequence reassembly problem and the 2-matching double digest problem, both of which are popular research topics in computational biology. For the first problem, a shotgun approach is used to break a long DNA sequence into small fragments at least twice, and the fragments are later reassembled into a long sequence that covers all of the produced fragments. Traditionally, this problem is treated as a shortest superstring problem. We point out two flaws in this line of thinking. First, it is easy to construct an example in which a shortest superstring is not the original sequence at all. Second, it is often claimed that the sequence reassembly problem is difficult because the shortest superstring problem is NP-hard. This is not correct: the input data of our problem have some special properties, and because of these properties our problem is not equivalent to the shortest superstring problem. We propose an algorithm for the problem. Experimental results show that our approach is both efficient and feasible. In total, we tested our algorithm on 19 DNA sequences and successfully reconstructed the sequence in every case. In the most difficult case, a string of 1,852,441 base pairs was cut into 1,793 fragments, and our program reconstructed the original string in 945 seconds. The algorithm can also be applied to the reconstruction of protein sequences: we applied our method to some protein sequences selected from NCBI, and all of them were reconstructed successfully. The double digest problem was proved to be NP-complete [GW87]. In this thesis, we propose a special version of the problem. We prove some useful characteristics of this special problem and propose an algorithm that finds a feasible solution. The algorithm has also been implemented, and we designed a visualization tool to display the results.
We would like to point out that although we place constraints on the double digest problem, these constraints are still quite reasonable.
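
The abstract's remark that a shortest superstring need not be the original sequence can be illustrated with a small brute-force sketch. This is not the algorithm proposed in the thesis. The function names, the example sequence, and the cut positions are assumptions chosen purely for illustration, and the exhaustive search is only practical for a handful of fragments.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(fragments: list[str]) -> str:
    """Exact shortest common superstring by exhaustive search (tiny inputs only)."""
    # Drop duplicates and any fragment contained in another fragment;
    # a superstring of the remaining fragments covers the dropped ones too.
    unique = list(dict.fromkeys(fragments))
    core = [f for f in unique if not any(f != g and f in g for g in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        # Merge the fragments in this order, overlapping as much as possible.
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


if __name__ == "__main__":
    # Hypothetical original sequence containing a repeat, cut in two ways.
    original = "ACGACGACG"
    first_cut = ["ACGA", "CGACG"]    # ACGA + CGACG == original
    second_cut = ["ACGAC", "GACG"]   # ACGAC + GACG == original
    fragments = first_cut + second_cut

    candidate = shortest_superstring(fragments)
    print("original :", original, len(original))
    print("shortest :", candidate, len(candidate))
    print("covers all fragments:", all(f in candidate for f in fragments))
```

In this made-up example the repeated `ACG` unit lets the fragments overlap more tightly than they do in the original, so the shortest superstring is shorter than the sequence the fragments came from. Minimizing superstring length alone therefore does not recover the original.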
author2 R. C. T. Lee
author_facet R. C. T. Lee
Jui Peng Lu
盧瑞鵬
author Jui Peng Lu
盧瑞鵬
spellingShingle Jui Peng Lu
盧瑞鵬
DNA Sequence Assembly
author_sort Jui Peng Lu
title DNA Sequence Assembly
title_short DNA Sequence Assembly
title_full DNA Sequence Assembly
title_fullStr DNA Sequence Assembly
title_full_unstemmed DNA Sequence Assembly
title_sort dna sequence assembly
publishDate 2004
url http://ndltd.ncl.edu.tw/handle/31298571837144051308
work_keys_str_mv AT juipenglu dnasequenceassembly
AT lúruìpéng dnasequenceassembly
AT juipenglu dnaxùlièzhòngzǔ
AT lúruìpéng dnaxùlièzhòngzǔ
_version_ 1718308921523830784