DNA Sequence Assembly

Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 92 === In this thesis, we consider two problems, the sequence reassembly problem and the 2-matching double digest problem, both of which are popular research topics in computational biology. For the first problem, we use a shotgun to break a long DNA sequence into small fragme...

Bibliographic Details
Main Authors:	Jui Peng Lu, 盧瑞鵬
Other Authors:	R. C. T. Lee
Format:	Others
Language:	zh-TW
Published:	2004
Online Access:	http://ndltd.ncl.edu.tw/handle/31298571837144051308

id	ndltd-TW-092NCNU0392030
record_format	oai_dc
spelling	ndltd-TW-092NCNU0392030 2016-06-17T04:16:59Z http://ndltd.ncl.edu.tw/handle/31298571837144051308 DNA Sequence Assembly DNA序列重組 Jui Peng Lu 盧瑞鵬 Master's, National Chi Nan University, Department of Computer Science and Information Engineering, 92 R. C. T. Lee 李家同 2004 thesis 59 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 92 === In this thesis, we consider two problems, the sequence reassembly problem and the 2-matching double digest problem, both of which are popular research topics in computational biology. In the first problem, a long DNA sequence is broken into small fragments by the shotgun approach at least twice, and the fragments are later reassembled into a long sequence that covers all of the produced fragments. Traditionally, this problem is treated as a shortest superstring problem. We point out two flaws in this line of thinking. First, it is easy to construct an example in which a shortest superstring is not the original sequence at all. Second, it has often been claimed that the sequence reassembly problem is difficult because the shortest superstring problem is NP-hard. This is not correct: the input data of our problem have some special properties, and because of these properties our problem is not equivalent to the shortest superstring problem. An algorithm is proposed in this thesis. Experimental results show that our approach is both efficient and feasible. In total, we tested our algorithm on 19 DNA sequences and successfully reconstructed the sequence in every case. In the most difficult case, a string of 1,852,441 base pairs was cut into 1793 fragments, and our program reconstructed the original string in 945 seconds. The algorithm can also be applied to the reconstruction of protein sequences. We selected some protein sequences from NCBI to test our method, and all of them were reconstructed successfully. The double digest problem was proved to be NP-complete [GW87]. In this thesis, we propose a special version of the problem. We prove some useful characteristics of this special problem and propose an algorithm that finds a feasible solution to it. This algorithm has also been implemented, and we designed a visual display tool to show the results. We would like to point out that although we place constraints on the double digest problem, these constraints are still quite reasonable.
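The abstract's first objection, that a shortest superstring need not be the original sequence, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the 9-base sequence, the two ways of cutting it, and the brute-force search are not taken from the thesis, and the search is not the thesis's algorithm.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(fragments):
    """Brute-force shortest common superstring (only viable for tiny inputs)."""
    unique = set(fragments)
    # A fragment contained in another fragment never lengthens the superstring.
    core = [f for f in unique if not any(f != g and f in g for g in unique)]
    best = None
    for order in permutations(core):
        merged = order[0]
        for frag in order[1:]:
            # Append only the part of `frag` not already covered by the overlap.
            merged += frag[overlap(merged, frag):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


original = "ACGACGACG"        # hypothetical 9-base sequence with a repeat
cut_1 = ["ACGAC", "GACG"]     # first way of breaking it (assumed)
cut_2 = ["ACG", "ACGACG"]     # second way of breaking it (assumed)
fragments = cut_1 + cut_2

candidate = shortest_superstring(fragments)
print(candidate, len(candidate), len(original))
```

In this toy case the shortest superstring has 6 bases and contains every fragment, yet the original sequence has 9, because the repeated unit lets the fragments collapse onto each other. Repeats of this kind are one reason minimising length alone can give the wrong reconstruction.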
author2	R. C. T. Lee
author_facet	R. C. T. Lee Jui Peng Lu 盧瑞鵬
author	Jui Peng Lu 盧瑞鵬
spellingShingle	Jui Peng Lu 盧瑞鵬 DNA Sequence Assembly
author_sort	Jui Peng Lu
title	DNA Sequence Assembly
title_short	DNA Sequence Assembly
title_full	DNA Sequence Assembly
title_fullStr	DNA Sequence Assembly
title_full_unstemmed	DNA Sequence Assembly
title_sort	dna sequence assembly
publishDate	2004
url	http://ndltd.ncl.edu.tw/handle/31298571837144051308
work_keys_str_mv	AT juipenglu dnasequenceassembly AT lúruìpéng dnasequenceassembly AT juipenglu dnaxùlièzhòngzǔ AT lúruìpéng dnaxùlièzhòngzǔ
_version_	1718308921523830784

Similar Items