Exploring the Bias Influences of Next-Generation-Sequencing for de novo Genome Assembly


Bibliographic Details
Main Authors: Yen-Chun Chen, 陳彥群
Other Authors: Chi-Chuan Hwang
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/06246308292030359861
Description
Summary: Master's === National Cheng Kung University === Department of Engineering Science (Master's and Doctoral Program) === 100 === Next-generation sequencing technology is now an important approach to decoding the genome. Handling millions of short reads has become a significant computational problem. In recent years, a series of tools have been developed to assemble the huge number of fragments into more contiguous sequences. However, inherent sequencing bias may reduce assembly performance, and the effects of bias on assembly have not been systematically discussed.  In this study, we simulate reads with specific degrees of sequencing bias and error-rate profiles for S. aureus, E. coli, M. tuberculosis, Arabidopsis thaliana Chr. 1 and Oryza sativa Chr. 5. We consider various bias scenarios for each assembler, including ALLPATHS-LG, ABySS, Edena, SOAPdenovo, SSAKE, Velvet and Velvet-SC, and employ an assembly evaluation tool, GAGE, to assess the assemblies by both N50 length and accuracy.  Biased data sets lead to fragmentation and errors in the assemblies. Regions with low read coverage either cannot be assembled or yield sequences containing SNPs, indels or reconstructions. Although most assemblers can cope with a small degree of bias in bacterial data, bias has a much deeper impact on the more complex plant genomes. A reasonable number of reads plays an important role in mitigating the bias. This study provides a novel landscape of assembly for the relationship between coverage and sequencing bias.
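
The summary evaluates assemblies by N50 length. As a rough illustration of that metric only, the sketch below computes N50 from a list of contig lengths using the usual definition: the length of the contig at which the cumulative sum, taken from longest to shortest, first reaches half of the total. The function name `n50` and the optional `total` parameter (for measuring against a reference genome size instead of the assembly size) are assumptions made for this example; this is not GAGE's code and not the thesis's pipeline.

```python
def n50(lengths, total=None):
    """Return the N50 of a collection of contig lengths.

    lengths: iterable of contig lengths in bases.
    total:   optional size to measure against (hypothetical parameter,
             e.g. a reference genome size); defaults to the summed
             contig length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        return 0
    target = (total if total is not None else sum(ordered)) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # target size defines the N50.
        if running >= target:
            return length
    # The contigs cover less than half of `total`.
    return 0


# Two made-up assemblies with the same total length (300 bases):
# the more fragmented one has the lower N50.
contiguous = [100, 80, 60, 40, 20]
fragmented = [50, 50, 40, 40, 30, 30, 30, 30]
print(n50(contiguous), n50(fragmented))
```

N50 says nothing about correctness, which is why the summary pairs it with accuracy: an assembler can raise N50 by joining contigs wrongly.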