Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
Master's thesis === National Taiwan University === Graduate Institute of Biomedical Electronics and Bioinformatics === 105 === Correct quantification of transcript abundance is essential to understand the functional products of the genome in different physiological conditions and developmental stages. Recently, the development of high-throughput RNA sequencing (RNA-Seq) allows the r...
Main Authors: | Ping-Han Hsieh, 謝秉翰 |
---|---|
Other Authors: | Yen-Jen Oyang |
Format: | Others |
Language: | en_US |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/5m3zmn |
id |
ndltd-TW-105NTU05114023 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NTU05114023 2019-05-15T23:39:38Z http://ndltd.ncl.edu.tw/handle/5m3zmn Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification 探討轉錄體序列組裝對序列回貼以及基因表現定量的影響 Ping-Han Hsieh 謝秉翰 Master's thesis, National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, 105 Yen-Jen Oyang Chien-Yu Chen 歐陽彥正 陳倩瑜 2017 thesis 67 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Taiwan University === Graduate Institute of Biomedical Electronics and Bioinformatics === 105 === Accurate quantification of transcript abundance is essential for understanding the functional products of the genome under different physiological conditions and at different developmental stages. The development of high-throughput RNA sequencing (RNA-Seq) allows researchers to perform transcriptome analysis on organisms that lack a reference genome and transcriptome. For such projects, de novo transcriptome assembly must be carried out before quantification. However, the large number of fragmented contigs and redundant sequences produced by assemblers may result in unreliable abundance estimation. This study therefore first investigates how assembly quality affects the quality of read mapping and count estimation, and then proposes a classifier to characterize the assembled sequences. The experiments and analyses identify several factors that can seriously affect the accuracy of RNA-Seq analysis. First, the effects on quantification quality of twelve distinct assembly groups, together with the intrinsic sequence similarity present in the reference transcriptome, were examined. The results show that similar subsequences in the reference transcriptome only slightly influence mapping quality but lead to many poorly assembled contigs. Contigs that merge multiple transcripts into one reduce the reliability of abundance estimation most heavily. Second, a prediction algorithm is proposed to help researchers estimate quantification reliability for further analyses. In summary, the analytical results of this study provide valuable insights for future studies related to RNA-Seq data analysis. |
author2 |
Yen-Jen Oyang |
author_facet |
Yen-Jen Oyang Ping-Han Hsieh 謝秉翰 |
author |
Ping-Han Hsieh 謝秉翰 |
spellingShingle |
Ping-Han Hsieh 謝秉翰 Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification |
author_sort |
Ping-Han Hsieh |
title |
Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/5m3zmn |