Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification

Master's thesis === National Taiwan University === Graduate Institute of Biomedical Electronics and Bioinformatics === 105 === Correct quantification of transcript abundance is essential to understand the functional products of the genome in different physiological conditions and developmental stages. Recently, the development of high-throughput RNA sequencing (RNA-Seq) allows the r...


Bibliographic Details
Main Authors: Ping-Han Hsieh, 謝秉翰
Other Authors: Yen-Jen Oyang
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/5m3zmn
id ndltd-TW-105NTU05114023
record_format oai_dc
spelling ndltd-TW-105NTU05114023 2019-05-15T23:39:38Z http://ndltd.ncl.edu.tw/handle/5m3zmn Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification 探討轉錄體序列組裝對序列回貼以及基因表現定量的影響 Ping-Han Hsieh 謝秉翰 Master's thesis, National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, 105
Yen-Jen Oyang Chien-Yu Chen 歐陽彥正 陳倩瑜 2017 thesis 67 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Taiwan University === Graduate Institute of Biomedical Electronics and Bioinformatics === 105 === Correct quantification of transcript abundance is essential for understanding the functional products of the genome under different physiological conditions and at different developmental stages. The development of high-throughput RNA sequencing (RNA-Seq) has allowed researchers to perform transcriptome analysis for organisms that lack a reference genome or transcriptome. For such projects, de novo transcriptome assembly must be carried out before quantification. However, the large number of fragmented contigs and redundant sequences produced by assemblers may result in unreliable abundance estimation. This study therefore first investigates how assembly quality affects the quality of read mapping and count estimation, and then proposes a classifier to characterize the assembled sequences. Through the experiments and analyses conducted in this study, several important factors that might seriously affect the accuracy of RNA-Seq analysis are discussed. First, the effects on quantification quality of twelve distinct assembly groups, along with the intrinsic similarity present in the reference transcriptome, were examined. The results showed that similar subsequences present in the reference transcriptome only slightly influence mapping quality, but lead to many poorly assembled contigs. Contigs that merge multiple transcripts into one decreased the reliability of abundance estimation most heavily. Second, a prediction algorithm was proposed to help researchers estimate quantification reliability for further analyses. In summary, the analyses conducted in this study provide valuable insights for future studies related to RNA-Seq data analysis.
author2 Yen-Jen Oyang
author_facet Yen-Jen Oyang
Ping-Han Hsieh
謝秉翰
author Ping-Han Hsieh
謝秉翰
spellingShingle Ping-Han Hsieh
謝秉翰
Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
author_sort Ping-Han Hsieh
title Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
title_short Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
title_full Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
title_fullStr Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
title_full_unstemmed Effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
title_sort effect of de novo transcriptome assembly on quality of read mapping and transcript quantification
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/5m3zmn