Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches

Master's === National Taiwan University === Graduate Institute of Agronomy === 103 === With the rapid development of next-generation sequencing (NGS) technology, fields such as medical science, agriculture, and biotechnology have advanced to a new level. NGS makes whole-genome sequencing and de novo sequenci...

Full description

Bibliographic Details
Main Authors: Yu-Shiang Zeng, 曾禹翔
Other Authors: 蔡政安
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/14306600258302972844
id ndltd-TW-103NTU05417009
record_format oai_dc
spelling ndltd-TW-103NTU05417009 2016-11-19T04:09:46Z http://ndltd.ncl.edu.tw/handle/14306600258302972844 Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches 以貝氏分析方法來偵測轉錄體定序資料之顯著基因 Yu-Shiang Zeng 曾禹翔 Master's National Taiwan University Graduate Institute of Agronomy 103 蔡政安 2015 thesis 84 zh-TW
collection NDLTD
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Agronomy === 103 === With the rapid development of next-generation sequencing (NGS) technology, fields such as medical science, agriculture, and biotechnology have advanced to a new level. NGS makes whole-genome sequencing and de novo sequencing feasible for exploring biological theory, and RNA-seq is one of its core applications. RNA-seq is used to measure gene expression levels and to test whether a specific gene is differentially expressed. It has gradually replaced microarray technology and become an important benchmark for gene expression testing. However, RNA-seq read counts are discrete and tend to show over-dispersion (the variance of the data is larger than the mean). The negative binomial model is applied to handle over-dispersion, but estimating its parameters then becomes a further issue. Current RNA-seq analysis packages such as DESeq, edgeR, and DSS rely on point estimates of the parameters and do not account for the uncertainty in RNA-seq data. Here, we use a Markov chain Monte Carlo (MCMC) method to estimate the parameters relevant to detecting differentially expressed genes. At the end of the thesis, we compare the performance of DESeq, edgeR, DSS, and our method on both simulated and real RNA-seq data. Our log-linear model performs considerably better than DESeq, edgeR, and DSS when the numbers of replicates in the groups are equal or close. When the numbers of replicates are extremely unbalanced between groups, we suggest the median estimator as the appropriate method for detecting differentially expressed genes.
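Illustrative note: the over-dispersion mentioned in the description can be sketched with a small simulation. This is not code from the thesis. It assumes the common negative binomial parameterization with mean mu and dispersion phi, so that the variance is mu + phi * mu^2, and draws counts as a gamma-Poisson mixture. The function names and the values mu = 20 and phi = 0.5 are hypothetical choices made for illustration only.

```python
import math
import random
from statistics import fmean, pvariance


def nb_variance(mu: float, phi: float) -> float:
    """Theoretical variance of a negative binomial count: mu + phi * mu**2."""
    return mu + phi * mu ** 2


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method.

    Adequate for the moderate rates used here; not meant for very large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(mu: float, phi: float, rng: random.Random) -> int:
    """Draw one negative binomial count as a gamma-Poisson mixture.

    The Poisson rate is Gamma(shape=1/phi, scale=mu*phi), which has mean mu.
    """
    rate = rng.gammavariate(1.0 / phi, mu * phi)
    return sample_poisson(rate, rng)


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so the run is repeatable
    mu, phi = 20.0, 0.5        # hypothetical mean and dispersion
    counts = [sample_negative_binomial(mu, phi, rng) for _ in range(20000)]
    print("sample mean:       ", round(fmean(counts), 2))
    print("sample variance:   ", round(pvariance(counts), 2))
    print("theoretical var:   ", nb_variance(mu, phi))
    print("Poisson would give:", mu)
```

Under these assumptions the sample variance should land far above the sample mean, which is the pattern a Poisson model (variance equal to the mean) cannot capture and the negative binomial model can.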
title Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches