Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches
Master's === National Taiwan University === Graduate Institute of Agronomy === 103 === With the rapid development of next-generation sequencing technology, fields such as medical science, agriculture, and biotechnology have been taken to the next level. Next-generation sequencing technology makes whole-genome sequencing and de novo sequenci...
Main Authors: | Yu-Shiang Zeng 曾禹翔 |
---|---|
Other Authors: | 蔡政安 |
Format: | Others |
Language: | zh-TW |
Published: | 2015 |
Online Access: | http://ndltd.ncl.edu.tw/handle/14306600258302972844 |
id |
ndltd-TW-103NTU05417009 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-103NTU05417009 2016-11-19T04:09:46Z http://ndltd.ncl.edu.tw/handle/14306600258302972844 Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches 以貝氏分析方法來偵測轉錄體定序資料之顯著基因 Yu-Shiang Zeng 曾禹翔 Master's National Taiwan University Graduate Institute of Agronomy 103
蔡政安 2015 學位論文 ; thesis 84 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Agronomy === 103 === With the rapid development of next-generation sequencing (NGS) technology, fields such as medical science, agriculture, and biotechnology have been taken to the next level. NGS makes whole-genome sequencing and de novo sequencing feasible for exploring biological theory, and RNA-seq is one of its core applications. RNA-seq data are used to measure gene expression levels and to test whether a specific gene is differentially expressed. RNA-seq has gradually replaced microarray technology and become an important benchmark for gene expression testing. However, RNA-seq read counts are discrete and exhibit over-dispersion (the variance of the data is larger than the mean). A negative binomial model is applied to handle over-dispersion, but parameter estimation then becomes a further issue. Current RNA-seq analysis packages such as DESeq, edgeR, and DSS use only point estimates of the parameters and do not account for the uncertainty in RNA-seq data. Here, we use a Markov chain Monte Carlo (MCMC) method to estimate the parameters relevant to detecting differentially expressed genes. At the end of the thesis, we compare the performance of DESeq, edgeR, DSS, and our method on both simulated and real RNA-seq data. Our log-linear model performs considerably better than DESeq, edgeR, and DSS when the numbers of replicates in the groups are close or equal. When the numbers of replicates are extremely unbalanced between groups, we suggest the median estimator as the appropriate method for detecting differentially expressed genes.
|
author2 |
蔡政安 |
author_facet |
蔡政安 Yu-Shiang Zeng 曾禹翔 |
author |
Yu-Shiang Zeng 曾禹翔 |
spellingShingle |
Yu-Shiang Zeng 曾禹翔 Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches |
author_sort |
Yu-Shiang Zeng |
title |
Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches |
title_short |
Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches |
title_full |
Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches |
title_fullStr |
Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches |
title_full_unstemmed |
Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches |
title_sort |
identification of differentially expressed genes of rna-seq data based on bayesian approaches |
publishDate |
2015 |
url |
http://ndltd.ncl.edu.tw/handle/14306600258302972844 |
work_keys_str_mv |
AT yushiangzeng identificationofdifferentiallyexpressedgenesofrnaseqdatabasedonbayesianapproaches AT céngyǔxiáng identificationofdifferentiallyexpressedgenesofrnaseqdatabasedonbayesianapproaches AT yushiangzeng yǐbèishìfēnxīfāngfǎláizhēncèzhuǎnlùtǐdìngxùzīliàozhīxiǎnzhejīyīn AT céngyǔxiáng yǐbèishìfēnxīfāngfǎláizhēncèzhuǎnlùtǐdìngxùzīliàozhīxiǎnzhejīyīn |
_version_ |
1718394386257018880 |
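
The description in this record mentions over-dispersed read counts, a negative binomial model, a log-linear model, and MCMC estimation. The sketch below is only an illustration of those ideas and is not the thesis's actual model or code. It assumes a single gene, two groups, a known dispersion `phi` with Var = mu + phi * mu^2, a log-linear mean log(mu) = beta0 + beta1 * group, wide normal priors, and a random-walk Metropolis sampler. All function names, priors, and tuning values are hypothetical choices made for the example.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count (Knuth's method; fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_nb(mean, phi, rng):
    """Negative binomial draw as a gamma-Poisson mixture.

    The gamma rate has mean `mean` and variance phi * mean**2, so the
    count has variance mean + phi * mean**2 (over-dispersed when phi > 0).
    """
    lam = rng.gammavariate(1.0 / phi, mean * phi)  # (shape, scale)
    return sample_poisson(lam, rng)


def log_posterior(beta0, beta1, counts_a, counts_b, phi, prior_sd=10.0):
    """Unnormalised log posterior of the two-group log-linear NB model."""
    r = 1.0 / phi  # negative binomial "size" parameter
    total = -(beta0 ** 2 + beta1 ** 2) / (2.0 * prior_sd ** 2)  # normal priors
    for counts, log_mu in ((counts_a, beta0), (counts_b, beta0 + beta1)):
        log_r_plus_mu = math.log(r + math.exp(log_mu))
        for y in counts:
            total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                      + r * (math.log(r) - log_r_plus_mu)
                      + y * (log_mu - log_r_plus_mu))
    return total


def metropolis(counts_a, counts_b, phi, n_iter=6000, burn_in=1000,
               step=0.15, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws of beta1."""
    rng = random.Random(seed)
    beta0 = math.log(statistics.mean(counts_a) + 0.5)  # rough starting point
    beta1 = 0.0
    current = log_posterior(beta0, beta1, counts_a, counts_b, phi)
    draws = []
    for i in range(n_iter):
        prop0 = beta0 + rng.gauss(0.0, step)
        prop1 = beta1 + rng.gauss(0.0, step)
        proposed = log_posterior(prop0, prop1, counts_a, counts_b, phi)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            beta0, beta1, current = prop0, prop1, proposed
        if i >= burn_in:
            draws.append(beta1)
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated counts for one gene: six replicates per group, 4-fold change.
    group_a = [sample_nb(20, 0.2, rng) for _ in range(6)]
    group_b = [sample_nb(80, 0.2, rng) for _ in range(6)]
    draws = metropolis(group_a, group_b, phi=0.2)
    print("posterior mean of beta1 (log fold change):", statistics.mean(draws))
    print("posterior P(beta1 > 0):", sum(d > 0 for d in draws) / len(draws))
```

Unlike a single point estimate, the retained draws describe the uncertainty in the log fold change, so a gene could be flagged when, for example, the posterior probability that `beta1` is positive is close to 0 or 1. A fuller analysis would also estimate the dispersion and share information across genes, which this sketch deliberately leaves out.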