Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches

Bibliographic Details
Main Authors: Yu-Shiang Zeng, 曾禹翔
Other Authors: 蔡政安
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/14306600258302972844
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Agronomy === Academic year 103 === With the rapid development of next-generation sequencing (NGS) technology, fields such as medical science, agriculture, and biotechnology have advanced considerably. NGS makes whole-genome sequencing and de novo sequencing feasible for exploring biological theory, and RNA-seq is one of its core applications. RNA-seq data are used to measure gene expression levels and to test whether a specific gene is differentially expressed. RNA-seq has gradually replaced microarray technology and become an important benchmark for gene expression testing. However, RNA-seq read counts are discrete and exhibit over-dispersion (the variance of the data is larger than the mean). The negative binomial model is applied to deal with over-dispersion, but parameter estimation then becomes a further issue. Current RNA-seq analysis software such as DESeq, edgeR, and DSS uses only point estimates of the parameters and does not account for the uncertainty in RNA-seq data. Here, we use a Markov chain Monte Carlo (MCMC) method to estimate the parameters relevant to detecting differentially expressed genes. At the end of the thesis, we compare the performance of DESeq, edgeR, DSS, and our method on both simulated and real RNA-seq data. Our log-linear model substantially outperforms DESeq, edgeR, and DSS when the numbers of replicates in the groups are close or equal. When the numbers of replicates in the groups are extremely unbalanced, we suggest the median estimator as the appropriate method for detecting differentially expressed genes.
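The summary describes fitting a negative binomial log-linear model by MCMC without giving details. The following is only a minimal sketch of that general idea, not the thesis's actual model or sampler. Everything specific in it is an assumption made for illustration: the read counts are invented, the model is a single-gene, two-group log-linear mean log(mu) = b0 + b1 * group with variance mu + phi * mu^2, the priors are weakly informative normals, and the sampler is a plain random-walk Metropolis algorithm.

```python
import math
import random
import statistics

# Hypothetical read counts for ONE gene (invented numbers, not thesis data).
counts = [12, 30, 18, 25, 85, 60, 110, 74]
group = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = condition A, 1 = condition B

# Over-dispersion check: within each group the sample variance exceeds the mean.
for g in (0, 1):
    ys = [y for y, k in zip(counts, group) if k == g]
    print("group", g, "mean", statistics.mean(ys), "variance", statistics.variance(ys))


def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def log_posterior(b0, b1, log_phi):
    """Log-linear NB log-likelihood plus assumed normal priors (unnormalized)."""
    phi = math.exp(log_phi)
    loglik = sum(nb_logpmf(y, math.exp(b0 + b1 * g), phi)
                 for y, g in zip(counts, group))
    # Assumed priors: b0, b1 ~ N(0, 10^2); log(phi) ~ N(0, 2^2).
    logprior = -(b0 ** 2 + b1 ** 2) / (2 * 10.0 ** 2) - log_phi ** 2 / (2 * 2.0 ** 2)
    return loglik + logprior


def metropolis(n_iter=6000, burn_in=1000, step=0.2, seed=0):
    """Random-walk Metropolis; returns post-burn-in draws of b1 and the acceptance rate."""
    rng = random.Random(seed)
    state = [math.log(statistics.mean(counts)), 0.0, 0.0]  # b0, b1, log(phi)
    current = log_posterior(*state)
    draws, accepted = [], 0
    for i in range(n_iter):
        proposal = [v + rng.gauss(0.0, step) for v in state]
        candidate = log_posterior(*proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            state, current = proposal, candidate
            accepted += 1
        if i >= burn_in:
            draws.append(state[1])
    return draws, accepted / n_iter


b1_draws, accept_rate = metropolis()
ordered = sorted(b1_draws)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered))]
print("posterior mean of b1 (log fold change):", statistics.mean(b1_draws))
print("95% credible interval:", (lower, upper))
print("P(b1 > 0 | data):", sum(d > 0 for d in b1_draws) / len(b1_draws))
print("acceptance rate:", accept_rate)
```

Unlike a single point estimate, the posterior draws of b1 carry the uncertainty in both the mean and the dispersion, so a gene could be flagged as differentially expressed when, for example, the credible interval for b1 excludes zero. That decision rule is likewise an illustrative assumption rather than the thesis's criterion.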