Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes

Master's === National Cheng Kung University === Department of Statistics === 105 === In medical research, finding diseased genes is a very important issue. The corrected read counts of the cells are obtained by the RPKM transformation after gene sequencing. We use the difference in corrected read counts between the same patient's tumor cells and normal cells...

Bibliographic Details
Main Authors:	Fang-Yu Wu, 吳方渝
Other Authors:	Mi-Chia Ma
Format:	Others
Language:	zh-TW
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/wjzg77

id	ndltd-TW-105NCKU5337019
record_format	oai_dc
spelling	ndltd-TW-105NCKU5337019 2019-05-15T23:47:01Z http://ndltd.ncl.edu.tw/handle/wjzg77 Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes 利用核密度和廣義估計方程式估計致病基因的個數 Fang-Yu Wu 吳方渝 Master's, National Cheng Kung University, Department of Statistics, 105 Mi-Chia Ma 馬瀰嘉 2017 thesis 38 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Cheng Kung University === Department of Statistics === 105 === In medical research, finding diseased genes is a very important issue. The corrected read counts of the cells are obtained by the RPKM transformation after gene sequencing. We use the difference in corrected read counts between the same patient's tumor cells and normal cells to perform a paired-sample t test and find diseased genes. When many genes are tested at once, setting each individual type I error rate at the significance level α inflates the overall type I error rate. The main solutions are to control the FDR (false discovery rate) or the FWER (familywise error rate). When some null hypotheses in a multiple testing problem are not true, FWER-controlling methods lose power and become conservative. Whether FDR or FWER is used, it is important to estimate the number of true null hypotheses accurately. This thesis extends the EM algorithm method proposed by Zheng (2016) for estimating the number of true null hypotheses from one dimension to multiple dimensions. We assume that the gene data follow a mixture of multivariate normal distributions. The estimation method has two parts: the first extends the EM algorithm method and proposes a kernel density estimation method; the second estimates the proportion of true null hypotheses and the FDR by the generalized estimating equation method. Finally, a simulation study explores and compares the three proposed methods under different distributions and with simulated corrected read counts of genes at low, moderate, and high correlations.
author2	Mi-Chia Ma
author_facet	Mi-Chia Ma Fang-Yu Wu 吳方渝
author	Fang-Yu Wu 吳方渝
title	Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/wjzg77
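
The abstract's point about multiple testing can be illustrated with a small sketch. This is not the thesis's EM, kernel density, or generalized estimating equation method; it swaps in two generic textbook tools, a Storey-type estimate of the proportion of true null hypotheses and the Benjamini-Hochberg step-up procedure, only to show why an estimate of the number of true nulls matters. The function names, the tuning value `lam`, and the p-values are hypothetical, and the FWER formula assumes independent tests.

```python
def fwer_independent(alpha, m):
    """Chance of at least one false rejection when m independent
    true null hypotheses are each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls:
    p-values above lam are assumed to come mostly from true nulls."""
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def bh_reject(pvalues, q=0.05, m0=None):
    """Benjamini-Hochberg step-up procedure at FDR level q.
    If m0 (an estimated number of true nulls) is given, it replaces
    the total count m in the thresholds (an adaptive variant)."""
    m = len(pvalues)
    m0 = m if m0 is None else max(1, m0)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m0:
            k = rank
    return sorted(order[:k])  # indices of rejected hypotheses


# Hypothetical p-values, e.g. from per-gene paired t tests.
pvals = [0.001, 0.004, 0.012, 0.03, 0.2, 0.45, 0.6, 0.75, 0.81, 0.95]
pi0_hat = storey_pi0(pvals)
m0_hat = round(pi0_hat * len(pvals))
print("FWER for 20 independent tests at 0.05:", fwer_independent(0.05, 20))
print("estimated proportion of true nulls:", pi0_hat)
print("rejected (plain BH):", bh_reject(pvals))
print("rejected (adaptive BH):", bh_reject(pvals, m0=m0_hat))
```

A smaller estimate of the number of true nulls loosens the thresholds, which is why an accurate estimate can recover power without giving up error control.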

Similar Items