Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes

Master's === National Cheng Kung University === Department of Statistics === 105 === In medical research, finding diseased genes is a very important issue. After gene sequencing, the read counts of the cells are corrected by the RPKM transformation. We use the difference in corrected read counts between the same patient's tumor cells and normal cells...


Bibliographic Details
Main Authors: Fang-Yu Wu, 吳方渝
Other Authors: Mi-Chia Ma
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/wjzg77
id ndltd-TW-105NCKU5337019
record_format oai_dc
spelling ndltd-TW-105NCKU5337019 2019-05-15T23:47:01Z http://ndltd.ncl.edu.tw/handle/wjzg77 Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes 利用核密度和廣義估計方程式估計致病基因的個數 Fang-Yu Wu 吳方渝 Master's National Cheng Kung University Department of Statistics 105 Mi-Chia Ma 馬瀰嘉 2017 thesis 38 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Cheng Kung University === Department of Statistics === 105 === In medical research, finding diseased genes is a very important issue. After gene sequencing, the read counts of the cells are corrected by the RPKM transformation. We use the difference in corrected read counts between the same patient's tumor cells and normal cells to perform a paired-sample t test and find diseased genes. When a large number of genes are tested, keeping the type I error rate of each individual test at the significance level α inflates the overall type I error rate. The main remedies are to control the FDR (false discovery rate) or the FWER (familywise error rate). When null hypotheses are not true in a multiple testing problem, the FWER method has lower power and becomes conservative. Whether FDR or FWER is used, it is important to estimate the number of true null hypotheses accurately. This thesis extends the EM algorithm method proposed by Zheng (2016) for estimating the number of true null hypotheses from one dimension to multiple dimensions. We assume that the gene data follow a mixture of multivariate normal distributions. The estimation method has two parts: the first extends the EM algorithm method and proposes a kernel density estimation method; the second estimates the proportion of true null hypotheses and the FDR by the generalized estimating equation method. Finally, a simulation study compares the three proposed methods under different distributions and with simulated corrected read counts of genes at low, moderate, and high correlations.
author2 Mi-Chia Ma
author_facet Mi-Chia Ma
Fang-Yu Wu
吳方渝
author Fang-Yu Wu
吳方渝
spellingShingle Fang-Yu Wu
吳方渝
Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes
author_sort Fang-Yu Wu
title Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/wjzg77
_version_ 1719154661558059008
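
The description above stresses that both FDR and FWER procedures benefit from an estimate of the number of true null hypotheses. The sketch below illustrates that idea only. It does not implement the thesis's EM, kernel density, or generalized estimating equation methods; it swaps in a simple Storey-type estimator of the null proportion and a Benjamini-Hochberg step-up rule adjusted by that estimate. The function names, the tuning value `lam=0.5`, and the simulated p-values (80% true nulls) are all assumptions made for illustration.

```python
import random


def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Under a true null, p-values are uniform on (0, 1), so the share of
    p-values above `lam` is roughly pi0 * (1 - lam).
    """
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    estimate = above / ((1.0 - lam) * m)
    # Keep the estimate inside (0, 1] so it can be used as a divisor.
    return min(1.0, max(estimate, 1.0 / m))


def adaptive_bh(p_values, alpha=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up rule with thresholds scaled by 1 / pi0.

    Returns the sorted indices of the rejected hypotheses.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    last_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / (pi0 * m):
            last_rank = rank  # step-up: keep the largest rank that passes
    return sorted(order[:last_rank])


# Simulated p-values (illustrative only, not data from the thesis).
random.seed(0)
m, true_pi0 = 2000, 0.8
n_null = int(m * true_pi0)
p_null = [random.random() for _ in range(n_null)]          # uniform under the null
p_alt = [random.random() ** 8 for _ in range(m - n_null)]  # concentrated near 0
p_values = p_null + p_alt

pi0_hat = estimate_pi0(p_values)
m0_hat = round(pi0_hat * m)  # estimated number of true null hypotheses
rejected = adaptive_bh(p_values, alpha=0.05, pi0=pi0_hat)

print(f"estimated pi0 = {pi0_hat:.3f}, estimated true nulls = {m0_hat}")
print(f"rejected hypotheses = {len(rejected)}")
```

Scaling the thresholds by the estimated null proportion makes the procedure less conservative than plain Benjamini-Hochberg (which corresponds to `pi0=1.0`), which is why an accurate estimate of the number of true nulls matters.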