Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes
Master's === National Cheng Kung University === Department of Statistics === 105 === Finding diseased genes is an important problem in medical research. After gene sequencing, the read counts of each cell are corrected by the RPKM transformation. We use the difference in corrected read counts between the same patient's tumor cells and normal cells...
Main Authors: | Fang-YuWu, 吳方渝 |
---|---|
Other Authors: | Mi-Chia Ma |
Format: | Others |
Language: | zh-TW |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/wjzg77 |
id |
ndltd-TW-105NCKU5337019 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NCKU5337019 2019-05-15T23:47:01Z http://ndltd.ncl.edu.tw/handle/wjzg77 Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes 利用核密度和廣義估計方程式估計致病基因的個數 Fang-YuWu 吳方渝 Master's National Cheng Kung University Department of Statistics 105 Mi-Chia Ma 馬瀰嘉 2017 thesis 38 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Cheng Kung University === Department of Statistics === 105 === Finding diseased genes is an important problem in medical research. After gene sequencing, the read counts of each cell are corrected by the RPKM transformation. We use the difference in corrected read counts between the same patient's tumor cells and normal cells to perform a paired-sample t-test and find diseased genes. When many genes are tested at once, keeping each individual type I error rate at significance level α inflates the overall type I error rate. The main remedies are to control the FDR (false discovery rate) or the FWER (familywise error rate). When some null hypotheses in a multiple testing problem are false, FWER methods lose power and become conservative. Whether FDR or FWER is used, it is important to estimate the number of true null hypotheses accurately.
This paper extends the EM algorithm method proposed by Zheng (2016) for estimating the number of true null hypotheses from one dimension to multiple dimensions. We assume that the gene data follow a mixture of multivariate normal distributions. The estimation has two parts: the first extends the EM algorithm method and proposes a kernel density estimation method; the second estimates the proportion of true null hypotheses and the FDR with the generalized estimating equation method.
Finally, a simulation study compares the three proposed methods under different distributions, using simulated corrected read counts of genes at low, moderate, and high correlations.
|
author2 |
Mi-Chia Ma |
author_facet |
Mi-Chia Ma Fang-YuWu 吳方渝 |
author |
Fang-YuWu 吳方渝 |
author_sort |
Fang-YuWu |
title |
Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/wjzg77 |
_version_ |
1719154661558059008 |
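
The multiple-testing argument in the abstract can be illustrated with a short sketch. It is not the thesis's EM, kernel density, or generalized estimating equation method. It swaps in a simpler Storey-type tail estimator for the proportion of true null hypotheses and plugs that estimate into a Benjamini-Hochberg step-up rule. The function names, the simulated p-values, the tuning value `lam = 0.5`, and the independence of the tests are all assumptions made for illustration.

```python
import random


def familywise_error_rate(alpha, m):
    """Chance of at least one false rejection among m independent
    true null hypotheses, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true nulls:
    #{p > lam} / (m * (1 - lam)), capped at 1.
    (A stand-in technique, not the thesis's estimator.)"""
    m = len(p_values)
    tail = sum(1 for p in p_values if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def adaptive_bh(p_values, q=0.05, lam=0.5):
    """Benjamini-Hochberg step-up rule with m replaced by the
    estimated number of true nulls. Returns rejected indices."""
    m = len(p_values)
    m0_hat = max(1.0, storey_pi0(p_values, lam) * m)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m0_hat:
            k = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k])


# Hypothetical data: 1600 null genes (uniform p-values) and
# 400 non-null genes (p-values concentrated near zero).
rng = random.Random(0)
m, m0_true = 2000, 1600
p_values = [rng.random() for _ in range(m0_true)]
p_values += [rng.betavariate(0.1, 5.0) for _ in range(m - m0_true)]

print("FWER for 100 tests at alpha = 0.05:", familywise_error_rate(0.05, 100))
print("estimated number of true nulls:", storey_pi0(p_values) * m)
print("genes flagged at FDR level 0.05:", len(adaptive_bh(p_values)))
```

An accurate estimate of the number of true nulls matters here because the rejection thresholds scale with `1 / m0_hat`: overestimating it makes the procedure more conservative than necessary.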