Analysis of the Impact of Over or Underestimating the Dispersion Parameter on the Results of Tests for Differential Gene Expression

Bibliographic Details
Main Author: McLaughlin, Eric M.
Language: English
Published: The Ohio State University / OhioLINK 2017
Subjects: Biostatistics; Genetics
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=osu1492731115824706
id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1492731115824706
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu1492731115824706 2021-08-03T07:01:57Z Analysis of the Impact of Over or Underestimating the Dispersion Parameter on the Results of Tests for Differential Gene Expression McLaughlin, Eric M. Biostatistics Genetics One of the difficulties in analyzing RNA sequencing data for tests of differential expression is the estimation of a gene-specific dispersion parameter. Most methods assume that gene read counts follow negative binomial distributions, with the dispersion accounting for the additional variance relative to a Poisson distribution. Prior studies have shown that when the dispersion is underestimated, the possibility of falsely labeling equivalently expressed genes as differentially expressed is increased. Overestimating the dispersion, however, tends to result in differentially expressed genes being incorrectly determined to be equivalently expressed. This simulation study looks at how the false discovery rate of differentially expressed genes changes based upon under or overestimation of the dispersion relative to dispersions calculated from a real dataset. Methods that restricted the dispersion estimates to between 0 and 1 had the closest linear regression slope coefficients to 1, which indicates that these methods perform the best in having the estimated dispersion match the dispersion used to simulate the gene counts. However, most other methods had many points fall around a 45° trendline, meaning that there was a fair amount of agreement between estimated and “true” dispersions for many genes, but some were clustered where they were heavily over or underestimated, drawing the slope and trendline either above or below 1. Ratios of estimated to “true” dispersion were used to group genes into deciles, and the false discovery rate of differentially expressed genes within each decile was observed for dispersion under and overestimation.
Aside from the quasi-likelihood estimation method, which behaved inconsistently, the general trend in false discovery was that it was most common when dispersion was under or overestimated by a factor around 100, and least common when dispersions differed by a factor less than 100. Across deciles where the false discovery rate was low, the change was either a negligible difference or a slight decrease in the rate moving from under to overestimated, with a jump occurring once heavy overestimation begins. Overall, any of the examined tagwise or trended dispersion estimation methods produced trends in the false discovery rate where it was highest with extreme under or overestimation. It may be useful to find a characteristic in the RNA-seq data that is shared in those genes most at risk for heavy under or overestimation of dispersion, so that this apparent high rate of false discovery can be accounted for when interpreting results. 2017-08-11 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1492731115824706 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
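The two quantities the abstract relies on can be illustrated with a minimal Python sketch. It is not the thesis's code. It assumes the common mean-dispersion parameterization of the negative binomial, in which the variance is mean + dispersion * mean^2 (the abstract does not state which parameterization was used), and it assumes "deciles" means ten equal-sized groups of genes ranked by the ratio of estimated to "true" dispersion. The function names `nb_variance` and `fdr_by_ratio_group` and all argument names are hypothetical.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count, assuming the mean-dispersion
    parameterization var = mean + dispersion * mean**2.
    A dispersion of 0 gives the Poisson variance (equal to the mean)."""
    return mean + dispersion * mean ** 2


def fdr_by_ratio_group(est_disp, true_disp, called_de, truly_de, n_groups=10):
    """Rank genes by estimated/true dispersion ratio, split the ranking into
    n_groups near-equal groups, and return one false discovery rate per group.

    The rate for a group is (genes called DE that are not truly DE) divided by
    (genes called DE); it is None when the group contains no DE calls.
    """
    ratios = [e / t for e, t in zip(est_disp, true_disp)]
    # Gene indices ordered from most underestimated to most overestimated.
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    n = len(order)
    rates = []
    for g in range(n_groups):
        group = order[g * n // n_groups:(g + 1) * n // n_groups]
        calls = [i for i in group if called_de[i]]
        false_calls = sum(1 for i in calls if not truly_de[i])
        rates.append(false_calls / len(calls) if calls else None)
    return rates
```

In a simulation the true differential expression status of every gene is known, so `truly_de` would come from the simulation design and `called_de` from whichever test is being evaluated; the first entries of the returned list then correspond to heavy underestimation and the last entries to heavy overestimation.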
collection NDLTD
language English
sources NDLTD
topic Biostatistics
Genetics
author McLaughlin, Eric M.
title Analysis of the Impact of Over or Underestimating the Dispersion Parameter on the Results of Tests for Differential Gene Expression
publisher The Ohio State University / OhioLINK
publishDate 2017
url http://rave.ohiolink.edu/etdc/view?acc_num=osu1492731115824706