A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data.

A main application for mRNA sequencing (mRNAseq) is determining lists of differentially-expressed genes (DEGs) between two or more conditions. Several software packages exist to produce DEGs from mRNAseq data, but they typically yield different DEGs, sometimes markedly so. The underlying probability...


Bibliographic Details
Main Authors:	Gregory R Smith, Marc R Birtwistle
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2016-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4915702?pdf=render

id	doaj-ec124fe93689402399f073b89d9cda93
record_format	Article
spelling	doaj-ec124fe93689402399f073b89d9cda93; 2020-11-24T21:52:03Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2016-01-01; vol. 11, no. 6, e0157828; doi:10.1371/journal.pone.0157828; A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data.; Gregory R Smith; Marc R Birtwistle; http://europepmc.org/articles/PMC4915702?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Gregory R Smith Marc R Birtwistle
author_facet	Gregory R Smith Marc R Birtwistle
author_sort	Gregory R Smith
title	A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data.
title_sort	mechanistic beta-binomial probability model for mrna sequencing data.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2016-01-01
description	A main application for mRNA sequencing (mRNAseq) is determining lists of differentially-expressed genes (DEGs) between two or more conditions. Several software packages exist to produce DEGs from mRNAseq data, but they typically yield different DEGs, sometimes markedly so. The underlying probability model used to describe mRNAseq data is central to deriving DEGs, and not surprisingly most softwares use different models and assumptions to analyze mRNAseq data. Here, we propose a mechanistic justification to model mRNAseq as a binomial process, with data from technical replicates given by a binomial distribution, and data from biological replicates well-described by a beta-binomial distribution. We demonstrate good agreement of this model with two large datasets. We show that an emergent feature of the beta-binomial distribution, given parameter regimes typical for mRNAseq experiments, is the well-known quadratic polynomial scaling of variance with the mean. The so-called dispersion parameter controls this scaling, and our analysis suggests that the dispersion parameter is a continually decreasing function of the mean, as opposed to current approaches that impose an asymptotic value to the dispersion parameter at moderate mean read counts. We show how this leads to current approaches overestimating variance for moderately to highly expressed genes, which inflates false negative rates. Describing mRNAseq data with a beta-binomial distribution thus may be preferred since its parameters are relatable to the mechanistic underpinnings of the technique and may improve the consistency of DEG analysis across softwares, particularly for moderately to highly expressed genes.
url	http://europepmc.org/articles/PMC4915702?pdf=render
_version_	1725877151243173888
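
The description above states that the beta-binomial distribution gives rise to a quadratic scaling of variance with the mean. The sketch below is one textbook way to check that numerically; it is not taken from the article. It assumes a beta-binomial with n trials and a Beta(a, b) success probability, and compares the exact variance with the approximation mu + mu^2 / a, which holds when a / (a + b) is small and both n and a + b are large. The function names and all parameter values are hypothetical, chosen only for illustration. Holding a fixed makes the dispersion 1 / a constant, so this sketch does not reproduce the article's finding that dispersion decreases with the mean.

```python
# Illustrative sketch only: names and parameter values are assumptions,
# not the article's code or data.

def betabinom_mean(n, a, b):
    """Exact mean of a beta-binomial: n trials, success probability ~ Beta(a, b)."""
    return n * a / (a + b)


def betabinom_var(n, a, b):
    """Exact variance of the same beta-binomial distribution."""
    return n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))


def quadratic_approx(mu, a):
    """Quadratic approximation mu + mu^2 / a.

    Reasonable when a / (a + b) is small and both n and a + b are large.
    """
    return mu + mu ** 2 / a


n = 10_000_000  # hypothetical total read count
a = 5.0         # hypothetical shape parameter; 1 / a acts as the dispersion

# Vary b to sweep the mean while keeping a fixed.
for b in (1e6, 1e5, 1e4):
    mu = betabinom_mean(n, a, b)
    exact = betabinom_var(n, a, b)
    approx = quadratic_approx(mu, a)
    print(f"mean={mu:.1f}  exact_var={exact:.1f}  quadratic_approx={approx:.1f}")
```

In this regime the linear term mu is the binomial (technical replicate) contribution, and the mu^2 / a term is the extra spread from the beta-distributed success probability.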


Similar Items