A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory.

Bibliographic Details
Main Authors: Rodoniki Athanasiadou, Benjamin Neymotin, Nathan Brandt, Wei Wang, Lionel Christiaen, David Gresham, Daniel Tranchina
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-03-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1006794
Citation: PLoS Computational Biology 15(3): e1006794 (2019-03-01)
ISSN: 1553-734X, 1553-7358
Description: A fundamental assumption, common to the vast majority of high-throughput transcriptome analyses, is that the expression of most genes is unchanged among samples and that total cellular RNA remains constant. As the number of analyzed experimental systems increases, however, independent studies demonstrate that this assumption is often violated. We present a calibration method using RNA spike-ins that allows for the measurement of absolute cellular abundance of RNA molecules. We apply the method to pooled RNA from cell populations of known sizes. For each transcript, we compute a nominal abundance that can be converted to absolute abundance by dividing by a scale factor determined in separate experiments: the yield coefficient of the transcript relative to that of a reference spike-in measured with the same protocol. The method is derived by maximum likelihood theory in the context of a complete statistical model for sequencing counts contributed by cellular RNA and spike-ins. The counts are based on a sample from a fixed number of cells to which a fixed population of spike-in molecules has been added. We illustrate and evaluate the method with applications to two global expression data sets: one from the model eukaryote Saccharomyces cerevisiae proliferating at different growth rates, and one from differentiating cardiopharyngeal cell lineages in the chordate Ciona robusta. We tested the method in a technical replicate dilution study and in a k-fold validation study.
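The conversion step named in the description (nominal abundance divided by the relative yield coefficient) can be illustrated with a minimal sketch. This is not the authors' maximum likelihood estimator, which this record does not reproduce; the function name, parameter names, and numbers below are hypothetical and serve only to show the arithmetic.

```python
def absolute_abundance(nominal_abundance: float, relative_yield: float) -> float:
    """Convert a nominal abundance to an absolute abundance.

    Assumption for illustration: `relative_yield` is the transcript's yield
    coefficient relative to that of the reference spike-in, measured in a
    separate experiment with the same protocol.
    """
    if relative_yield <= 0:
        raise ValueError("relative_yield must be positive")
    # The description states: absolute = nominal / scale factor.
    return nominal_abundance / relative_yield


# Hypothetical values, not taken from the article.
nominal = 12.0          # nominal molecules per cell for one transcript
relative_yield = 0.5    # transcript yields half as well as the reference spike-in
print(absolute_abundance(nominal, relative_yield))
```

A relative yield below 1 means the transcript is recovered less efficiently than the reference spike-in, so the absolute abundance comes out larger than the nominal one.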