Multiple imputation and direct estimation for qPCR data with non-detects

Abstract. Background: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a mea...


Bibliographic Details
Main Authors: Valeriia Sherina, Helene R. McMurray, Winslow Powers, Hartmut Land, Tanzy M. T. Love, Matthew N. McCall
Format: Article
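Language: English
Published: BMC 2020-11-01
Series: BMC Bioinformatics
Subjects: Gene expression; Quantitative real-time PCR (qPCR); Missing not at random (MNAR); Non-detects; Direct estimation; Multiple imputation
Online Access: http://link.springer.com/article/10.1186/s12859-020-03807-9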
id doaj-812c9b28a3c0467288cb8d83d52848d7
record_format Article
spelling doaj-812c9b28a3c0467288cb8d83d52848d7; 2020-11-26T12:52:59Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2020-11-01; 21(1):1-15; 10.1186/s12859-020-03807-9
Multiple imputation and direct estimation for qPCR data with non-detects
Valeriia Sherina (0): Department of Biostatistics and Computational Biology, University of Rochester Medical Center
Helene R. McMurray (1): Department of Biomedical Genetics, University of Rochester Medical Center
Winslow Powers (2): Department of Biomedical Engineering, University of Rochester
Hartmut Land (3): Department of Biomedical Genetics, University of Rochester Medical Center
Tanzy M. T. Love (4): Department of Biostatistics and Computational Biology, University of Rochester Medical Center
Matthew N. McCall (5): Department of Biostatistics and Computational Biology, University of Rochester Medical Center
collection DOAJ
language English
format Article
sources DOAJ
author Valeriia Sherina
Helene R. McMurray
Winslow Powers
Hartmut Land
Tanzy M. T. Love
Matthew N. McCall
title Multiple imputation and direct estimation for qPCR data with non-detects
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-11-01
description Abstract. Background: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference. Results: We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. We assess the proposed methods via simulation studies and demonstrate the applicability of these methods to three experimental data sets. We compare our methods to mean imputation, single imputation, and a penalized EM algorithm incorporating non-random missingness (PEMM). The developed methods are implemented in the R/Bioconductor package nondetects. Conclusions: The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments in the presence of non-detects, providing increased confidence in downstream analyses.
topic Gene expression
Quantitative real-time PCR (qPCR)
Missing not at random (MNAR)
Non-detects
Direct estimation
Multiple imputation
url http://link.springer.com/article/10.1186/s12859-020-03807-9
_version_ 1724414577281597440
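
Illustrative note on the abstract: the description states that replacing non-detects with a limit-of-detection value biases expression estimates, and that multiple imputation accounts for imputation uncertainty. The sketch below is a toy illustration of those two ideas only. It is not the authors' method and not the API of the nondetects package. The limit of 40 cycles, the logistic missingness curve, and all function names are assumptions made for this example. The pooling step uses Rubin's rules, the standard way to combine multiply imputed estimates, which the record itself does not name.

```python
import math
import random
import statistics

# Assumed limit-of-detection cycle for illustration; not taken from the record.
LOD_CT = 40.0


def simulate_ct(n, true_mean, sd, rng):
    """Draw n true Ct values, then mask some of them as non-detects (None).

    Missingness is non-random: the chance of a non-detect rises with Ct
    along a logistic curve. The midpoint (35) and unit slope are made-up
    illustration values.
    """
    observed = []
    for _ in range(n):
        ct = rng.gauss(true_mean, sd)
        p_missing = 1.0 / (1.0 + math.exp(-(ct - 35.0)))
        observed.append(None if rng.random() < p_missing else ct)
    return observed


def mean_with_lod_substitution(values, lod=LOD_CT):
    """Mean Ct after replacing every non-detect with the limit of detection."""
    return statistics.fmean(lod if v is None else v for v in values)


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules.

    Returns the pooled estimate and its total variance, which adds the
    between-imputation spread to the average within-imputation variance.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate_ct(5000, true_mean=34.0, sd=1.0, rng=rng)
    print("true mean Ct:", 34.0)
    print("mean after LOD substitution:", mean_with_lod_substitution(observed))
    print("pooled (estimate, variance):", pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

In this toy setup the masked reactions have true Ct values well below 40, so substituting 40 pulls the mean upward. The total variance from `pool_rubin` is larger than the within-imputation variance alone, which is the extra uncertainty that a single imputation would leave out.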