CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates

Abstract Background In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sa...

Bibliographic Details
Main Authors: Joel Z. B. Low, Tsung Fei Khang, Martti T. Tammi
Format: Article
Language: English
Published: BMC 2017-12-01
Series: BMC Bioinformatics
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12859-017-1974-4
id doaj-28434076c38346779f7f2390a45c08f2
record_format Article
spelling doaj-28434076c38346779f7f2390a45c08f2
2020-11-24T21:59:46Z
eng
BMC
BMC Bioinformatics
1471-2105
2017-12-01
18
S16
243
253
10.1186/s12859-017-1974-4
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates
Joel Z. B. Low (Institute of Biological Sciences, Faculty of Science, University of Malaya)
Tsung Fei Khang (Institute of Mathematical Sciences, Faculty of Science, University of Malaya)
Martti T. Tammi (Institute of Biological Sciences, Faculty of Science, University of Malaya)
Abstract
Background: In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis.
Results: We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data.
Conclusions: Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .
http://link.springer.com/article/10.1186/s12859-017-1974-4
RNA-Seq
Unreplicated experiments
Bayesian statistics
Differential gene expression
Sequencing coverage
Illumina
collection DOAJ
language English
format Article
sources DOAJ
author Joel Z. B. Low
Tsung Fei Khang
Martti T. Tammi
title CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-12-01
topic RNA-Seq
Unreplicated experiments
Bayesian statistics
Differential gene expression
Sequencing coverage
Illumina
url http://link.springer.com/article/10.1186/s12859-017-1974-4
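The abstract notes that a distribution of true gene counts, each with a different probability, can result in the same observed gene count, and that sequencing coverage shapes that distribution. The sketch below illustrates that idea only; it is not the CORNAS model, whose details are not given in this record. It assumes, purely for illustration, that the observed count is a Binomial draw from the true count with success probability equal to the coverage, and that the prior on the true count is flat. The function names (`log_posterior`, `posterior_true_count`, `summarize`) and parameters are hypothetical.

```python
import math


def log_posterior(n_true, x_obs, coverage):
    """Log P(N = n_true | X = x_obs), assuming X ~ Binomial(N, coverage)
    and a flat prior on N (illustrative assumptions, not the CORNAS model)."""
    if n_true < x_obs:
        return float("-inf")
    log_choose = (math.lgamma(n_true + 1) - math.lgamma(x_obs + 1)
                  - math.lgamma(n_true - x_obs + 1))
    # Under these assumptions N - x_obs is negative binomial with
    # x_obs + 1 "successes", so the normalizing power of coverage is x_obs + 1.
    return (log_choose + (x_obs + 1) * math.log(coverage)
            + (n_true - x_obs) * math.log1p(-coverage))


def posterior_true_count(x_obs, coverage, mass=0.999999, max_terms=10_000_000):
    """Enumerate (n_true, probability) pairs until `mass` of the posterior is covered."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if x_obs < 0:
        raise ValueError("x_obs must be non-negative")
    support, total, n = [], 0.0, x_obs
    while total < mass and len(support) < max_terms:
        p = math.exp(log_posterior(n, x_obs, coverage))
        support.append((n, p))
        total += p
        n += 1
    return support


def summarize(support, level=0.95):
    """Return (mean, lower, upper) of a central credible interval."""
    total = sum(p for _, p in support)
    mean = sum(n * p for n, p in support) / total
    lo_target = (1.0 - level) / 2.0
    hi_target = 1.0 - lo_target
    lower, upper, cum = None, support[-1][0], 0.0
    for n, p in support:
        cum += p / total
        if lower is None and cum >= lo_target:
            lower = n
        if cum >= hi_target:
            upper = n
            break
    return mean, lower, upper


if __name__ == "__main__":
    # Same observed count, two hypothetical coverage values.
    for cov in (0.5, 0.1):
        mean, lower, upper = summarize(posterior_true_count(10, cov))
        print(f"coverage={cov}: mean={mean:.2f}, 95% interval=[{lower}, {upper}]")
```

Under these assumptions the same observed count maps to a much wider range of plausible true counts when coverage is low, which is the intuition behind treating coverage as an explicit model parameter.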