CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates
Abstract Background In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sa...
Main Authors: | Joel Z. B. Low, Tsung Fei Khang, Martti T. Tammi |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2017-12-01 |
Series: | BMC Bioinformatics |
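Subjects: | RNA-Seq; Unreplicated experiments; Bayesian statistics; Differential gene expression; Sequencing coverage; Illumina |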
Online Access: | http://link.springer.com/article/10.1186/s12859-017-1974-4 |
id |
doaj-28434076c38346779f7f2390a45c08f2 |
---|---|
record_format |
Article |
spelling |
doaj-28434076c38346779f7f2390a45c08f2; 2020-11-24T21:59:46Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-12-01; 18S16243253; 10.1186/s12859-017-1974-4; CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates; Joel Z. B. Low (Institute of Biological Sciences, Faculty of Science, University of Malaya); Tsung Fei Khang (Institute of Mathematical Sciences, Faculty of Science, University of Malaya); Martti T. Tammi (Institute of Biological Sciences, Faculty of Science, University of Malaya); Abstract. Background: In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. Results: We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method performs comparably to or better than NOISeq and GFOLD in simulations and in experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Conclusions: Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with a limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS ; http://link.springer.com/article/10.1186/s12859-017-1974-4 ; RNA-Seq; Unreplicated experiments; Bayesian statistics; Differential gene expression; Sequencing coverage; Illumina |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Joel Z. B. Low; Tsung Fei Khang; Martti T. Tammi |
spellingShingle |
Joel Z. B. Low; Tsung Fei Khang; Martti T. Tammi; CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates; BMC Bioinformatics; RNA-Seq; Unreplicated experiments; Bayesian statistics; Differential gene expression; Sequencing coverage; Illumina |
author_facet |
Joel Z. B. Low; Tsung Fei Khang; Martti T. Tammi |
author_sort |
Joel Z. B. Low |
title |
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates |
title_short |
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates |
title_full |
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates |
title_fullStr |
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates |
title_full_unstemmed |
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates |
title_sort |
cornas: coverage-dependent rna-seq analysis of gene expression data without biological replicates |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2017-12-01 |
description |
Abstract. Background: In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. Results: We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method performs comparably to or better than NOISeq and GFOLD in simulations and in experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Conclusions: Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with a limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS . |
topic |
RNA-Seq; Unreplicated experiments; Bayesian statistics; Differential gene expression; Sequencing coverage; Illumina |
url |
http://link.springer.com/article/10.1186/s12859-017-1974-4 |
work_keys_str_mv |
AT joelzblow cornascoveragedependentrnaseqanalysisofgeneexpressiondatawithoutbiologicalreplicates AT tsungfeikhang cornascoveragedependentrnaseqanalysisofgeneexpressiondatawithoutbiologicalreplicates AT marttittammi cornascoveragedependentrnaseqanalysisofgeneexpressiondatawithoutbiologicalreplicates |
_version_ |
1725847252889501696 |
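
The abstract's remark that a distribution of true gene counts, each with a different probability, can result in the same observed gene count can be illustrated with a small sketch. This is not the CORNAS model: it assumes, purely for illustration, that the observed count is a binomial sample of the true count with success probability equal to the sequencing coverage, and that the prior on the true count is flat over a bounded range. The function names and the `max_true` bound are hypothetical.

```python
from math import comb


def posterior_true_count(observed, coverage, max_true):
    """Posterior over the true count N given one observed count.

    Illustrative assumptions (not taken from the CORNAS paper):
      observed ~ Binomial(N, coverage), flat prior on N in [observed, max_true].
    Returns a dict mapping each candidate N to its posterior probability.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if observed < 0 or max_true < observed:
        raise ValueError("need 0 <= observed <= max_true")

    # Unnormalised weight = binomial likelihood of the observed count under N.
    weights = {
        n: comb(n, observed)
        * coverage ** observed
        * (1.0 - coverage) ** (n - observed)
        for n in range(observed, max_true + 1)
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def posterior_mean(posterior):
    """Expected true count under the posterior."""
    return sum(n * p for n, p in posterior.items())


if __name__ == "__main__":
    post = posterior_true_count(observed=5, coverage=0.5, max_true=200)
    # Several true counts remain plausible for the same observed count.
    for n in range(5, 16):
        print(n, round(post[n], 4))
    print("posterior mean:", round(posterior_mean(post), 3))
```

Under these assumptions, lower coverage spreads the posterior over a wider range of true counts, which is the intuition behind treating coverage as an explicit parameter rather than taking the normalized observed count at face value.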