Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing.

Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed for various technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to...


Bibliographic Details
Main Authors: Farnoosh Abbas-Aghababazadeh, Qian Li, Brooke L Fridley
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC6209231?pdf=render
id doaj-6d4216bff29748c7ac1fbe5a98819eb8
record_format Article
spelling doaj-6d4216bff29748c7ac1fbe5a98819eb8; 2020-11-25T01:52:52Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2018-01-01; vol. 13, no. 10, e0206312; doi:10.1371/journal.pone.0206312
collection DOAJ
language English
format Article
sources DOAJ
author Farnoosh Abbas-Aghababazadeh
Qian Li
Brooke L Fridley
title Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2018-01-01
description Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed to address the technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to compare the widely used library size normalization methods (UQ, TMM, and RLE) and across-sample normalization methods (SVA, RUV, and PCA) for RNA-Seq data, using publicly available data from The Cancer Genome Atlas (TCGA) cervical cancer study (CESC). Additionally, an extensive simulation study was completed to compare the performance of the across-sample normalization methods in estimating technical artifacts. Lastly, we investigated the reduction in degrees of freedom in the normalized data and its impact on downstream differential expression analysis results. Based on this study, the TMM and RLE library size normalization methods give similar results for the CESC dataset. In addition, results from the simulated datasets show that the SVA ("BE") method outperforms the other methods (SVA "Leek" and PCA) by correctly estimating the number of latent artifacts. Moreover, ignoring the loss of degrees of freedom due to normalization results in inflated type I error rates. We recommend not only adjusting for library size differences but also assessing known and unknown technical artifacts in the data and, if needed, completing across-sample normalization. In addition, we suggest including the known and estimated latent artifacts in the design matrix to correctly account for the loss in degrees of freedom, as opposed to completing the analysis on the post-processed normalized data.
url http://europepmc.org/articles/PMC6209231?pdf=render
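
The degrees-of-freedom recommendation in the description can be illustrated with a minimal sketch. It assumes ordinary least squares on a single simulated gene rather than the count-based methods used in the article; the names `fit_ols`, `group`, and `W`, the sample size, and the simulated values are all hypothetical and not taken from the study. It contrasts (a) placing the estimated latent artifacts in the design matrix with (b) regressing them out first and then testing the "cleaned" values as if nothing had been estimated.

```python
import numpy as np


def fit_ols(y, X):
    """Fit y ~ X by least squares; return coefficients, RSS, and residual df."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid), X.shape[0] - int(rank)


rng = np.random.default_rng(0)
n, k = 20, 3                               # samples, latent artifacts (assumed)
group = np.repeat([0.0, 1.0], n // 2)      # two-group comparison
W = rng.normal(size=(n, k))                # stand-in for estimated latent factors
y = 0.5 * W[:, 0] + rng.normal(size=n)     # one gene, no true group effect

ones = np.ones((n, 1))

# (a) Artifacts included in the design matrix: df = n - 2 - k.
X_full = np.column_stack([ones, group, W])
_, rss_full, df_full = fit_ols(y, X_full)

# (b) Artifacts regressed out first, then group tested on the cleaned data.
#     The second fit only "sees" two columns, so it reports df = n - 2.
beta_w, _, _ = fit_ols(y, np.column_stack([ones, W]))
y_clean = y - W @ beta_w[1:]
X_naive = np.column_stack([ones, group])
_, rss_naive, df_naive = fit_ols(y_clean, X_naive)

print("residual df with artifacts in design:", df_full)
print("residual df reported after pre-cleaning:", df_naive)
print("degrees of freedom not accounted for:", df_naive - df_full)
```

In approach (b) the k coefficients estimated during cleaning are invisible to the downstream test, so its reference distribution uses more degrees of freedom than the data support. This is the mechanism the description associates with inflated type I error rates.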