A Zipf-plot based normalization method for high-throughput RNA-seq data.

Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have s...


Bibliographic Details
Main Author: Bin Wang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0230594
id doaj-1bcd9cd4a6fd409c818a3acc81f30fdd
record_format Article
spelling doaj-1bcd9cd4a6fd409c818a3acc81f30fdd 2021-03-03T21:43:28Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 4 e0230594 10.1371/journal.pone.0230594
collection DOAJ
language English
format Article
sources DOAJ
author Bin Wang
title A Zipf-plot based normalization method for high-throughput RNA-seq data.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2020-01-01
description Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have similar upper tail behaviors in their expression distributions. The new normalization method uses global information of all genes in the same profile without gene-level expression alteration. It doesn't require the majority of genes to be not differentially expressed (DE), and can be applied to data where the majority of genes are weakly or not expressed. Two normalization schemes are implemented with ZN: a linear rescaling scheme and a non-linear transformation scheme. The linear rescaling scheme can be applied alone or together with the non-linear normalization scheme. The performance of ZN is benchmarked against five popular linear normalization methods for RNA-seq data. Results show that the linear rescaling normalization scheme by itself works well and is robust. The non-linear normalization scheme can further improve the normalization outcomes and is optional if the Zipf plots show parallel patterns.
url https://doi.org/10.1371/journal.pone.0230594
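
The description in this record names the idea behind ZN but not its computation. As a minimal illustrative sketch, and not the article's implementation, a Zipf plot for one expression profile can be formed by sorting the positive values in decreasing order and pairing log rank with log value. If two profiles have parallel upper tails on that plot, the vertical offset between them corresponds to a linear rescaling factor. The function names, the `top` cutoff, and the median-of-offsets rule below are all assumptions made for illustration.

```python
import numpy as np


def zipf_points(profile):
    """Return (log10 rank, log10 value) for the positive entries, largest first."""
    values = np.asarray(profile, dtype=float)
    values = np.sort(values[values > 0])[::-1]  # drop zeros, sort descending
    ranks = np.arange(1, values.size + 1)
    return np.log10(ranks), np.log10(values)


def upper_tail_scale(profile, reference, top=100):
    """Hypothetical scale factor that aligns the upper tail of `profile`
    with the upper tail of `reference` on the Zipf plot.

    Assumption: the offset is estimated as the median vertical gap over
    the `top` highest-ranked positive values of each profile.
    """
    _, log_p = zipf_points(profile)
    _, log_r = zipf_points(reference)
    k = min(top, log_p.size, log_r.size)
    shift = np.median(log_r[:k] - log_p[:k])
    return 10.0 ** shift


# Synthetic data only: a heavy-tailed profile with many zeros, and a copy
# of it measured at one quarter of the depth.
rng = np.random.default_rng(0)
reference = rng.pareto(1.5, size=2000)
reference[rng.random(2000) < 0.5] = 0.0
profile = 0.25 * reference

scale = upper_tail_scale(profile, reference)
normalized = profile * scale  # linear rescaling leaves zeros at zero
```

Because the factor is estimated from the top-ranked values only, the zeros and small counts that dominate such a profile do not enter the estimate. How the article itself selects the tail and fits the factor is not stated in this record.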