ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION FOR DIFFERENTIAL ABUNDANCE TESTING IN MICROBIOME STUDIES

Motivation: The human microbiome plays an important role in human health and disease. The composition of the human microbiome is influenced by multiple factors and understanding these factors is critical to elucidate the role of the microbiome in health and disease and for development of new diagnos...

Bibliographic Details
Main Authors: Xinyan Zhang, Himel Mallick, Nengjun Yi
Format: Article
Language: English
Published: Cifra Publishing House 2016-12-01
Series: Journal of Bioinformatics and Genomics
Online Access: http://journal-biogen.org/article/view/12
id doaj-a0a01e1c056147d3922c975b0631dad2
record_format Article
spelling doaj-a0a01e1c056147d3922c975b0631dad2 | 2020-11-25T00:21:41Z | eng | Cifra Publishing House | Journal of Bioinformatics and Genomics | 2530-1381 | 2016-12-01 | 2 (2) | 10.18454/jbg.2016.2.2.112 | ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION FOR DIFFERENTIAL ABUNDANCE TESTING IN MICROBIOME STUDIES | Xinyan Zhang, Himel Mallick, Nengjun Yi | University of Alabama at Birmingham | http://journal-biogen.org/article/view/12
collection DOAJ
language English
format Article
sources DOAJ
author Xinyan Zhang
Himel Mallick
Nengjun Yi
title ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION FOR DIFFERENTIAL ABUNDANCE TESTING IN MICROBIOME STUDIES
publisher Cifra Publishing House
series Journal of Bioinformatics and Genomics
issn 2530-1381
publishDate 2016-12-01
description Motivation: The human microbiome plays an important role in human health and disease. The composition of the human microbiome is influenced by multiple factors, and understanding these factors is critical for elucidating the role of the microbiome in health and disease and for developing new diagnostics or therapeutic targets based on the microbiome. 16S ribosomal RNA (rRNA) gene targeted amplicon sequencing is a commonly used approach to determine the taxonomic composition of the bacterial community. Operational taxonomic units (OTUs) are clustered based on the generated sequence reads and used to determine whether and how the abundance of the microbiome is correlated with characteristics of the samples, such as health/disease status, smoking status, or dietary habit. However, OTU count data are not only overdispersed but also contain an excess number of zero counts due to undersampling. Efficient analytical tools that can simultaneously account for overdispersion and sparsity in microbiome data are therefore needed for downstream statistical analysis. Results: In this paper, we propose a Zero-inflated Negative Binomial (ZINB) regression for identifying differentially abundant taxa between two or more populations. The proposed method uses an Expectation Maximization (EM) algorithm and incorporates a two-part mixture model consisting of (i) a negative binomial model to account for overdispersion and (ii) a logistic regression model to account for excessive zero counts. Extensive simulation studies indicate that ZINB performs better than several state-of-the-art approaches, as measured by the area under the curve (AUC). Application to two real datasets indicates that the proposed method is capable of detecting biologically meaningful taxa, consistent with previous studies. Availability: The software implementation of ZINB is available at: http://www.ssg.uab.edu/bhglm/. Supplementary information: Supplementary data are available at Journal of Bioinformatics and Genomics online.
url http://journal-biogen.org/article/view/12
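
The two-part mixture named in the description can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes a common parameterization that the record itself does not state: a negative binomial with mean `mu` and dispersion `theta` (variance `mu + mu**2 / theta`), mixed with a point mass at zero of probability `pi`. In a regression setting `pi` would come from the logistic part and `mu` from the count part; here both are plain numbers, and all function names are hypothetical.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and dispersion theta.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y, mu, theta, pi):
    """Pmf of the zero-inflated mixture: point mass at zero plus NB counts."""
    nb = math.exp(nb_log_pmf(y, mu, theta))
    if y == 0:
        # A zero is either an excess zero or an ordinary NB zero.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def excess_zero_posterior(y, mu, theta, pi):
    """Probability that an observed count came from the excess-zero part.

    This is the kind of quantity an EM E-step would compute for each sample.
    """
    if y > 0:
        return 0.0  # positive counts can only come from the NB part
    nb_zero = math.exp(nb_log_pmf(0, mu, theta))
    return pi / (pi + (1.0 - pi) * nb_zero)


def zinb_log_likelihood(counts, mu, theta, pi):
    """Sum of log-probabilities for a list of counts under fixed parameters."""
    return sum(math.log(zinb_pmf(y, mu, theta, pi)) for y in counts)
```

For example, `excess_zero_posterior(0, 5.0, 2.0, 0.3)` gives the share of an observed zero attributed to the excess-zero part under these illustrative parameter values, and `zinb_log_likelihood` is the objective an EM or direct optimizer would increase.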