ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION FOR DIFFERENTIAL ABUNDANCE TESTING IN MICROBIOME STUDIES
Motivation: The human microbiome plays an important role in human health and disease. The composition of the human microbiome is influenced by multiple factors, and understanding these factors is critical both for elucidating the role of the microbiome in health and disease and for developing new diagnos...
Main Authors: | Xinyan Zhang, Himel Mallick, Nengjun Yi |
---|---|
Format: | Article |
Language: | English |
Published: | Cifra Publishing House, 2016-12-01 |
Series: | Journal of Bioinformatics and Genomics |
Online Access: | http://journal-biogen.org/article/view/12 |
id |
doaj-a0a01e1c056147d3922c975b0631dad2 |
---|---|
record_format |
Article |
spelling |
doaj-a0a01e1c056147d3922c975b0631dad2; 2020-11-25T00:21:41Z; eng; Cifra Publishing House; Journal of Bioinformatics and Genomics; 2530-1381; 2016-12-01; 2 (2); 10.18454/jbg.2016.2.2.112; ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION FOR DIFFERENTIAL ABUNDANCE TESTING IN MICROBIOME STUDIES; Xinyan Zhang; Himel Mallick; Nengjun Yi; University of Alabama at Birmingham; http://journal-biogen.org/article/view/12 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xinyan Zhang Himel Mallick Nengjun Yi |
author_sort |
Xinyan Zhang |
title |
ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION FOR DIFFERENTIAL ABUNDANCE TESTING IN MICROBIOME STUDIES |
title_sort |
zero-inflated negative binomial regression for differential abundance testing in microbiome studies |
publisher |
Cifra Publishing House |
series |
Journal of Bioinformatics and Genomics |
issn |
2530-1381 |
publishDate |
2016-12-01 |
description |
Motivation: The human microbiome plays an important role in human health and disease. The composition of the human microbiome is influenced by multiple factors, and understanding these factors is critical both for elucidating the role of the microbiome in health and disease and for developing new diagnostics or therapeutic targets based on the microbiome. 16S ribosomal RNA (rRNA) gene-targeted amplicon sequencing is a commonly used approach to determine the taxonomic composition of the bacterial community. Operational taxonomic units (OTUs) are clustered based on the generated sequence reads and used to determine whether and how microbiome abundance is correlated with characteristics of the samples, such as health/disease status, smoking status, or dietary habit. However, OTU count data are not only overdispersed but also contain an excess of zero counts due to undersampling. Downstream statistical analysis therefore needs efficient analytical tools that can simultaneously account for overdispersion and sparsity in microbiome data.
Results: In this paper, we propose a Zero-inflated Negative Binomial (ZINB) regression for identifying differentially abundant taxa between two or more populations. The proposed method uses an Expectation-Maximization (EM) algorithm to fit a two-part mixture model consisting of (i) a negative binomial model to account for overdispersion and (ii) a logistic regression model to account for excess zero counts. Extensive simulation studies indicate that ZINB outperforms several state-of-the-art approaches, as measured by the area under the curve (AUC). Application to two real datasets indicates that the proposed method is capable of detecting biologically meaningful taxa, consistent with previous studies.
Availability: The software implementation of ZINB is available at: http://www.ssg.uab.edu/bhglm/.
Supplementary information: Supplementary data are available at Journal of Bioinformatics and Genomics online. |
url |
http://journal-biogen.org/article/view/12 |
work_keys_str_mv |
AT xinyanzhang zeroinflatednegativebinomialregressionfordifferentialabundancetestinginmicrobiomestudies AT himelmallick zeroinflatednegativebinomialregressionfordifferentialabundancetestinginmicrobiomestudies AT nengjunyi zeroinflatednegativebinomialregressionfordifferentialabundancetestinginmicrobiomestudies |
_version_ |
1725361378259107840 |
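
Illustrative note on the abstract: the two-part mixture it describes can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common mean/dispersion parameterization of the negative binomial (mean `mu`, dispersion `theta`, variance `mu + mu**2 / theta`) and treats the zero-inflation probability `pi` as a given number rather than the output of a fitted logistic regression. The function names `nb_logpmf`, `zinb_pmf`, and `structural_zero_prob` are hypothetical.

```python
import math


def nb_logpmf(y, mu, theta):
    """Log-probability of count y under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y, mu, theta, pi):
    """Probability of count y under the mixture: with probability pi the
    count is a structural zero, otherwise it is drawn from the NB part."""
    nb = math.exp(nb_logpmf(y, mu, theta))
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1.0 - pi) * nb


def structural_zero_prob(y, mu, theta, pi):
    """E-step quantity in an EM fit: posterior probability that an
    observation came from the zero component rather than the NB part."""
    if y > 0:
        return 0.0  # a positive count can only come from the NB part
    nb0 = math.exp(nb_logpmf(0, mu, theta))
    return pi / (pi + (1.0 - pi) * nb0)
```

In a full EM fit these posterior probabilities would serve as weights when updating the logistic and negative binomial parts in the M-step; that step is omitted here because the abstract does not give its details.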