CytoGLMM: conditional differential analysis for flow and mass cytometry experiments

Abstract. Background: Flow and mass cytometry are important modern immunology tools for measuring expression levels of multiple proteins on single cells. The goal is to better understand the mechanisms of responses on a single cell basis by studying differential expression of proteins. Most current da...


Bibliographic Details
Main Authors: Christof Seiler, Anne-Maud Ferreira, Lisa M. Kronstad, Laura J. Simpson, Mathieu Le Gars, Elena Vendrame, Catherine A. Blish, Susan Holmes
Format: Article
Language: English
Published: BMC 2021-03-01
Series: BMC Bioinformatics
Subjects: High-dimensional cytometry; Generalized linear models; Generalized linear mixed models
Online Access: https://doi.org/10.1186/s12859-021-04067-x
id doaj-27cc448bdff14d00a6742280b3778089
record_format Article
spelling doaj-27cc448bdff14d00a6742280b3778089
2021-03-28T11:46:17Z
eng
BMC
BMC Bioinformatics
1471-2105
2021-03-01
Volume 22, issue 1, pages 1-14
10.1186/s12859-021-04067-x
CytoGLMM: conditional differential analysis for flow and mass cytometry experiments
Christof Seiler (Department of Data Science and Knowledge Engineering, Maastricht University)
Anne-Maud Ferreira (Department of Statistics, Stanford University)
Lisa M. Kronstad (Immunology Program, Stanford University School of Medicine)
Laura J. Simpson (Immunology Program, Stanford University School of Medicine)
Mathieu Le Gars (Immunology Program, Stanford University School of Medicine)
Elena Vendrame (Immunology Program, Stanford University School of Medicine)
Catherine A. Blish (Immunology Program, Stanford University School of Medicine)
Susan Holmes (Department of Statistics, Stanford University)
https://doi.org/10.1186/s12859-021-04067-x
High-dimensional cytometry
Generalized linear models
Generalized linear mixed models
collection DOAJ
language English
format Article
sources DOAJ
author Christof Seiler
Anne-Maud Ferreira
Lisa M. Kronstad
Laura J. Simpson
Mathieu Le Gars
Elena Vendrame
Catherine A. Blish
Susan Holmes
title CytoGLMM: conditional differential analysis for flow and mass cytometry experiments
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-03-01
description Abstract
Background: Flow and mass cytometry are important modern immunology tools for measuring expression levels of multiple proteins on single cells. The goal is to better understand the mechanisms of responses on a single cell basis by studying differential expression of proteins. Most current data analysis tools compare expressions across many computationally discovered cell types. Our goal is to focus on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees.
Results: Differential analysis of marker expressions can be difficult due to marker correlations and inter-subject heterogeneity, particularly for studies of human immunology. We address these challenges with two multiple regression strategies: a bootstrapped generalized linear model and a generalized linear mixed model. On simulated datasets, we compare the robustness towards marker correlations and heterogeneity of both strategies. For paired experiments, we find that both strategies maintain the target false discovery rate under medium correlations and that mixed models are statistically more powerful under the correct model specification. For unpaired experiments, our results indicate that much larger patient sample sizes are required to detect differences. We illustrate the CytoGLMM R package and workflow for both strategies on a pregnancy dataset.
Conclusion: Our approach to finding differential proteins in flow and mass cytometry data reduces biases arising from marker correlations and safeguards against false discoveries induced by patient heterogeneity.
topic High-dimensional cytometry
Generalized linear models
Generalized linear mixed models
url https://doi.org/10.1186/s12859-021-04067-x