The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach

Abstract Background Despite its popularity, issues concerning the estimation of power in multilevel logistic regression models are prevalent because of the complexity involved in its calculation (i.e., computer-simulation-based approaches). These issues are further compounded by the fact that the di...


Bibliographic Details
Main Authors: Oscar L. Olvera Astivia, Anne Gadermann, Martin Guhn
Format: Article
Language: English
Published: BMC 2019-05-01
Series: BMC Medical Research Methodology
Online Access: http://link.springer.com/article/10.1186/s12874-019-0742-8
id doaj-c28f9f5caf644072b436e7bb617dbfc6
record_format Article
spelling doaj-c28f9f5caf644072b436e7bb617dbfc6; 2020-11-25T02:55:08Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2019-05-01; 191120; 10.1186/s12874-019-0742-8; The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach; Oscar L. Olvera Astivia (0), Anne Gadermann (1), Martin Guhn (2); (0), (1), (2) Human Early Learning Partnership, The University of British Columbia; http://link.springer.com/article/10.1186/s12874-019-0742-8
collection DOAJ
sources DOAJ
issn 1471-2288
description Abstract
Background: Despite the popularity of multilevel logistic regression, problems in estimating its statistical power are prevalent because the calculation is complex and relies on computer-simulation-based approaches. These problems are compounded by the fact that the distribution of the predictors can affect the power to detect their effects. To address both matters, we present a sample of cases documenting the influence that predictor distribution has on statistical power, as well as a user-friendly, web-based application for conducting power analysis for multilevel logistic regression.
Method: Computer simulations were implemented to estimate statistical power in multilevel logistic regression with varying numbers of clusters, varying cluster sample sizes, and non-normal and non-symmetrical distributions of the Level 1/2 predictors. Power curves were simulated to see how non-normal or unbalanced distributions of a binary predictor and a continuous predictor affect the detection of population effect sizes for main effects, a cross-level interaction, and the variance of the random effects.
Results: Skewed continuous predictors and unbalanced binary predictors require larger sample sizes at both levels than balanced binary predictors and normally distributed continuous ones. In the most extreme case of imbalance (10% incidence) and skewness (a chi-square distribution with 1 degree of freedom), even 110 Level 2 units and 100 Level 1 units were not sufficient for all predictors to reach 80% power; power mostly hovered around 50%, with the exception of the skewed, continuous Level 2 predictor.
Conclusions: Given the complex interactive influence among sample sizes, effect sizes, and predictor distribution characteristics, it seems unwarranted to make generic rule-of-thumb sample size recommendations for multilevel logistic regression, aside from noting that larger sample sizes are required when the distributions of the predictors are not symmetric or balanced. The more skewed or imbalanced the predictor is, the larger the sample size required. To assist researchers in planning studies, a user-friendly web application that conducts power analysis via computer simulations in the R programming language is provided. With this web application, users can conduct simulations, tailored to their study design, to estimate statistical power for multilevel logistic regression models.
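As a rough illustration of the simulation-based idea described in the abstract, the following Python sketch estimates power by Monte Carlo for a single binary Level 2 predictor in a random-intercept logistic model. It is not the authors' R application or their model: every function name and parameter value here is a hypothetical choice, and the significance test is a crude cluster-level z-test on outcome proportions, swapped in for a fitted multilevel logistic regression so that the example needs only the standard library.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_cluster_props(n_clusters, cluster_size, beta0, beta_x2, tau, p_x2, rng):
    """Simulate one data set from a random-intercept logistic model.

    Each cluster gets a binary Level 2 predictor (incidence p_x2) and a
    normal random intercept with standard deviation tau. Returns the
    per-cluster outcome proportions, grouped by predictor value.
    """
    groups = {0: [], 1: []}
    for _ in range(n_clusters):
        x2 = 1 if rng.random() < p_x2 else 0
        u = rng.gauss(0.0, tau)  # cluster-specific random intercept
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta_x2 * x2 + u)))
        successes = sum(rng.random() < p for _ in range(cluster_size))
        groups[x2].append(successes / cluster_size)
    return groups


def rejects(groups, alpha=0.05):
    """Stand-in test (an assumption, not a multilevel model fit):
    two-sample z-test on cluster proportions with unequal variances."""
    a, b = groups[0], groups[1]
    if len(a) < 2 or len(b) < 2:
        return False  # too few clusters in one group to estimate a variance
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return False
    z = (mean(b) - mean(a)) / se
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)


def estimate_power(n_reps, seed=1, **design):
    """Power estimate: share of simulated data sets in which the test rejects."""
    rng = random.Random(seed)
    hits = sum(rejects(simulate_cluster_props(rng=rng, **design)) for _ in range(n_reps))
    return hits / n_reps


if __name__ == "__main__":
    # Hypothetical design values chosen only for illustration.
    design = dict(n_clusters=40, cluster_size=20, beta0=0.0, beta_x2=0.8, tau=0.5)
    print("balanced (50% incidence):  ", estimate_power(300, seed=7, p_x2=0.5, **design))
    print("unbalanced (10% incidence):", estimate_power(300, seed=7, p_x2=0.1, **design))
```

Under these assumed settings the balanced design should show noticeably higher estimated power than the unbalanced one, which is the qualitative pattern the abstract reports. The z-test is anti-conservative when one group has very few clusters, so the numbers are indicative only; a faithful replication would fit a multilevel logistic regression to each simulated data set.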