Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

Bibliographic Details
Main Authors: Hardt, Jochen; Herke, Max; Leonhart, Rainer
Format: Article
Language: English
Published: BMC, 2012-12-01
Series: BMC Medical Research Methodology
Online Access: http://www.biomedcentral.com/1471-2288/12/184
Source: DOAJ (record doaj-330c5e6b88b442af80a939e42b4ce5ca)
DOI: 10.1186/1471-2288-12-184
ISSN: 1471-2288
Abstract
Background: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit.
Methods: A simulation study of a linear regression with a response Y and two predictors X₁ and X₂ was performed on data with n = 50, 100 and 200, using either complete cases or multiple imputation with 0, 10, 20, 40 or 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR (missing completely at random) or 50% MAR (missing at random) plus 50% MCAR. Auxiliary variables had either low (r = .10) or moderate (r = .50) correlations with the X's and Y.
Results: The inclusion of auxiliary variables can improve a multiple imputation model. However, including too many variables leads to downward bias in the regression coefficients and reduces precision. When the correlations are low, including auxiliary variables is not useful.
Conclusion: More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1:3.
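The rule of thumb in the conclusion can be turned into a quick planning check before building an imputation model. The Python sketch below is not taken from the paper: it assumes the rule means "at least three cases with complete data per variable in the imputation model", and that the variable count includes the analysis variables (Y, X₁, X₂) as well as the auxiliary ones. The function name `max_auxiliary_variables` and its parameters are hypothetical.

```python
# Hypothetical helper (not from the paper). Assumed reading of the rule of
# thumb: at least 3 complete cases per variable in the imputation model.

def max_auxiliary_variables(n_complete_cases: int,
                            n_analysis_vars: int = 3,
                            cases_per_var: int = 3) -> int:
    """Return the largest number of auxiliary variables that still leaves at
    least `cases_per_var` complete cases per imputation-model variable.

    Assumption: the imputation model contains the analysis variables
    (default 3: Y, X1, X2) plus the auxiliary variables.
    """
    if n_complete_cases < 0 or n_analysis_vars < 0 or cases_per_var < 1:
        raise ValueError("counts must be non-negative and cases_per_var >= 1")
    # Total number of variables the complete cases can support under the rule.
    max_total_vars = n_complete_cases // cases_per_var
    # Subtract the analysis variables; never return a negative count.
    return max(0, max_total_vars - n_analysis_vars)


if __name__ == "__main__":
    # Hypothetical numbers of complete cases, for illustration only.
    for n_complete in (30, 60, 120):
        print(n_complete, max_auxiliary_variables(n_complete))
    # prints:
    # 30 7
    # 60 17
    # 120 37
```

Under these assumptions, a data set with only 30 complete cases would allow at most 7 auxiliary variables, far fewer than the 40 or 80 used in the larger simulated conditions.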
Keywords: Multiple imputation; Auxiliary variables; Simulation study; Small and medium size samples