Outcome-sensitive multiple imputation: a simulation study

Bibliographic Details
Main Authors: Evangelos Kontopantelis, Ian R. White, Matthew Sperrin, Iain Buchan
Format: Article
Language: English
Published: BMC 2017-01-01
Series: BMC Medical Research Methodology
Subjects: Multiple imputation; Imputed outcome; Missing data; Missingness
Online Access: http://link.springer.com/article/10.1186/s12874-016-0281-5
id doaj-d7343e68eb3840a2806f459f8f55b13b
record_format Article
affiliation Evangelos Kontopantelis: The Farr Institute for Health Informatics Research, University of Manchester
Ian R. White: MRC Biostatistics Unit, Cambridge Institute of Public Health
Matthew Sperrin: The Farr Institute for Health Informatics Research, University of Manchester
Iain Buchan: The Farr Institute for Health Informatics Research, University of Manchester
collection DOAJ
language English
format Article
sources DOAJ
author Evangelos Kontopantelis
Ian R. White
Matthew Sperrin
Iain Buchan
publisher BMC
series BMC Medical Research Methodology
issn 1471-2288
publishDate 2017-01-01
description Abstract
Background: Multiple imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the imputation model when imputing missing covariate values, it is not known whether it should be imputed. Similarly, no clear recommendations exist on: the utility of incorporating a secondary outcome, if available, in the imputation model; the level of protection offered when data are missing not at random; the implications of the dataset size and missingness levels.
Methods: We used realistic assumptions to generate thousands of datasets across a broad spectrum of contexts: three mechanisms of missingness (completely at random; at random; not at random); varying extents of missingness (20–80% missing data); and different sample sizes (1,000 or 10,000 cases). For each context we quantified the performance of a complete case analysis and seven multiple imputation methods which deleted cases with missing outcome before imputation, after imputation or not at all; included or did not include the outcome in the imputation models; and included or did not include a secondary outcome in the imputation models. Methods were compared on mean absolute error, bias, coverage and power over 1,000 datasets for each scenario.
Results: Overall, there was very little to separate multiple imputation methods which included the outcome in the imputation model. Even when missingness was quite extensive, all multiple imputation approaches performed well. Incorporating a secondary outcome, moderately correlated with the outcome of interest, made very little difference. The dataset size and the extent of missingness affected performance, as expected. Multiple imputation methods protected less well against missingness not at random, but did offer some protection.
Conclusions: As long as the outcome is included in the imputation model, there are very small performance differences between the possible multiple imputation approaches: no outcome imputation, imputation or imputation and deletion. All informative covariates, even with very high levels of missingness, should be included in the multiple imputation model. Multiple imputation offers some protection against a simple missing not at random mechanism.
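The simulation design summarised in the abstract (three missingness mechanisms, then bias, mean absolute error, coverage and power over repeated datasets) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' setup: the linear data-generating model, the logistic missingness functions, the default sample size and number of simulations, and all function names. Only the complete case analysis is shown; the multiple imputation methods compared in the study are not implemented.

```python
import math
import random
import statistics


def simulate_dataset(n, beta, rng):
    """Draw n (x, y) pairs from y = beta * x + noise (an illustrative model)."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        data.append((x, beta * x + rng.gauss(0.0, 1.0)))
    return data


def missing_probability(mechanism, x, y, rate):
    """Probability that the covariate x is missing for one case."""
    if mechanism == "MCAR":      # unrelated to any data value
        return rate
    if mechanism == "MAR":       # driven by the always-observed outcome y
        driver = y
    elif mechanism == "MNAR":    # driven by the value of x that goes missing
        driver = x
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    intercept = math.log(rate / (1.0 - rate))  # logit of the nominal rate
    return 1.0 / (1.0 + math.exp(-(intercept + driver)))


def apply_missingness(data, mechanism, rate, rng):
    """Replace x with None wherever the mechanism marks it as missing."""
    return [
        (None if rng.random() < missing_probability(mechanism, x, y, rate) else x, y)
        for x, y in data
    ]


def complete_case_slope(data):
    """Least-squares slope of y on x, with its standard error, on complete cases."""
    cases = [(x, y) for x, y in data if x is not None]
    n = len(cases)
    mean_x = statistics.fmean(x for x, _ in cases)
    mean_y = statistics.fmean(y for _, y in cases)
    sxx = sum((x - mean_x) ** 2 for x, _ in cases)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in cases)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in cases)
    return slope, math.sqrt(rss / (n - 2) / sxx)


def performance(mechanism, rate, n=500, n_sims=200, beta=0.5, seed=1):
    """Summarise the complete case estimator over n_sims simulated datasets."""
    rng = random.Random(seed)
    estimates, covered, significant = [], 0, 0
    for _ in range(n_sims):
        data = apply_missingness(simulate_dataset(n, beta, rng), mechanism, rate, rng)
        slope, se = complete_case_slope(data)
        lower, upper = slope - 1.96 * se, slope + 1.96 * se
        estimates.append(slope)
        covered += lower <= beta <= upper          # interval contains the truth
        significant += not (lower <= 0.0 <= upper)  # interval excludes zero
    return {
        "bias": statistics.fmean(estimates) - beta,
        "mean_absolute_error": statistics.fmean(abs(e - beta) for e in estimates),
        "coverage": covered / n_sims,
        "power": significant / n_sims,
    }


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        print(mechanism, performance(mechanism, rate=0.5))
```

In this sketch the `rate` argument sets the missingness probability exactly only under MCAR; under the MAR and MNAR functions it is a nominal value, so the realised proportion of missing data will differ somewhat.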
topic Multiple imputation
Imputed outcome
Missing data
Missingness
url http://link.springer.com/article/10.1186/s12874-016-0281-5