Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study

Abstract. Background: When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through in...

Bibliographic Details
Main Authors: R. P. Cornish, J. Macleod, J. R. Carpenter, K. Tilling
Format: Article
Language: English
Published: BMC 2017-12-01
Series: Emerging Themes in Epidemiology
Subjects: Missing data, Multiple imputation, Bias, Simulation study, ALSPAC, Data linkage
Online Access: http://link.springer.com/article/10.1186/s12982-017-0068-0
id doaj-e9f0ee71f8254d0996778240ad915652
record_format Article
spelling doaj-e9f0ee71f8254d0996778240ad915652
2020-11-24T21:49:14Z
eng
BMC
Emerging Themes in Epidemiology
1742-7622
2017-12-01
vol. 14, no. 1, pp. 1-13
10.1186/s12982-017-0068-0
Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study
R. P. Cornish (Population Health Sciences, Bristol Medical School, University of Bristol)
J. Macleod (Population Health Sciences, Bristol Medical School, University of Bristol)
J. R. Carpenter (Department of Medical Statistics, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine)
K. Tilling (Population Health Sciences, Bristol Medical School, University of Bristol)
Abstract. Background: When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI). Methods: Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1–0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete. Results: Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest. Conclusions: In longitudinal studies with loss to follow-up, incorporating proxies for this study outcome obtained via linkage to external sources of data as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
http://link.springer.com/article/10.1186/s12982-017-0068-0
Missing data
Multiple imputation
Bias
Simulation study
ALSPAC
Data linkage
collection DOAJ
language English
format Article
sources DOAJ
author R. P. Cornish
J. Macleod
J. R. Carpenter
K. Tilling
title Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study
publisher BMC
series Emerging Themes in Epidemiology
issn 1742-7622
publishDate 2017-12-01
description Abstract. Background: When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI). Methods: Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1–0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete. Results: Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest. Conclusions: In longitudinal studies with loss to follow-up, incorporating proxies for this study outcome obtained via linkage to external sources of data as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
topic Missing data
Multiple imputation
Bias
Simulation study
ALSPAC
Data linkage
url http://link.springer.com/article/10.1186/s12982-017-0068-0
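
Illustrative sketch (not the authors' code). The abstract describes comparing analyses of an MNAR outcome with and without a proxy used as an auxiliary variable in MI. The toy simulation below shows the general idea under assumptions that are not taken from the article: a binary exposure, a true slope of 0.5, a logistic missingness model in the outcome with coefficient 1.5, a fully observed proxy with correlation 0.9, and a simple Bayesian linear-regression imputation model combined with Rubin's rules. All function names and parameter values are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares: coefficients, (X'X)^-1, residual variance, residual df."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    return beta, xtx_inv, resid @ resid / df, df


def simulate(n, beta_x, gamma, rho, rng):
    """Exposure x, outcome y, proxy z with corr(y, z) = rho, and an MNAR indicator."""
    x = rng.binomial(1, 0.5, n).astype(float)
    y = beta_x * x + rng.normal(size=n)
    var_y = beta_x ** 2 * 0.25 + 1.0                 # marginal variance of y
    noise_sd = np.sqrt(var_y * (1.0 / rho ** 2 - 1.0))
    z = y + rng.normal(scale=noise_sd, size=n)
    p_obs = 1.0 / (1.0 + np.exp(-gamma * y))         # MNAR: depends on y itself
    observed = rng.random(n) < p_obs
    return x, y, z, observed


def complete_case_slope(x, y, observed):
    """Slope of y on x using only records with an observed outcome."""
    X = np.column_stack([np.ones(observed.sum()), x[observed]])
    beta, xtx_inv, s2, _ = ols(X, y[observed])
    return beta[1], s2 * xtx_inv[1, 1]


def mi_slope(x, y, z, observed, m, rng):
    """MI with the proxy z as auxiliary variable; estimates pooled by Rubin's rules."""
    n = len(x)
    X_imp = np.column_stack([np.ones(n), x, z])      # imputation model: y ~ x + z
    X_an = np.column_stack([np.ones(n), x])          # analysis model:   y ~ x
    beta_hat, xtx_inv, s2_hat, df = ols(X_imp[observed], y[observed])
    chol = np.linalg.cholesky(xtx_inv)
    n_mis = int((~observed).sum())
    ests, variances = [], []
    for _ in range(m):
        # Draw imputation-model parameters from their approximate posterior.
        sigma2 = s2_hat * df / rng.chisquare(df)
        beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(3))
        # Overwrite the (in practice unknown) missing outcomes with imputed draws.
        y_imp = y.copy()
        y_imp[~observed] = X_imp[~observed] @ beta + rng.normal(
            scale=np.sqrt(sigma2), size=n_mis
        )
        b, an_inv, s2, _ = ols(X_an, y_imp)
        ests.append(b[1])
        variances.append(s2 * an_inv[1, 1])
    ests = np.array(ests)
    within = float(np.mean(variances))
    between = float(ests.var(ddof=1))
    return float(ests.mean()), within + (1.0 + 1.0 / m) * between


TRUE_SLOPE = 0.5  # assumed value, chosen only for this illustration
rng = np.random.default_rng(2017)
x, y, z, observed = simulate(n=20000, beta_x=TRUE_SLOPE, gamma=1.5, rho=0.9, rng=rng)
cc_est, cc_var = complete_case_slope(x, y, observed)
mi_est, mi_var = mi_slope(x, y, z, observed, m=20, rng=rng)

print(f"proportion missing:      {1 - observed.mean():.2f}")
print(f"complete-case bias:      {cc_est - TRUE_SLOPE:+.3f}")
print(f"MI-with-proxy bias:      {mi_est - TRUE_SLOPE:+.3f}")
```

In this setup the imputation model is still misspecified, because missingness depends on the outcome itself, so MI with a proxy would be expected to shrink the bias rather than remove it. Lowering `rho` in the call to `simulate` is one way to explore how the gain depends on the outcome-proxy correlation.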