On the use of hierarchical models for multiple imputation and synthetic data generation

Missing data are often imputed with plausible values when various analyses are performed. One popular approach employed to impute data is multiple imputation, which requires specification of a suitable imputation model. This thesis investigates the impact on multiply imputed hierarchical datasets wh...

Bibliographic Details
Main Author: Rashid, Sana
Other Authors: Mitra, Robin ; Kouris, Nikos
Published: University of Southampton 2017
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.720202
id ndltd-bl.uk-oai-ethos.bl.uk-720202
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-720202 | 2018-11-27T03:19:32Z | On the use of hierarchical models for multiple imputation and synthetic data generation | Rashid, Sana | Mitra, Robin ; Kouris, Nikos | 2017 | 001.4 | University of Southampton | https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.720202 | https://eprints.soton.ac.uk/412632/ | Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 001.4
description Missing data are often imputed with plausible values when various analyses are performed. One popular approach employed to impute data is multiple imputation, which requires specification of a suitable imputation model. This thesis investigates the impact on multiply imputed hierarchical datasets when the imputation model is misspecified. The first issue studied is the presence of omitted variable bias. The same issue is then studied with a focus on the use of multiple imputation for creating synthetic data to protect data confidentiality. Here, the quality of multiply imputed datasets is studied not only through the performance of various analysis models but also through the risks of disclosure for sensitive data. With the help of simulation studies and a longitudinal dataset from establishments in Germany, the detrimental effect of such model misspecification is evaluated, and recommendations are made for users of multiple imputation for both missing and synthetic data. The second issue investigated is model misspecification due to incorrect modelling of the shape of the error term. Existing methods for robust regression and alternatives to the normal distribution are compared within the synthetic data context only. Results from simulation studies and data on household wealth in the UK are used to identify appropriate methods for multiple imputation in such a scenario.
author2 Mitra, Robin ; Kouris, Nikos
author Rashid, Sana
title On the use of hierarchical models for multiple imputation and synthetic data generation
publisher University of Southampton
publishDate 2017
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.720202