Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models

Observational studies predicated on the secondary use of information from administrative and health databases often encounter the problem of missing and mismeasured data. Although there is much methodological literature pertaining to each problem in isolation, there is a scant body of literature add...


Bibliographic Details
Main Author: Regier, Michael David
Format: Others
Language: English
Published: University of British Columbia 2009
Online Access: http://hdl.handle.net/2429/15883
id ndltd-UBC-oai-circle.library.ubc.ca-2429-15883
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-15883 2018-01-05T17:23:56Z Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models Regier, Michael David Science, Faculty of; Statistics, Department of; Graduate 2009-11-27T19:34:21Z 2009-11-27T19:34:21Z 2009 2009-11 Text Thesis/Dissertation http://hdl.handle.net/2429/15883 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ 4407561 bytes application/pdf University of British Columbia
collection NDLTD
language English
format Others
sources NDLTD
description Observational studies predicated on the secondary use of information from administrative and health databases often encounter the problem of missing and mismeasured data. Although there is much methodological literature pertaining to each problem in isolation, there is a scant body of literature addressing both problems in tandem. I investigate the effect of missing and mismeasured covariates on parameter estimation from a binary logistic regression model and propose a likelihood-based method to adjust for the combined data deficiencies. Two simulation studies are used to understand the effect of data imperfection on parameter estimation and to evaluate the utility of a likelihood-based adjustment. When missing and mismeasured data occurred for separate covariates, I found that the parameter estimate associated with the mismeasured portion was biased and that the parameter estimate for the missing data aspect may be biased under both missing at random and non-ignorable missing at random assumptions. A Monte Carlo Expectation-Maximization adjustment reduced the magnitude of the bias, but a trade-off was observed: bias reduction for the mismeasured covariate was achieved by increasing the bias associated with the others. When both problems affected a single covariate, the parameter estimate for the imperfect covariate was biased, and the parameter estimates for the other covariates were also biased. The Monte Carlo Expectation-Maximization adjustment often corrected the bias, but the bias trade-off amongst the covariates was again observed. For both simulation studies, I observed a potential dissimilarity across missing data mechanisms. A substantive data set was investigated; using the second simulation study, which was structurally similar, I could provide reasonable conclusions about the nature of the estimates. I could also suggest avenues of research which would potentially minimize expenditures for additional high-quality data. I conclude that the problem of imperfection may be addressed through standard statistical methodology, but that the known effects of missing data or measurement error may not manifest as expected when more general data imperfections are considered. Science, Faculty of; Statistics, Department of; Graduate
author Regier, Michael David
spellingShingle Regier, Michael David Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models
author_facet Regier, Michael David
author_sort Regier, Michael David
title Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models
title_sort imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models
publisher University of British Columbia
publishDate 2009
url http://hdl.handle.net/2429/15883