Summary: | Hierarchical models are well suited to data from complex
surveys and longitudinal studies that employ a clustered or
multistage sampling design. This thesis investigates inference for
discrete hierarchical models in the presence of missing data. The
thesis has two parts. The first part develops methods for analyzing
discrete and ordinal response data from hierarchical longitudinal
studies. Several approximation methods have been proposed for
estimating the fixed- and random-effects parameters of generalized
linear mixed models; the thesis focuses on two likelihood-based
estimation procedures, the pseudo-likelihood (PL) method and the
adaptive Gaussian quadrature (AGQ) method.
Simulation results suggest that AGQ is preferable to PL when the
goal is to estimate the variance of the random intercept in a
complex hierarchical model, because AGQ yields a less biased
estimate of that variance. AGQ also offers greater flexibility in
accommodating user-defined likelihood functions.
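As a rough illustration of what AGQ computes, the sketch below approximates the marginal log-likelihood of a single cluster in the simplest case, a two-level random-intercept logistic model: the log of the integral over b of prod_j p(y_j | b) N(b; 0, sigma^2). It assumes one normally distributed random intercept, a Bernoulli response with logit link, and NumPy's Gauss-Hermite rule; the function names are hypothetical and this is not the implementation used in the thesis. The nodes are recentred at the mode of the integrand and rescaled by its curvature, which is what makes the quadrature adaptive.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def log_integrand(b, y, eta_fixed, sigma):
    """log[prod_j p(y_j | b) * N(b; 0, sigma^2)] for one cluster (hypothetical helper).

    b may be a scalar or an array of random-intercept values.
    y: 0/1 responses of the cluster; eta_fixed: fixed-effect linear predictors.
    """
    b = np.asarray(b, dtype=float)
    eta = eta_fixed + b[..., None]  # add the random intercept to every observation
    # Bernoulli log-likelihood with logit link: y*eta - log(1 + exp(eta))
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta), axis=-1)
    logprior = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * b**2 / sigma**2
    return loglik + logprior


def find_mode(y, eta_fixed, sigma, max_iter=100, tol=1e-10):
    """Newton-Raphson for the mode mu of the integrand and the scale tau = (-d2/db2)^(-1/2)."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(eta_fixed + b)))
        grad = np.sum(y - p) - b / sigma**2
        hess = -np.sum(p * (1.0 - p)) - 1.0 / sigma**2
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(eta_fixed + b)))
    hess = -np.sum(p * (1.0 - p)) - 1.0 / sigma**2
    return b, 1.0 / np.sqrt(-hess)


def agq_log_marginal(y, eta_fixed, sigma, n_nodes=10):
    """Adaptive Gauss-Hermite approximation of log(integral of exp(log_integrand) db)."""
    x, w = hermgauss(n_nodes)  # rule for the integral of exp(-x^2) g(x) dx
    mu, tau = find_mode(y, eta_fixed, sigma)
    nodes = mu + np.sqrt(2.0) * tau * x  # nodes centred at the mode, scaled by the curvature
    # integral f(b) db = sqrt(2)*tau * sum_k w_k * exp(x_k^2) * f(nodes_k), on the log scale
    terms = np.log(w) + x**2 + log_integrand(nodes, y, eta_fixed, sigma)
    top = terms.max()
    return np.log(np.sqrt(2.0) * tau) + top + np.log(np.sum(np.exp(terms - top)))
```

With `n_nodes=1` the rule collapses to the Laplace approximation. A full fit would sum these cluster contributions and maximize over the fixed effects and sigma; three-level models add a second, nested integral.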
In the second part, simulated data are used to develop a method for
modeling longitudinal binary data when non-response depends on
unobserved responses. The simulation study considered three-level
discrete hierarchical data with 30% and 40% missing values,
simulated under a missing not at random (MNAR) mechanism with a
monotone missing-data pattern. The methods compared are complete
case analysis (CCA), last observation carried forward (LOCF), the
available case missing value restriction (ACMVPM), the complete case
missing value restriction (CCMVPM), the neighboring case missing
value restriction (NCMVPM), a selection model with predictive mean
matching (SMPM), and a Bayesian pattern-mixture model. All three
restriction methods and the selection model impute missing values by
predictive mean matching. Missing values are multiply imputed: each
missing value is imputed m times, producing m completed datasets.
Each completed dataset is analyzed separately, the m sets of
parameter estimates are combined using the rules of Rubin (1987),
and inferences are based on the combined results. Our results
suggest that the restriction methods outperform the other methods.
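The restriction methods and the selection model named above fill in missing values by predictive mean matching (PMM). The sketch below shows one common PMM variant for a single incomplete variable, assuming a linear working model for the predicted means, a draw of the regression parameters under the standard noninformative prior, and k nearest donors; the function name and the default `k=5` are illustrative choices, and the MNAR-specific restrictions (ACMV, CCMV, NCMV) studied in the thesis are not reproduced. Because each imputed value is copied from an observed donor, imputations for a binary response stay in {0, 1}.

```python
import numpy as np


def pmm_impute(X, y, rng, k=5):
    """Return a copy of y with NaNs filled by one predictive-mean-matching draw.

    X: (n, p) fully observed design matrix, including an intercept column.
    y: (n,) float array; np.nan marks missing responses.
    k: number of nearest donors to sample from (illustrative default).
    """
    obs = ~np.isnan(y)
    X_obs, y_obs = X[obs], y[obs]
    n_obs, p = X_obs.shape

    # 1. Least-squares fit of y on X among the observed cases.
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    resid = y_obs - X_obs @ beta_hat

    # 2. Draw (sigma^2, beta) from their posterior under the prior p(beta, sigma^2) ∝ 1/sigma^2,
    #    so that each imputation reflects uncertainty in the regression parameters.
    sigma2_draw = resid @ resid / rng.chisquare(n_obs - p)
    cov = sigma2_draw * np.linalg.inv(X_obs.T @ X_obs)
    beta_draw = rng.multivariate_normal(beta_hat, cov)

    # 3. Predicted means: observed cases with beta_hat, missing cases with beta_draw.
    pred_obs = X_obs @ beta_hat
    pred_mis = X[~obs] @ beta_draw

    # 4. For each missing case, copy the observed y of a randomly chosen donor among
    #    the k observed cases whose predicted means are closest.
    y_imp = y.copy()
    for i, mu in zip(np.flatnonzero(~obs), pred_mis):
        donors = np.argsort(np.abs(pred_obs - mu))[:k]
        y_imp[i] = y_obs[rng.choice(donors)]
    return y_imp
```

Calling `pmm_impute` m times with the same generator yields the m completed datasets that multiple imputation requires.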
The selection model yields smaller bias than LOCF, but as the
proportion of missing data increases it no longer outperforms LOCF.
Among the three restriction methods, ACMVPM performs best. The
proposed method provides an alternative to the standard selection
and pattern-mixture modeling frameworks when data are missing not at
random. It is applied to data from the third Waterloo Smoking
Project, a seven-year smoking-prevention study with substantial
non-response due to loss to follow-up.
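The combining step cited above, Rubin's (1987) rules, can be sketched for a scalar parameter. With completed-data estimates Q_l and variances U_l for l = 1, ..., m, the pooled estimate is their mean Q-bar; the total variance is T = U-bar + (1 + 1/m) B, where U-bar is the mean within-imputation variance and B is the sample variance of the Q_l; and inference uses a t distribution with (m - 1)(1 + U-bar / ((1 + 1/m) B))^2 degrees of freedom. The class and function names below are hypothetical.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class PooledEstimate:
    estimate: float  # Q-bar: mean of the m completed-data estimates
    within: float    # U-bar: mean within-imputation variance
    between: float   # B: between-imputation variance
    total: float     # T = U-bar + (1 + 1/m) * B
    df: float        # Rubin's (1987) degrees of freedom for the t reference distribution


def rubin_pool(estimates, variances):
    """Combine m completed-data estimates of a scalar parameter with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    q_bar = float(q.mean())
    u_bar = float(u.mean())
    b = float(q.var(ddof=1))
    total = u_bar + (1.0 + 1.0 / m) * b
    if b == 0.0:
        df = math.inf  # no between-imputation variability: normal reference
    else:
        r = (1.0 + 1.0 / m) * b / u_bar  # relative increase in variance due to nonresponse
        df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return PooledEstimate(q_bar, u_bar, b, total, df)
```

For example, pooling estimates 1.0, 1.2 and 0.8 with variances 0.04, 0.05 and 0.03 gives a pooled estimate of 1.0, a total variance of about 0.093 and about 6.1 degrees of freedom.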
|