Statistical modelling and inference for discrete and censored familial data

Analysis of familial data with quantitative traits based on the multivariate normal distribution has been well studied. However, little attention has been devoted to traits which do not have a multivariate normal distribution, such as traits with discrete or censored values. In this thesis, we de...


Bibliographic Details
Main Author: Zhao, Yinshan
Format: Others
Language: English
Published: 2009
Online Access: http://hdl.handle.net/2429/16044
id ndltd-UBC-oai-circle.library.ubc.ca-2429-16044
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-16044 2018-01-05T17:38:06Z Statistical modelling and inference for discrete and censored familial data Zhao, Yinshan === Science, Faculty of === Statistics, Department of === Graduate 2009-12-01T19:21:49Z 2009-12-01T19:21:49Z 2004 2004-05 Text Thesis/Dissertation http://hdl.handle.net/2429/16044 eng For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. 7808346 bytes application/pdf
collection NDLTD
language English
format Others
sources NDLTD
description Analysis of familial data with quantitative traits based on the multivariate normal distribution has been well studied. However, little attention has been devoted to traits which do not have a multivariate normal distribution, such as traits with discrete or censored values. In this thesis, we aim to (1) construct models for familial data when the trait value is discrete and/or censored, and (2) study alternative estimation methods when maximum likelihood estimation is infeasible. We discuss two existing classes of models: models with multivariate normal random effects, and models constructed from the multivariate normal copula. These two classes include a variety of models which can be applied to familial data. We also propose another class of models which we call conditional independence models. This type of model is based on a conditional independence assumption: for a trait variable, we assume that a pair of non-sibling relatives are independent conditional on their parents, so that the dependence structure is built on the Markov property. Maximum likelihood estimates are generally difficult to obtain for random effect models and copula models when large families are involved. We propose two estimation procedures based on composite likelihoods. The first is a two-stage method in which the univariate marginal parameters are estimated from the univariate marginal distributions, and the dependence parameters are then estimated separately from the bivariate marginal distributions with the marginal parameters treated as known. In the second, all the parameters are estimated using the likelihoods of the bivariate marginal distributions. The composite likelihood methods can greatly reduce computation in parameter estimation, but at the price of some loss of efficiency. 
In this thesis, extensive investigations based on asymptotic covariance matrices and simulations were carried out to compare the asymptotic efficiency of these two procedures with that of the maximum likelihood method. In our efficiency comparisons, we investigate the multivariate normal model for a continuous trait, the multivariate probit model for a binary trait, the multivariate Poisson-lognormal mixture model for a count trait and the multivariate lognormal model for a censored variable. We found that when the dependence is strong, the first approach is inefficient for the regression parameters, whereas when the dependence is weak, the second approach is inefficient for the dependence parameters. In many familial analyses, quantifying familial association is of great interest. For a binary trait, the odds ratio may be used as a measure of association between a parent-offspring pair or a sibling pair. We develop theory so that the asymptotic variance of an odds ratio can be computed from a 2 x 2 contingency table formed by dependent pairs. === Science, Faculty of === Statistics, Department of === Graduate
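The two-stage composite likelihood procedure summarized in the description can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a continuous trait, a multivariate normal model with one common mean and variance, and a single exchangeable correlation `rho` shared by every within-family pair (the thesis allows richer dependence structures and regression parameters). The function names `stage_one`, `stage_two`, `pairwise_loglik` and `simulate_families`, and the grid search over `rho`, are hypothetical choices made only for this example.

```python
import math
import random
from itertools import combinations


def stage_one(families):
    """Stage 1: estimate the univariate parameters (mean, variance) from the
    univariate margins alone, pooling all individuals as if independent."""
    values = [y for fam in families for y in fam]
    mu = sum(values) / len(values)
    sigma2 = sum((y - mu) ** 2 for y in values) / len(values)
    return mu, sigma2


def pairwise_loglik(rho, n_pairs, sum_sq, sum_cross):
    """Sum of standard bivariate normal log-densities over all within-family
    pairs, up to an additive constant that does not depend on rho."""
    one_minus = 1.0 - rho * rho
    return (-0.5 * n_pairs * math.log(one_minus)
            - (sum_sq - 2.0 * rho * sum_cross) / (2.0 * one_minus))


def stage_two(families, mu, sigma2, grid_size=1999):
    """Stage 2: treat mu and sigma2 as known and maximize the pairwise
    (bivariate) composite likelihood over rho with a simple grid search."""
    sd = math.sqrt(sigma2)
    n_pairs, sum_sq, sum_cross = 0, 0.0, 0.0
    for fam in families:
        z = [(y - mu) / sd for y in fam]  # standardize with stage-1 estimates
        for a, b in combinations(z, 2):   # every pair of relatives in a family
            n_pairs += 1
            sum_sq += a * a + b * b
            sum_cross += a * b
    # Assumed search grid: rho from -0.999 to 0.999 in steps of 0.001.
    grid = [-0.999 + 1.998 * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=lambda r: pairwise_loglik(r, n_pairs, sum_sq, sum_cross))


def simulate_families(n_families, size, mu, sigma2, rho, seed=0):
    """Simulate exchangeable normal families through a shared family effect
    (requires 0 <= rho < 1)."""
    rng = random.Random(seed)
    shared_sd = math.sqrt(rho * sigma2)
    own_sd = math.sqrt((1.0 - rho) * sigma2)
    families = []
    for _ in range(n_families):
        shared = rng.gauss(0.0, shared_sd)
        families.append([mu + shared + rng.gauss(0.0, own_sd) for _ in range(size)])
    return families


if __name__ == "__main__":
    fams = simulate_families(n_families=2000, size=3, mu=1.0, sigma2=4.0, rho=0.5)
    mu_hat, sigma2_hat = stage_one(fams)
    rho_hat = stage_two(fams, mu_hat, sigma2_hat)
    print(mu_hat, sigma2_hat, rho_hat)
```

Only univariate and bivariate densities are evaluated, so the cost grows with the number of pairs rather than with the dimension of a full family likelihood. The second procedure in the description would instead maximize the pairwise log-likelihood over the mean, variance and `rho` jointly.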
author Zhao, Yinshan
title Statistical modelling and inference for discrete and censored familial data
publishDate 2009
url http://hdl.handle.net/2429/16044