Variable screening and graphical modeling for ultra-high dimensional longitudinal data

Ultrahigh-dimensional variable selection is of great importance in statistical research, and independence screening is a powerful tool for selecting important variables when the number of candidate variables is massive. Some commonly used independence screening procedures are based on single-replicate data and are no...


Bibliographic Details
Main Author: Zhang, Yafei
Other Authors: Statistics
Format: Others
Published: Virginia Tech 2020
Subjects: graphical model; variable screening; longitudinal data analysis
Online Access: http://hdl.handle.net/10919/101662
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-101662
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-101662 2020-12-25T06:09:05Z Variable screening and graphical modeling for ultra-high dimensional longitudinal data Zhang, Yafei Statistics Du, Pang; Wu, Xiaowei; Kim, Inyoung; Hong, Yili graphical model; variable screening; longitudinal data analysis Doctor of Philosophy 2020-12-24T07:00:36Z 2020-12-24T07:00:36Z 2019-07-02 Dissertation vt_gsexam:20483 http://hdl.handle.net/10919/101662 This item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). ETD application/pdf Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic graphical model
variable screening
longitudinal data analysis
description Ultrahigh-dimensional variable selection is of great importance in statistical research, and independence screening is a powerful tool for selecting important variables when the number of candidate variables is massive. Some commonly used independence screening procedures are based on single-replicate data and are not applicable to longitudinal data. This motivates us to propose a new Sure Independence Screening (SIS) procedure that reduces the dimension from ultrahigh to a relatively large scale that is similar to or smaller than the sample size. In Chapter 2, we provide two types of SIS, together with their iterative extensions (iterative SIS), which enhance finite-sample performance. An upper bound on the number of variables to be included is derived, and assumptions are given under which sure screening is applicable. The proposed procedures are assessed by simulations, and an application to a study on systemic lupus erythematosus illustrates their practical use. After the variable screening process, we explore the relationships among the variables. Graphical models are commonly used to explore the association network for a set of variables, which could be genes or other objects under study. However, the graphical models currently in use are designed only for single-replicate data rather than longitudinal data. In Chapter 3, we propose a penalized likelihood approach to identify the edges in a conditional independence graph for longitudinal data. We use pairwise coordinate descent combined with second-order cone programming to optimize the penalized likelihood and estimate the parameters. Furthermore, we extend the nodewise regression method to the longitudinal data case. Simulations and real data analysis demonstrate the competitive performance of the penalized likelihood method. === Doctor of Philosophy
author2 Statistics
author Zhang, Yafei
title Variable screening and graphical modeling for ultra-high dimensional longitudinal data
publisher Virginia Tech
publishDate 2020
url http://hdl.handle.net/10919/101662