Variable selection in discrete survival models

MSc (Statistics)
Department of Statistics
Selection of variables is vital in high-dimensional statistical modelling as it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to a complicated data structure. Survival da...


Bibliographic Details
Main Author: Mabvuu, Coster
Other Authors: Bere, A.
Format: Others
Language: en
Published: 2020
Subjects:
Online Access: Mabvuu, Coster (2020) Variable selection in discrete survival models. University of Venda, South Africa. <http://hdl.handle.net/11602/1552>.
http://hdl.handle.net/11602/1552
id ndltd-netd.ac.za-oai-union.ndltd.org-univen-oai-univendspace.univen.ac.za-11602-1552
record_format oai_dc
spelling ndltd-netd.ac.za-oai-union.ndltd.org-univen-oai-univendspace.univen.ac.za-11602-1552
2020-11-20T05:11:22Z
Variable selection in discrete survival models
Mabvuu, Coster
Bere, A.
Sigauke, C.
Boosting
Discrete-time hazard model
Lasso
Penalised variable selection methods
Unobserved heterogeneity
519.546
Survival analysis (Biometry)
Biometry
Failure time data analysis
MSc (Statistics)
Department of Statistics
NRF
2020
2020-09-29T19:33:45Z
2020-09-29T19:33:45Z
2020-02-27
Dissertation
Mabvuu, Coster (2020) Variable selection in discrete survival models. University of Venda, South Africa. <http://hdl.handle.net/11602/1552>.
http://hdl.handle.net/11602/1552
en
University of Venda
1 online resource (xviii, 83 leaves)
application/pdf
collection NDLTD
language en
format Others
sources NDLTD
topic Boosting
Discrete-time hazard model
Lasso
Penalised variable selection methods
Unobserved heterogeneity
519.546
Survival analysis (Biometry)
Biometry
Failure time data analysis
description MSc (Statistics)
Department of Statistics
Selection of variables is vital in high-dimensional statistical modelling as it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to a complicated data structure. Survival data might have unobserved heterogeneity, leading to biased estimates when not taken into account. Conventional variable selection methods have stability problems. A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that Lasso performs better than gradient boosting. Frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage and did not select exactly the same variables. Gradient boosting retained more variables in the model. Based on Lasso, place of residence, highest educational level attained and age cohort are the major influential factors of age at first marriage in Zimbabwe.
NRF
author2 Bere, A.
author_facet Bere, A.
Mabvuu, Coster
author Mabvuu, Coster
author_sort Mabvuu, Coster
title Variable selection in discrete survival models
publishDate 2020
url Mabvuu, Coster (2020) Variable selection in discrete survival models. University of Venda, South Africa. <http://hdl.handle.net/11602/1552>.
http://hdl.handle.net/11602/1552