Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis
Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables which are important in predicting outcome, and the noise variables which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best...
Other Authors: | Thompson, Warren R. (Warren Robert) |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | Statistics |
Online Access: | http://purl.flvc.org/fsu/fd/FSU_migr_etd-1360 |
id |
ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_253957 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_253957 2020-06-19T03:09:50Z Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis Thompson, Warren R. (Warren Robert) (author) McGee, Daniel (professor directing dissertation) Eberstein, Isaac (university representative) Huffer, Fred (committee member) Sinha, Debajyoti (committee member) She, Yiyuan (committee member) Department of Statistics (degree granting department) Florida State University (degree granting institution) Text text Florida State University Florida State University English eng 1 online resource computer application/pdf Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended. A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Fall Semester, 2009. August 10, 2009. Logistic Regression, Bootstrap, Lasso, Ridge Regression, Bayesian Model Averaging, Diet-Heart Hypothesis Includes bibliographical references. Daniel McGee, Professor Directing Dissertation; Isaac Eberstein, University Representative; Fred Huffer, Committee Member; Debajyoti Sinha, Committee Member; Yiyuan She, Committee Member. Statistics FSU_migr_etd-1360 http://purl.flvc.org/fsu/fd/FSU_migr_etd-1360 This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them. http://diginole.lib.fsu.edu/islandora/object/fsu%3A253957/datastream/TN/view/Variable%20Selection%20of%20Correlated%20Predictors%20in%20Logistic%20Regression.jpg |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Statistics |
description |
Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
=== A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. === Fall Semester, 2009. === August 10, 2009. === Logistic Regression, Bootstrap, Lasso, Ridge Regression, Bayesian Model Averaging, Diet-Heart Hypothesis === Includes bibliographical references. === Daniel McGee, Professor Directing Dissertation; Isaac Eberstein, University Representative; Fred Huffer, Committee Member; Debajyoti Sinha, Committee Member; Yiyuan She, Committee Member. |
author2 |
Thompson, Warren R. (Warren Robert) (author) |
author_facet |
Thompson, Warren R. (Warren Robert) (author) |
title |
Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis |
publisher |
Florida State University |
url |
http://purl.flvc.org/fsu/fd/FSU_migr_etd-1360 |
_version_ |
1719322210549628928 |
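
The description above names the lasso as one of the techniques compared for selecting among highly correlated predictors with a dichotomous outcome. The following is a minimal sketch of that idea only, not the dissertation's code or data: it fits an L1-penalized logistic regression by proximal gradient descent on simulated data. The function names (`simulate`, `fit_l1_logistic`), the three toy predictors, the true coefficient of 1.5, and the penalty values are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Shrink v toward zero by t; values within t of zero become zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def simulate(n=200, seed=0):
    """Toy data (hypothetical): x2 is a near-copy of x1, x3 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = x1 + 0.1 * rng.gauss(0, 1)  # highly correlated with x1
        x3 = rng.gauss(0, 1)             # unrelated to the outcome
        p = sigmoid(1.5 * x1)            # only x1 drives the outcome
        X.append([x1, x2, x3])
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_l1_logistic(X, y, lam, step=0.5, iters=500):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept b0
    is left unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals: fitted probability minus observed 0/1 outcome.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        b0 -= step * sum(resid) / n
        # Gradient step on the smooth loss, then soft-threshold for L1.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return b0, beta


X, y = simulate()
for lam in (0.01, 0.05, 0.2):
    _, beta = fit_l1_logistic(X, y, lam)
    print(lam, [round(b, 3) for b in beta])
```

As the penalty `lam` grows, more coefficients are driven to zero. With a near-duplicate pair such as `x1` and `x2`, the lasso often retains one and drops the other, and which one survives can depend on the sample, which is one reason selection among correlated predictors is hard to interpret.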