Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques
Other Authors: | Scolnik, Ryan |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | Statistics |
Online Access: | http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146 |
id |
ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_360437 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_360437 2020-06-24T03:07:13Z. Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques. Scolnik, Ryan (author); McGee, Daniel (professor co-directing thesis); Slate, Elizabeth H. (professor co-directing thesis); Eberstein, Isaac W. (university representative); Huffer, Fred W. (Fred William) (committee member). Florida State University (degree granting institution); College of Arts and Sciences (degree granting college); Department of Statistics (degree granting department). Text. English. 1 online resource (107 pages); computer; application/pdf. Statistics. FSU_2016SP_Scolnik_fsu_0071E_13146. http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them. http://diginole.lib.fsu.edu/islandora/object/fsu%3A360437/datastream/TN/view/Predictive%20Accuracy%20Measures%20for%20Binary%20Outcomes.jpg |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Statistics |
description |
Evaluating the performance of models that predict a binary outcome can be done with a variety of measures. While some measures are intended to describe a model's overall fit, others more directly describe its ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but fits poorly while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve (AUC), the Brier score, the discrimination slope, log-loss, R-squared and the F-score. To examine the underlying relationships among all of the measures, both real data and simulation studies are used. The real data come from multiple cardiovascular research studies, and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concern about scenarios in which the measures may yield different conclusions. The impact of incidence rate on these relationships provides a basis for exploring maximization routines as alternatives to logistic regression.
While most of the measures are easily optimized using the Newton-Raphson algorithm, maximizing the area under the ROC curve requires optimizing a non-linear, non-differentiable function. Use of the Nelder-Mead simplex algorithm, together with close connections to economics research, yields unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare direct maximization of the area under the ROC curve with logistic regression further reveals the impact of incidence rate on these relationships, significant increases in the achievable area under the ROC curve, and differences in conclusions about whether to include a variable in a model.
A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Spring Semester 2016.
April 8, 2016.
Keywords: auc, brier score, incidence rate, logistic regression, optimization.
Includes bibliographical references.
Daniel McGee, Professor Co-Directing Thesis; Elizabeth Slate, Professor Co-Directing Thesis; Isaac Eberstein, University Representative; Fred Huffer, Committee Member. |
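The abstract names six measures but does not define them. As a rough, dependency-free illustration, the Python sketch below computes each one from 0/1 outcomes `y` and predicted risks `p`. Two definitions are assumptions of this sketch, not choices taken from the dissertation: "R-squared" is implemented as Efron's pseudo R-squared (which equals one minus the Brier score divided by ȳ(1 − ȳ)), and the F-score uses a hypothetical fixed cut-off of 0.5. The helper `simulate` is also hypothetical: it picks an intercept so that the mean true risk equals a target incidence, loosely mirroring the 2% to 50% range described above.

```python
import math
import random


def _split(y, p):
    """Predicted risks for events (y == 1) and for non-events (y == 0)."""
    return [pi for yi, pi in zip(y, p) if yi == 1], [pi for yi, pi in zip(y, p) if yi == 0]


def auc(y, p):
    """P(risk of a random event > risk of a random non-event); ties count 1/2."""
    pos, neg = _split(y, p)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p, eps=1e-15):
    """Mean negative Bernoulli log-likelihood; risks are clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def discrimination_slope(y, p):
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    pos, neg = _split(y, p)
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def r_squared(y, p):
    """Efron's pseudo R-squared (assumed choice): 1 - SSE / SST."""
    ybar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


def f_score(y, p, threshold=0.5):
    """F1 after classifying risk >= threshold as an event (hypothetical cut-off)."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def simulate(incidence, n=2000, slope=1.0, seed=0):
    """Hypothetical set-up: x ~ N(0, 1), true risk = sigmoid(a + slope * x), with the
    intercept a found by bisection so that the mean true risk equals `incidence`.
    The true risks are returned as the predictions (a correctly specified model)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]

    def risk(a):
        return [1 / (1 + math.exp(-(a + slope * xi))) for xi in x]

    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if sum(risk(mid)) / n < incidence:
            lo = mid
        else:
            hi = mid
    p = risk((lo + hi) / 2)
    y = [1 if rng.random() < pi else 0 for pi in p]
    return y, p


for rate in (0.02, 0.10, 0.50):
    y, p = simulate(rate)
    print(f"incidence {rate:.2f}: AUC={auc(y, p):.3f} Brier={brier(y, p):.4f} "
          f"LogLoss={log_loss(y, p):.4f} DS={discrimination_slope(y, p):.3f} "
          f"R2={r_squared(y, p):.3f} F={f_score(y, p):.3f}")
```

With a fixed 0.5 cut-off the F-score tends toward zero at low incidence because few predicted risks reach the threshold, while the Brier score shrinks largely because p(1 − p) is small when risks are small. This is one concrete way the measures can move in different directions as incidence changes.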
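The abstract notes that most of the measures are easily optimized with the Newton-Raphson algorithm. For log-loss this is ordinary maximum-likelihood logistic regression. A minimal sketch follows, assuming the design matrix is a list of rows that each start with 1.0 for the intercept; `fit_logistic`, `solve` and the simulated data (true intercept −3, slope 1) are illustrative names and values, not code or settings from the dissertation.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Logistic-regression MLE by Newton-Raphson (equivalently, minimizing log-loss)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score vector X'(y - p)
        hess = [[0.0] * k for _ in range(k)]  # information matrix X'WX, W = p(1 - p)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    hess[j][l] += w * xi[j] * xi[l]
        step = solve(hess, grad)              # Newton step (X'WX)^{-1} X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical low-incidence example: one predictor, true intercept -3 and slope 1.
rng = random.Random(2016)
X, y = [], []
for _ in range(2000):
    x = rng.gauss(0, 1)
    X.append([1.0, x])
    y.append(1 if rng.random() < sigmoid(-3.0 + x) else 0)
beta = fit_logistic(X, y)
print("intercept, slope:", beta)
```

At convergence the score equations hold, so the fitted risks sum to the number of observed events: a fitted logistic model with an intercept reproduces the sample incidence exactly, whatever that incidence is. The sketch does not check for complete separation, in which case the estimates diverge.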
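Maximizing the AUC directly is harder: the empirical AUC is a step function of the coefficients, so Newton-Raphson does not apply, and the abstract states that the Nelder-Mead simplex algorithm is used instead. The sketch below is one hedged way to set this up, not the dissertation's implementation. Because the AUC of a linear score is unchanged by adding an intercept or multiplying the score by a positive constant, the coefficients are identified only up to scale; here we assume a normalization that fixes the first coefficient at 1 (which presumes that predictor has a positive effect). Rank-based criteria of this kind are studied in econometrics as maximum rank correlation estimators; whether that is the economics connection the abstract refers to is an assumption. The data-generating model and all function names are hypothetical.

```python
import math
import random


def auc(y, s):
    """Empirical AUC via the Mann-Whitney rank-sum statistic; tied scores get average ranks."""
    order = sorted(range(len(s)), key=lambda i: s[i])
    ranks = [0.0] * len(s)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and s[order[j + 1]] == s[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    r1 = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def nelder_mead(f, x0, step=0.5, xtol=1e-6, max_iter=400):
    """Minimize f with the classic Nelder-Mead simplex (reflection 1, expansion 2,
    contraction 0.5, shrink 0.5). No derivatives are used, so f may be a step function."""
    n = len(x0)
    simplex = [list(x0)] + [[v + (step if j == i else 0.0) for j, v in enumerate(x0)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        best, worst = simplex[0], simplex[-1]
        if max(abs(a - b) for v in simplex[1:] for a, b in zip(v, best)) < xtol:
            break  # the simplex has collapsed onto the best vertex
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        xr = [c + (c - w) for c, w in zip(centroid, worst)]  # reflection
        fr = f(xr)
        if fr < values[0]:
            xe = [c + 2 * (c - w) for c, w in zip(centroid, worst)]  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]  # inside contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [[b + 0.5 * (a - b) for a, b in zip(v, best)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


def make_neg_auc(X, y):
    """Minus the AUC of the score x1 + theta[0]*x2 + theta[1]*x3 + ...
    (first coefficient fixed at 1: an assumed scale normalization)."""
    def neg_auc(theta):
        scores = [row[0] + sum(t * v for t, v in zip(theta, row[1:])) for row in X]
        return -auc(y, scores)
    return neg_auc


# Hypothetical data: three standard-normal predictors with a logistic true risk.
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    row = [rng.gauss(0, 1) for _ in range(3)]
    eta = -1.5 + row[0] + 0.5 * row[1] - 0.5 * row[2]
    X.append(row)
    y.append(1 if rng.random() < 1 / (1 + math.exp(-eta)) else 0)

neg_auc = make_neg_auc(X, y)
theta0 = [0.0, 0.0]  # start from the score x1 alone
theta_hat, best = nelder_mead(neg_auc, theta0)
print("start AUC:", -neg_auc(theta0), "Nelder-Mead AUC:", -best, "theta:", theta_hat)
```

Because Nelder-Mead only ever replaces the worst vertex or shrinks toward the best one, the best AUC it reports can never fall below the AUC at the starting point. A natural starting point in practice is the logistic-regression slopes divided by the first slope. The method can still stall on a flat region of the step function, so restarting from several points is a common safeguard.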
author2 |
Scolnik, Ryan (author) |
title |
Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques |
publisher |
Florida State University |
url |
http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146 |