Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques

Evaluating the performance of models predicting a binary outcome can be done using a variety of measures. While some measures intend to describe the model's overall fit, others more accurately describe the model's ability to discriminate between the two outcomes. If...


Bibliographic Details
Other Authors: Scolnik, Ryan (author)
Format: Others
Language: English
Published: Florida State University
Subjects: Statistics
Online Access: http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146
id ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_360437
record_format oai_dc
spelling ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_360437 2020-06-24T03:07:13Z
Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques
Scolnik, Ryan (author)
McGee, Daniel (professor co-directing thesis)
Slate, Elizabeth H. (professor co-directing thesis)
Eberstein, Isaac W. (university representative)
Huffer, Fred W. (Fred William) (committee member)
Florida State University (degree granting institution)
College of Arts and Sciences (degree granting college)
Department of Statistics (degree granting department)
Text
Florida State University
English (eng)
1 online resource (107 pages); computer; application/pdf
Evaluating the performance of models predicting a binary outcome can be done using a variety of measures. While some measures intend to describe the model's overall fit, others more accurately describe the model's ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but has poor fit while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve, Brier Score, discrimination slope, Log-Loss, R-squared and F-score. To examine the underlying relationships among all of the measures, real data and simulation studies are used. The real data comes from multiple cardiovascular research studies and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concern for scenarios when the measures may yield different conclusions. The impact of incidence rate on the relationships provides a basis for exploring alternative maximization routines to logistic regression.
While most of the measures are easily optimized using the Newton-Raphson algorithm, the maximization of the area under the ROC curve requires optimization of a non-linear, non-differentiable function. Usage of the Nelder-Mead simplex algorithm and close connections to economics research yield unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare optimizing the area under the ROC curve to logistic regression further reveals the impact of incidence rate on the relationships, significant increases in achievable areas under the ROC curve, and differences in conclusions about including a variable in a model.
A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Spring Semester 2016.
April 8, 2016.
auc, brier score, incidence rate, logistic regression, optimization
Includes bibliographical references.
Daniel McGee, Professor Co-Directing Thesis; Elizabeth Slate, Professor Co-Directing Thesis; Isaac Eberstein, University Representative; Fred Huffer, Committee Member.
Statistics
FSU_2016SP_Scolnik_fsu_0071E_13146
http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.
http://diginole.lib.fsu.edu/islandora/object/fsu%3A360437/datastream/TN/view/Predictive%20Accuracy%20Measures%20for%20Binary%20Outcomes.jpg
collection NDLTD
language English
format Others
sources NDLTD
topic Statistics
author2 Scolnik, Ryan (author)
author_facet Scolnik, Ryan (author)
title Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques
publisher Florida State University
url http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146
_version_ 1719323212437782528
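
The abstract contrasts logistic regression fitted by Newton-Raphson with direct maximization of the area under the ROC curve by the Nelder-Mead simplex algorithm. The following is a minimal illustrative sketch of that comparison, not the dissertation's code. The simulated data, the coefficient values, the sample size, and every function name (`fit_logistic_newton`, `auc`, `run_demo`, and so on) are assumptions made for illustration. R-squared and F-score are left out because the abstract does not say which R-squared definition or classification threshold was used.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_newton(X, y, n_iter=25):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        gradient = X.T @ (y - p)
        # Observed information: negative Hessian of the log-likelihood.
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(information, gradient)
    return beta


def auc(y, score):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


def brier_score(y, p):
    return np.mean((p - y) ** 2)


def log_loss(y, p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def discrimination_slope(y, p):
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    return p[y == 1].mean() - p[y == 0].mean()


def simulate(n=1000, intercept=-2.5, seed=0):
    """Hypothetical data: two standard-normal covariates, low incidence."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 2))
    p_true = sigmoid(intercept + Z @ np.array([1.0, 0.5]))
    y = (rng.random(n) < p_true).astype(float)
    return Z, y


def run_demo(seed=0):
    Z, y = simulate(seed=seed)
    X = np.column_stack([np.ones(len(y)), Z])  # add an intercept column
    beta = fit_logistic_newton(X, y)
    p_hat = sigmoid(X @ beta)
    # AUC is piecewise constant in the coefficients, so a derivative-free
    # search is used, started from the logistic-regression slopes.
    result = minimize(lambda b: -auc(y, Z @ b), x0=beta[1:], method="Nelder-Mead")
    return {
        "incidence": y.mean(),
        "brier": brier_score(y, p_hat),
        "log_loss": log_loss(y, p_hat),
        "discrimination_slope": discrimination_slope(y, p_hat),
        "auc_logistic": auc(y, Z @ beta[1:]),
        "auc_nelder_mead": -result.fun,
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

Because the AUC depends only on how the scores rank the observations, the intercept and any positive rescaling of the slopes leave it unchanged. The sketch therefore searches over the slopes alone and reports only the AUC for the Nelder-Mead solution. Changing the `intercept` argument of `simulate` is one simple way to vary the incidence rate.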