An Ensemble Approach to Predicting Health Outcomes

Heart disease and premature birth continue to be the leading cause of mortality and neonatal mortality in large parts of the world. They are also estimated to have the highest medical expenditures in the United States. Early detection of heart disease incidence plays a critical role in preserving he...

Full description

Bibliographic Details
Other Authors: Nilles, Ester Kim (authoraut)
Format: Others
Language:English
English
Published: Florida State University
Subjects:
Online Access:http://purl.flvc.org/fsu/fd/FSU_migr_etd-7530
Description
Summary:Heart disease and premature birth continue to be the leading cause of mortality and neonatal mortality in large parts of the world. They are also estimated to have the highest medical expenditures in the United States. Early detection of heart disease incidence plays a critical role in preserving heart health, and identifying pregnancies at high risk of premature birth is highly valuable information for early interventions. The past few decades, identification of patients at high health risk have been based on logistic regression or Cox proportional hazards models. In more recent years, machine learning models have grown in popularity within the medical field for their superior predictive and classification performances over the classical statistical models. However, their performances in heart disease and premature birth predictions have been comparable and inconclusive, leaving the question of which model most accurately reflects the data difficult to resolve. Our aim is to incorporate information learned by different models into one final model that will generate superior predictive performances. We first compare the widely used machine learning models - the multilayer perceptron network, k-nearest neighbor and support vector machine - to the statistical models logistic regression and Cox proportional hazards. Then the individual models are combined into one in an ensemble approach, also referred to as ensemble modeling. The proposed approaches include SSE-weighted, AUC-weighted, logistic and flexible naive Bayes. The individual models are unique and capture different aspects of the data, but as expected, no individual one outperforms any other. The ensemble approach is an easily computed method that eliminates the need to select one model, integrates the strengths of different models, and generates optimal performances. Particularly in cases where the risk factors associated to an outcome are elusive, such as in premature birth, the ensemble models significantly improve their prediction. === A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. === Summer Semester, 2013. === June 18, 2013. === classification, coronary heart disease, ensemble modeling, machine learning, model selection, preterm birth === Includes bibliographical references. === Dan McGee, Professor Directing Dissertation; Jinfeng Zhang, Professor Co-Directing Dissertation; Isaac Eberstein, University Representative; Debajyoti Sinha, Committee Member.