Predicting two-year survival versus non-survival after first myocardial infarction using machine learning and Swedish national register data

Abstract Background Machine learning algorithms hold potential for improved prediction of all-cause mortality in cardiovascular patients, yet have not previously been developed with high-quality population data. This study compared four popular machine learning algorithms trained on unselected, nati...


Bibliographic Details
Main Authors:	John Wallert, Mattia Tomasoni, Guy Madison, Claes Held
Format:	Article
Language:	English
Published:	BMC 2017-07-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Cardiovascular disease; Classification; Coronary Artery Syndrome; Prognostic Modelling; Myocardial infarction; Registries
Online Access:	http://link.springer.com/article/10.1186/s12911-017-0500-y

id	doaj-316d9e1d59ad433382d1817fc207569f
record_format	Article
spelling	doaj-316d9e1d59ad433382d1817fc207569f | 2020-11-24T23:59:40Z | eng | BMC | BMC Medical Informatics and Decision Making | 1472-6947 | 2017-07-01 | 171111 | 10.1186/s12911-017-0500-y | Predicting two-year survival versus non-survival after first myocardial infarction using machine learning and Swedish national register data | John Wallert (0), Mattia Tomasoni (1), Guy Madison (2), Claes Held (3) | 0: Department of Public Health and Caring Sciences, Uppsala University; 1: Department of Public Health and Caring Sciences, Uppsala University; 2: Department of Psychology, Umeå University; 3: Department of Medical Sciences, Uppsala University
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	John Wallert, Mattia Tomasoni, Guy Madison, Claes Held
author_facet	John Wallert, Mattia Tomasoni, Guy Madison, Claes Held
author_sort	John Wallert
title	Predicting two-year survival versus non-survival after first myocardial infarction using machine learning and Swedish national register data
publisher	BMC
series	BMC Medical Informatics and Decision Making
issn	1472-6947
publishDate	2017-07-01
description	Abstract
Background: Machine learning algorithms hold potential for improved prediction of all-cause mortality in cardiovascular patients, yet have not previously been developed with high-quality population data. This study compared four popular machine learning algorithms trained on unselected, nation-wide population data from Sweden to solve the binary classification problem of predicting survival versus non-survival 2 years after first myocardial infarction (MI).
Methods: This prospective national registry study for prognostic accuracy validation of predictive models used data from 51,943 complete first MI cases as registered during 6 years (2006–2011) in the national quality register SWEDEHEART/RIKS-HIA (90% coverage of all MIs in Sweden) with follow-up in the Cause of Death register (> 99% coverage). Primary outcome was AUROC (C-statistic) performance of each model on the untouched test set (40% of cases) after model development on the training set (60% of cases) with the full (39) predictor set. Model AUROCs were bootstrapped and compared, correcting the P-values for multiple comparisons with the Bonferroni method. Secondary outcomes were derived when varying sample size (1–100% of total) and predictor sets (39, 10, and 5) for each model. Analyses were repeated on 79,869 completed cases after multivariable imputation of predictors.
Results: A Support Vector Machine with a radial basis kernel developed on 39 predictors had the highest complete cases performance on the test set (AUROC = 0.845, PPV = 0.280, NPV = 0.966) outperforming Boosted C5.0 (0.845 vs. 0.841, P = 0.028) but not significantly higher than Logistic Regression or Random Forest. Models converged to the point of algorithm indifference with increased sample size and predictors. Using the top five predictors also produced good classifiers. Imputed analyses had slightly higher performance.
Conclusions: Improved mortality prediction at hospital discharge after first MI is important for identifying high-risk individuals eligible for intensified treatment and care. All models performed accurately and similarly and because of the superior national coverage, the best model can potentially be used to better differentiate new patients, allowing for improved targeting of limited resources. Future research should focus on further model development and investigate possibilities for implementation.
topic	Cardiovascular disease; Classification; Coronary Artery Syndrome; Prognostic Modelling; Myocardial infarction; Registries
url	http://link.springer.com/article/10.1186/s12911-017-0500-y
_version_	1725446784320274432
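
The abstract in this record names several evaluation quantities: a 60/40 training/test split, AUROC (C-statistic), PPV and NPV, and Bonferroni correction of P-values. As a rough illustration of what those quantities mean, the sketch below computes them in plain Python on a toy example. The function names, the toy labels and scores, the 0.5 decision threshold, and the raw P-values are all hypothetical and are not taken from the article; the record does not describe the authors' implementation.

```python
import random


def split_60_40(cases, seed=0):
    """Shuffle a copy of the cases and split it 60% training / 40% test."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv_npv(labels, predictions):
    """Positive and negative predictive value from binary predictions.
    Assumes at least one predicted positive and one predicted negative."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp / (tp + fp), tn / (tn + fn)


def bonferroni(p_values):
    """Multiply each P-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical toy data: 1 = non-survival, 0 = survival.
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7, 0.2, 0.3]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_auroc = auroc(toy_labels, toy_scores)               # 13/15, about 0.867
toy_ppv, toy_npv = ppv_npv(toy_labels, toy_predictions)  # 2/3 and 0.8
toy_corrected = bonferroni([0.01, 0.2, 0.5])            # hypothetical raw P-values
train_ids, test_ids = split_60_40(range(10))
```

On data of the size reported in the abstract, the pairwise AUROC loop above would be slow; a rank-based formulation or an established statistics library would be the more practical choice.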


Similar Items