Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk

Abstract. Background: The use of cardiovascular disease (CVD) risk estimation scores in primary prevention is long established. However, their performance remains a matter of concern. The aim of this study was to explore the potential of machine learning (ML) methodologies for CVD prediction, especially...


Bibliographic Details
Main Authors: Alexandros C. Dimopoulos, Mara Nikolaidou, Francisco Félix Caballero, Worrawat Engchuan, Albert Sanchez-Niubo, Holger Arndt, José Luis Ayuso-Mateos, Josep Maria Haro, Somnath Chatterji, Ekavi N. Georgousopoulou, Christos Pitsavos, Demosthenes B. Panagiotakos
Format: Article
Language: English
Published: BMC 2018-12-01
Series: BMC Medical Research Methodology
Subjects: Cardiovascular disease; Risk prediction; Machine learning; Model performance
Online Access: http://link.springer.com/article/10.1186/s12874-018-0644-1
id doaj-7768fde1fc96457ca27bc9eb6cecd5de
record_format Article
spelling doaj-7768fde1fc96457ca27bc9eb6cecd5de
Timestamp: 2020-11-25T00:36:38Z
Language: eng
Publisher: BMC
Journal: BMC Medical Research Methodology
ISSN: 1471-2288
Published: 2018-12-01
Volume 18, issue 1, pages 1-11
DOI: 10.1186/s12874-018-0644-1
Title: Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk
Authors (record index): Alexandros C. Dimopoulos (0), Mara Nikolaidou (1), Francisco Félix Caballero (2), Worrawat Engchuan (3), Albert Sanchez-Niubo (4), Holger Arndt (5), José Luis Ayuso-Mateos (6), Josep Maria Haro (7), Somnath Chatterji (8), Ekavi N. Georgousopoulou (9), Christos Pitsavos (10), Demosthenes B. Panagiotakos (11)
Affiliations, in the order listed in the record:
Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University
Department of Informatics & Telematics, School of Digital Technology, Harokopio University
Department of Preventive Medicine and Public Health, Universidad Autónoma de Madrid
The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children
Parc Sanitari Sant Joan de Déu
SPRING TECHNO GMBH & Co. KG
Department of Preventive Medicine and Public Health, Universidad Autónoma de Madrid
CIBER of Epidemiology and Public Health
Health Metrics and Measurement, World Health Organization
Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University
School of Medicine, University of Athens
Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University
Online access: http://link.springer.com/article/10.1186/s12874-018-0644-1
Keywords: Cardiovascular disease; Risk prediction; Machine learning; Model performance
collection DOAJ
language English
format Article
sources DOAJ
issn 1471-2288
description Abstract. Background: The use of cardiovascular disease (CVD) risk estimation scores in primary prevention is long established, but their performance remains a matter of concern. The aim of this study was to explore the potential of machine learning (ML) methodologies for CVD prediction, especially in comparison with an established risk tool, the HellenicSCORE. Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001–02 and followed up in 2011–12, were used. Three machine learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets comprising from 16 down to only 5 variables were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. Results: Depending on the classifier and the training dataset, performance varied but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results and k-NN the poorest. Conclusions: Machine learning classification produced results comparable to those of risk prediction scores and can therefore be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.
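The five performance measures quoted in the abstract (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code: the names `ConfusionCounts`, `count_outcomes`, and `performance_metrics` are hypothetical, labels are assumed to be coded 0/1, and the example counts are invented for illustration and are not taken from the ATTICA data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    tn: int  # predicted negative, truly negative
    fn: int  # predicted negative, truly positive


def count_outcomes(y_true, y_pred):
    """Tally the confusion-matrix cells from 0/1 labels and 0/1 predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0, given the 0/1 coding assumption
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def performance_metrics(c):
    """Compute the five measures reported in the abstract from the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }


# Illustrative counts only; NOT results from the study.
example = ConfusionCounts(tp=80, fp=10, tn=30, fn=20)
metrics = performance_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because predictive values depend on how common the outcome is in the evaluated sample, while sensitivity and specificity do not, two methods can show similar accuracy yet very different specificity and negative predictive value, which is the pattern the abstract reports.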
topic Cardiovascular disease
Risk prediction
Machine learning
Model performance