Country-level pandemic risk and preparedness classification based on COVID-19 data: A machine learning approach.

In this work we present a three-stage Machine Learning strategy for country-level risk classification based on countries that are reporting COVID-19 information. A K% binning discretisation (K = 25) is used to create four risk groups of countries based on the risk of transmission (coronavirus cases p...

Bibliographic Details
Main Authors: Jordan J Bird, Chloe M Barnes, Cristiano Premebida, Anikó Ekárt, Diego R Faria
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0241332
id doaj-c1c7b1eec8a645519c917c44ccd0298c
record_format Article
spelling doaj-c1c7b1eec8a645519c917c44ccd0298c 2021-03-04T11:53:02Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 10 e0241332 10.1371/journal.pone.0241332 https://doi.org/10.1371/journal.pone.0241332
collection DOAJ
language English
format Article
sources DOAJ
author Jordan J Bird
Chloe M Barnes
Cristiano Premebida
Anikó Ekárt
Diego R Faria
title Country-level pandemic risk and preparedness classification based on COVID-19 data: A machine learning approach.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2020-01-01
description In this work we present a three-stage Machine Learning strategy for country-level risk classification based on countries that are reporting COVID-19 information. A K% binning discretisation (K = 25) is used to create four risk groups of countries based on the risk of transmission (coronavirus cases per million population), risk of mortality (coronavirus deaths per million population), and risk of inability to test (coronavirus tests per million population). The four risk groups produced by K% binning are labelled as 'low', 'medium-low', 'medium-high', and 'high'. Coronavirus-related data are then removed and the attributes for prediction of the three types of risk are given as the geopolitical and demographic data describing each country. Thus, the class labels are calculated from coronavirus data, but the input attributes are country-level information independent of coronavirus data. The three four-class classification problems are then explored and benchmarked through leave-one-country-out cross validation to find the strongest model, producing a Stack of Gradient Boosting and Decision Tree algorithms for risk of transmission, a Stack of Support Vector Machine and Extra Trees for risk of mortality, and a Gradient Boosting algorithm for the risk of inability to test. It is noted that high risk for inability to test is often coupled with low risks for transmission and mortality; therefore, the risk of inability to test should be interpreted first, before consideration is given to the predicted transmission and mortality risks. Finally, the approach is applied to more recent risk levels using data from September 2020, and weaker results are noted. These are attributed to the growth of international collaboration reducing the useful knowledge carried by country-level attributes, which suggests that similar machine learning approaches are more useful early on, before a situation has unfolded.
url https://doi.org/10.1371/journal.pone.0241332
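
The labelling and evaluation steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "K% binning" with K = 25 means rank-based bins each holding 25% of the countries, and it uses a 1-nearest-neighbour rule purely as a stand-in for the Gradient Boosting and stacked models the description mentions. All function names, the eight data points, and the two attributes per country are hypothetical and invented for illustration.

```python
# Risk labels as listed in the record's description, lowest to highest.
RISK_LABELS = ["low", "medium-low", "medium-high", "high"]


def k_percent_binning(values, labels=RISK_LABELS):
    """Label each value by rank so every label covers an equal share of the data.

    With four labels each bin holds 25% of the values (assumed reading of
    "K% binning, K = 25"). Ties are broken by input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    assigned = [None] * n
    for rank, idx in enumerate(order):
        assigned[idx] = labels[rank * len(labels) // n]
    return assigned


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (NOT the paper's model): copy the label of the closest row."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, query))
    best = min(range(len(train_x)), key=lambda j: sq_dist(train_x[j]))
    return train_y[best]


def leave_one_out_accuracy(features, labels, predict):
    """Hold out each country once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            hits += 1
    return hits / len(features)


# Hypothetical cases per million for eight unnamed countries (invented numbers).
cases_per_million = [120.0, 4500.0, 900.0, 15000.0, 300.0, 7000.0, 2000.0, 11000.0]

# Hypothetical, already-normalised country attributes (invented numbers).
# No coronavirus data appears here: the labels come from COVID-19 figures,
# the inputs do not, mirroring the split the description makes.
attributes = [
    [0.10, 0.20], [0.70, 0.60], [0.40, 0.30], [0.92, 0.88],
    [0.12, 0.22], [0.68, 0.62], [0.42, 0.28], [0.90, 0.90],
]

risk = k_percent_binning(cases_per_million)
accuracy = leave_one_out_accuracy(attributes, risk, nearest_neighbour)

print(risk)
print(accuracy)
```

Because the class label is derived only from the ranked COVID-19 figure while the classifier sees only the country attributes, swapping in a different model means changing only the predict argument. The toy attributes were chosen so that similar countries share a label, so the accuracy printed here says nothing about real-world performance.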