Unraveling the dynamic importance of county-level features in trajectory of COVID-19

Abstract The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2787 counties in the United States using data-driven machine learning models. Existing mathematical models of disease spread u...

Bibliographic Details
Main Authors: Qingchun Li, Yang Yang, Wanqiu Wang, Sanghyeon Lee, Xin Xiao, Xinyu Gao, Bora Oztekin, Chao Fan, Ali Mostafavi
Format: Article
Language: English
Published: Nature Publishing Group 2021-06-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-92634-w
id doaj-ef266603da604585805112e255324a7e
record_format Article
spelling doaj-ef266603da604585805112e255324a7e
2021-06-27T11:34:39Z
eng
Nature Publishing Group
Scientific Reports, ISSN 2045-2322, 2021-06-01, volume 11, issue 1, pages 1-11
10.1038/s41598-021-92634-w
Unraveling the dynamic importance of county-level features in trajectory of COVID-19
Qingchun Li (Zachry Department of Civil and Environmental Engineering, Texas A&M University)
Yang Yang (Department of Computer Science and Engineering, Texas A&M University)
Wanqiu Wang (Department of Computer Science and Engineering, Texas A&M University)
Sanghyeon Lee (Department of Computer Science and Engineering, Texas A&M University)
Xin Xiao (Department of Computer Science and Engineering, Texas A&M University)
Xinyu Gao (Department of Computer Science and Engineering, Texas A&M University)
Bora Oztekin (Department of Computer Science and Engineering, Texas A&M University)
Chao Fan (Department of Computer Science and Engineering, Texas A&M University)
Ali Mostafavi (Zachry Department of Civil and Environmental Engineering, Texas A&M University)
https://doi.org/10.1038/s41598-021-92634-w
collection DOAJ
language English
format Article
sources DOAJ
author Qingchun Li
Yang Yang
Wanqiu Wang
Sanghyeon Lee
Xin Xiao
Xinyu Gao
Bora Oztekin
Chao Fan
Ali Mostafavi
title Unraveling the dynamic importance of county-level features in trajectory of COVID-19
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2021-06-01
description Abstract: The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2787 counties in the United States using data-driven machine learning models. Existing mathematical models of disease spread usually focus on case prediction under different infection rates and do not incorporate the multiple heterogeneous features that could affect the spatial and temporal trajectory of COVID-19. Recognizing this, we trained a data-driven model using 23 features representing six key factors that influence the spread of the pandemic: social demographics of counties, population activities, mobility within the counties, movement across counties, disease attributes, and social network structure. We also categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aimed to answer two research questions: (1) the extent to which the importance of heterogeneous features evolved at different stages; (2) the extent to which the importance of heterogeneous features varied across counties with different characteristics. We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) social demographic features, such as gross domestic product, population density, and minority status, remained high-importance features throughout the stages of COVID-19 across the 2787 studied counties; (2) within-county mobility features had the highest importance in counties with higher population densities; (3) the feature reflecting the social network structure (the Facebook social connectedness index) had higher importance for counties with higher population densities. The results showed that data-driven machine learning models could provide important insights to inform policymakers regarding feature importance for counties with various population densities and at different stages of a pandemic life cycle.
url https://doi.org/10.1038/s41598-021-92634-w
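
The abstract states that a set of random forest models was fitted to determine weekly feature importance. The record does not give the implementation, so the following is only a minimal sketch of that general idea, not the authors' code. It assumes synthetic data, three hypothetical feature names, and a small hand-written bagged forest of shallow regression trees that scores each feature by its total impurity (squared-error) decrease. One forest is grown per week, so the importance ranking can be compared across weeks.

```python
import random
from statistics import fmean

# Hypothetical feature names, for illustration only (not the study's 23 features).
FEATURES = ["population_density", "within_county_mobility", "noise"]

def sse(ys):
    """Sum of squared errors around the mean (node impurity)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, rows, feats):
    """Return (gain, feature, threshold) for the best split, or None."""
    parent = sse([y[i] for i in rows])
    best = None
    for f in feats:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(X, y, rows, depth, n_try, rng, gains):
    """Grow one tree recursively, adding each split's gain to gains[feature]."""
    if depth == 0 or len(rows) < 4:
        return
    feats = rng.sample(range(len(X[0])), n_try)  # random feature subset
    split = best_split(X, y, rows, feats)
    if split is None or split[0] <= 0:
        return
    gain, f, t = split
    gains[f] += gain
    grow(X, y, [i for i in rows if X[i][f] <= t], depth - 1, n_try, rng, gains)
    grow(X, y, [i for i in rows if X[i][f] > t], depth - 1, n_try, rng, gains)

def forest_importance(X, y, n_trees=25, depth=2, n_try=2, seed=0):
    """Normalized impurity-decrease importance from a bagged forest.

    Only the importances are needed here, so the trees are not stored.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    gains = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow(X, y, rows, depth, n_try, rng, gains)
    total = sum(gains)
    return [g / total for g in gains] if total else gains

def synthetic_week(driver, n=80, seed=0):
    """Toy counties: the weekly outcome is driven by one feature plus noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in FEATURES] for _ in range(n)]
    y = [3.0 * row[driver] + rng.gauss(0, 0.1) for row in X]
    return X, y

# One forest per week; the driving feature changes between the two toy weeks.
weekly = {}
for week, driver in enumerate([0, 1]):
    X, y = synthetic_week(driver, seed=week)
    weekly[week] = forest_importance(X, y, seed=week)
    print(week, {name: round(v, 2) for name, v in zip(FEATURES, weekly[week])})
```

In practice a library implementation such as a random forest regressor from scikit-learn would replace the hand-written forest; the per-week loop and the comparison of importance vectors across weeks and county groups would stay the same.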