Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study
Background: Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as wel...
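Main Authors: | Ashlynn R Daughton, Courtney D Shelley, Martha Barnard, Dax Gerts, Chrysm Watson Ross, Isabel Crooker, Gopal Nadiga, Nilesh Mukundan, Nidia Yadira Vaquera Chavez, Nidhi Parikh, Travis Pitts, Geoffrey Fairchild |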
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications 2021-05-01 |
Series: | Journal of Medical Internet Research |
Online Access: | https://www.jmir.org/2021/5/e27059 |
id |
doaj-1d3f6978fb6e446c8ec578a38303c4d6 |
---|---|
record_format |
Article |
spelling |
doaj-1d3f6978fb6e446c8ec578a38303c4d6 2021-05-25T14:46:11Z eng JMIR Publications Journal of Medical Internet Research 1438-8871 2021-05-01 23 5 e27059 10.2196/27059 Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study Daughton, Ashlynn R Shelley, Courtney D Barnard, Martha Gerts, Dax Watson Ross, Chrysm Crooker, Isabel Nadiga, Gopal Mukundan, Nilesh Vaquera Chavez, Nidia Yadira Parikh, Nidhi Pitts, Travis Fairchild, Geoffrey Background: Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing long-term effects of an outbreak. Objective: We used social media data from Twitter to identify human behaviors relevant to COVID-19 transmission, as well as the perceived impacts of COVID-19 on individuals, as a first step toward real-time monitoring of public perceptions to inform public health communications. Methods: We developed a coding schema for 6 categories and 11 subcategories, which included both a wide number of behaviors as well as codes focused on the impacts of the pandemic (eg, economic and mental health impacts). We used this to develop training data and develop supervised learning classifiers for classes with sufficient labels. Classifiers that performed adequately were applied to our remaining corpus, and temporal and geospatial trends were assessed. We compared the classified patterns to ground truth mobility data and actual COVID-19 confirmed cases to assess the signal achieved here. Results: We applied our labeling schema to approximately 7200 tweets. The worst-performing classifiers had F1 scores of only 0.18 to 0.28 when trying to identify tweets about monitoring symptoms and testing. 
Classifiers about social distancing, however, were much stronger, with F1 scores of 0.64 to 0.66. We applied the social distancing classifiers to over 228 million tweets. We showed temporal patterns consistent with real-world events, and we showed correlations of up to –0.5 between social distancing signals on Twitter and ground truth mobility throughout the United States. Conclusions: Behaviors discussed on Twitter are exceptionally varied. Twitter can provide useful information for parameterizing models that incorporate human behavior, as well as for informing public health communication strategies by describing awareness of and compliance with suggested behaviors. https://www.jmir.org/2021/5/e27059 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Daughton, Ashlynn R Shelley, Courtney D Barnard, Martha Gerts, Dax Watson Ross, Chrysm Crooker, Isabel Nadiga, Gopal Mukundan, Nilesh Vaquera Chavez, Nidia Yadira Parikh, Nidhi Pitts, Travis Fairchild, Geoffrey |
spellingShingle |
Daughton, Ashlynn R Shelley, Courtney D Barnard, Martha Gerts, Dax Watson Ross, Chrysm Crooker, Isabel Nadiga, Gopal Mukundan, Nilesh Vaquera Chavez, Nidia Yadira Parikh, Nidhi Pitts, Travis Fairchild, Geoffrey Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study Journal of Medical Internet Research |
author_facet |
Daughton, Ashlynn R Shelley, Courtney D Barnard, Martha Gerts, Dax Watson Ross, Chrysm Crooker, Isabel Nadiga, Gopal Mukundan, Nilesh Vaquera Chavez, Nidia Yadira Parikh, Nidhi Pitts, Travis Fairchild, Geoffrey |
author_sort |
Daughton, Ashlynn R |
title |
Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study |
title_short |
Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study |
title_full |
Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study |
title_fullStr |
Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study |
title_full_unstemmed |
Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study |
title_sort |
mining and validating social media data for covid-19–related human behaviors between january and july 2020: infodemiology study |
publisher |
JMIR Publications |
series |
Journal of Medical Internet Research |
issn |
1438-8871 |
publishDate |
2021-05-01 |
description |
Background: Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing long-term effects of an outbreak.
Objective: We used social media data from Twitter to identify human behaviors relevant to COVID-19 transmission, as well as the perceived impacts of COVID-19 on individuals, as a first step toward real-time monitoring of public perceptions to inform public health communications.
Methods: We developed a coding schema for 6 categories and 11 subcategories, which included both a wide number of behaviors as well as codes focused on the impacts of the pandemic (eg, economic and mental health impacts). We used this to develop training data and develop supervised learning classifiers for classes with sufficient labels. Classifiers that performed adequately were applied to our remaining corpus, and temporal and geospatial trends were assessed. We compared the classified patterns to ground truth mobility data and actual COVID-19 confirmed cases to assess the signal achieved here.
Results: We applied our labeling schema to approximately 7200 tweets. The worst-performing classifiers had F1 scores of only 0.18 to 0.28 when trying to identify tweets about monitoring symptoms and testing. Classifiers about social distancing, however, were much stronger, with F1 scores of 0.64 to 0.66. We applied the social distancing classifiers to over 228 million tweets. We showed temporal patterns consistent with real-world events, and we showed correlations of up to –0.5 between social distancing signals on Twitter and ground truth mobility throughout the United States.
Conclusions: Behaviors discussed on Twitter are exceptionally varied. Twitter can provide useful information for parameterizing models that incorporate human behavior, as well as for informing public health communication strategies by describing awareness of and compliance with suggested behaviors. |
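The two metrics quoted in the abstract, the F1 score and the correlation between a Twitter signal and mobility, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `f1_score` and `pearson_r` and all data values below are hypothetical and chosen only to show how the quantities are computed. The record does not say which correlation coefficient was used, so Pearson's r is an assumption here.

```python
from math import sqrt


def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 means the tweet mentions the behavior."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical hand labels vs. classifier output for 8 tweets.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 0, 1, 0, 1, 0, 1, 0]
score = f1_score(labels, predictions)

# Hypothetical weekly share of social distancing tweets vs. a mobility index.
distancing_share = [0.01, 0.02, 0.04, 0.05]
mobility_index = [100, 90, 70, 60]
r = pearson_r(distancing_share, mobility_index)
```

A negative `r`, as in the reported correlations of up to –0.5, means that mobility tends to fall as the share of social distancing tweets rises.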
url |
https://www.jmir.org/2021/5/e27059 |
work_keys_str_mv |
AT daughtonashlynnr miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT shelleycourtneyd miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT barnardmartha miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT gertsdax miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT watsonrosschrysm miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT crookerisabel miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT nadigagopal miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT mukundannilesh miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT vaquerachaveznidiayadira miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT parikhnidhi miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT pittstravis miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy AT fairchildgeoffrey miningandvalidatingsocialmediadataforcovid19relatedhumanbehaviorsbetweenjanuaryandjuly2020infodemiologystudy |
_version_ |
1721427139756032000 |