Applications In Sentiment Analysis And Machine Learning For Identifying Public Health Variables Across Social Media

Bibliographic Details
Main Author: Clark, Eric Michael
Format: Others
Language: English
Published: ScholarWorks @ UVM, 2019
Subjects: Computational Linguistics; Data Science; Machine Learning; Public Health Monitoring; Sentiment Analysis; Social Media; Computer Sciences; Social and Behavioral Sciences
Online Access: https://scholarworks.uvm.edu/graddis/1006
https://scholarworks.uvm.edu/cgi/viewcontent.cgi?article=2006&context=graddis

Description

Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. We mined data from several public Twitter endpoints to identify content relevant to healthcare providers and public health regulatory professionals. We began by compiling content related to electronic nicotine delivery systems (e-cigarettes), which had become popular alternatives to tobacco products. It soon became apparent that high-frequency tweeting entities, called bots, which spam messages and advertisements and fabricate testimonials, needed to be removed. Using natural language processing and machine learning, we constructed algorithms that distinguish human users from automated accounts with high accuracy. We found that the average number of hyperlinks per tweet, the average character dissimilarity between an individual's tweets, and the rate at which unique words are introduced were valuable attributes for identifying automated accounts. Using 10-fold cross-validation, we measured the performance of each set of tweet features at various bin sizes; the best performed with 97% accuracy. We then used these methods to isolate automated content advertising electronic cigarettes, and categorized a rich taxonomy of automated entities, including robots, cyborgs, and spammers, each with different measurable linguistic features. Electronic cigarette-related posts were classified as automated or organic, and their content was investigated with hedonometric sentiment analysis. The overwhelming majority (≈80%) were automated, and many of these were commercial in nature. Other automated posts used false testimonials sent directly to individuals as a personalized form of targeted marketing.
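The three account-level features and the hedonometric scoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the dissertation's code: character dissimilarity is approximated with the standard library's `difflib.SequenceMatcher` (the original metric may differ), the "rate of introduction of unique words" is reduced to a simple type-token ratio, and the word scores in `HAPPINESS` are hypothetical placeholders rather than real labMT values. The helper names (`words`, `account_features`, `hedonometer_score`, and so on) are likewise illustrative.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z']+")

# Hypothetical placeholder happiness scores on a 1-9 scale (NOT real labMT values).
HAPPINESS = {"love": 8.0, "happy": 8.0, "free": 7.0, "the": 5.0, "sick": 2.0, "cancer": 2.0}


def words(text):
    """Lowercased word tokens, with URLs stripped out first."""
    return WORD_RE.findall(URL_RE.sub(" ", text.lower()))


def avg_links_per_tweet(tweets):
    """Mean number of hyperlinks per tweet for one account."""
    if not tweets:
        return 0.0
    return sum(len(URL_RE.findall(t)) for t in tweets) / len(tweets)


def avg_char_dissimilarity(tweets):
    """Mean pairwise dissimilarity (1 - SequenceMatcher ratio) between an account's tweets.

    Values near 0 mean the account keeps posting near-identical text.
    Compares every pair, so cost grows quadratically with the number of tweets.
    """
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def unique_word_rate(tweets):
    """Fraction of word tokens that introduce a previously unseen word (type-token ratio)."""
    seen, total = set(), 0
    for t in tweets:
        for w in words(t):
            total += 1
            seen.add(w)
    return len(seen) / total if total else 0.0


def account_features(tweets):
    """Feature vector for one account, e.g. as input to a bot-vs-human classifier."""
    return (avg_links_per_tweet(tweets), avg_char_dissimilarity(tweets), unique_word_rate(tweets))


def hedonometer_score(texts, scores=HAPPINESS, neutral_window=1.0):
    """Frequency-weighted average score of the scored words in `texts`.

    Words scored within `neutral_window` of the midpoint 5 are skipped;
    returns None when no scored word remains.
    """
    counts = Counter(w for t in texts for w in words(t))
    num = den = 0.0
    for w, f in counts.items():
        h = scores.get(w)
        if h is None or abs(h - 5.0) < neutral_window:
            continue
        num += f * h
        den += f
    return num / den if den else None
```

In this sketch, an account that repeats one link-bearing template scores high on links per tweet and near zero on dissimilarity. Vectors from `account_features` could then be passed to any classifier and evaluated with 10-fold cross-validation; the exclusion window in `hedonometer_score` mirrors the practice, common in hedonometer work, of ignoring near-neutral words.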
Many tweets advertised nicotine vaporizer fluid (e-liquid) in “kid-friendly” flavors such as 'Fudge Brownie', 'Hot Chocolate', and 'Circus Cotton Candy', along with every imaginable fruit flavor; such flavors were long ago banned for traditional tobacco products. Others offered free trials, as well as incentives to retweet and spread the post through the recipient's own network, and some hosted free prize giveaways that issued raffle tickets for sharing the tweet. Given the large youth presence on this public platform, these findings indicated that electronic cigarette marketing needed considerable regulation. Twitter has since officially banned all electronic cigarette advertising on its platform.

Social media can also give the healthcare industry valuable feedback from patients who describe their medical decision-making process and self-report quality-of-life indicators both during and after treatment. We studied several active cancer patient populations discussing their experiences with the disease and with survivorship. We experimented with a convolutional neural network (CNN) and with logistic regression to classify tweets as patient-related. This yielded a sample of 845 breast cancer survivor accounts, which we studied over 16 months. We found positive sentiment regarding patient treatment, raising support, and spreading awareness. Much of the negative sentiment concerned political legislation that could result in loss of healthcare coverage. We refer to these online public testimonies as “Invisible Patient Reported Outcomes” (iPROs) because they carry relevant indicators yet are difficult to capture through conventional self-reporting. Our methods can readily be applied across disciplines to gain insight into the opinions of a particular public group.
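The patient-tweet classification step can be illustrated with plain logistic regression over bag-of-words counts; the CNN variant is not shown. This is a minimal standard-library sketch, not the study's model: the toy tweets and labels (1 = patient-related) are hypothetical, and the tokenizer, learning rate, epoch count, and L2 penalty are illustrative choices rather than values from the dissertation.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z#']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def featurize(text, vocab):
    """Sparse bag-of-words counts as {feature index: count}; unknown words are dropped."""
    counts = Counter(tokenize(text))
    return {vocab[w]: c for w, c in counts.items() if w in vocab}


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=200, lr=0.5, l2=0.01):
    """Full-batch gradient descent on L2-regularized cross-entropy loss."""
    vocab_words = sorted({tok for t in texts for tok in tokenize(t)})
    vocab = {tok: i for i, tok in enumerate(vocab_words)}
    X = [featurize(t, vocab) for t in texts]
    w, b, n = [0.0] * len(vocab), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]  # gradient of the (l2 / 2) * ||w||^2 penalty
        grad_b = 0.0
        for x, y in zip(X, labels):
            p = sigmoid(b + sum(w[j] * v for j, v in x.items()))
            err = p - y  # derivative of cross-entropy with respect to the logit
            grad_b += err / n
            for j, v in x.items():
                grad_w[j] += err * v / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return vocab, w, b


def predict_proba(text, model):
    """Estimated probability that `text` is patient-related."""
    vocab, w, b = model
    x = featurize(text, vocab)
    return sigmoid(b + sum(w[j] * v for j, v in x.items()))


if __name__ == "__main__":
    # Hypothetical toy tweets; 1 = patient-related, 0 = not.
    texts = [
        "finished my last round of chemo today #breastcancer",
        "my oncologist says the scan is clear, survivor",
        "radiation fatigue is real but i'm still fighting",
        "donate to the breast cancer walk this weekend",
        "new pink ribbon merch available in our store",
        "read this article about cancer research funding",
    ]
    labels = [1, 1, 1, 0, 0, 0]
    model = train_logreg(texts, labels)
    print(round(predict_proba("chemo today, still fighting", model), 3))
```

A linear model like this is easy to inspect (each word's weight shows its pull toward the patient class), which is one reason it makes a reasonable baseline next to a CNN; in practice the toy lists would be replaced by a labeled sample of real tweets.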
Capturing iPROs and public sentiment from online communication can help inform healthcare professionals and regulators, leading to more connected and personalized treatment regimens. Social listening can also provide valuable insights for public health surveillance strategies.

Collection: Graduate College Dissertations and Theses, ScholarWorks @ UVM