Tracking the outbreak of diseases Using Twitter : A Machine Learning Approach

In this project I have investigated the correlation between talk of illness on Twitter and the number of calls to the Swedish medical information services (Sjukvårdsupplysningen). The project considered only tweets located in Sweden and written in Swedish. In order to fulfill the aim of the pro...

Bibliographic Details
Main Author: Bohlin, Erik
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för informationsteknologi 2012
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-180183
id ndltd-UPSALLA1-oai-DiVA.org-uu-180183
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-180183 | 2013-01-08T13:52:44Z | eng | Student thesis | info:eu-repo/semantics/bachelorThesis | text | UPTEC IT, 1401-5749 ; 12 014 | application/pdf | info:eu-repo/semantics/openAccess
collection NDLTD
sources NDLTD
description In this project I have investigated the correlation between talk of illness on Twitter and the number of calls to the Swedish medical information services (Sjukvårdsupplysningen). The project considered only tweets located in Sweden and written in Swedish. To fulfill the aim of the project, I used an SVM classifier trained on 20,000 tweets manually marked as either indicating sickness or not indicating sickness. The resulting classifier was then applied to roughly half a million tweets collected during the spring of 2012. The results were correlated with data from the Swedish medical information services. I was able to show a Pearson correlation of 0.8707051, p = 0.00225, when compared with weekly values from the medical information services. I also fitted an ETS model to the Twitter data to try to predict future values. However, I have not been able to evaluate the accuracy of these predictions.
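
The abstract reports a Pearson correlation between weekly counts of sickness-related tweets and weekly call volumes. The following is a minimal sketch of that kind of comparison, not the author's actual code. The function name `pearson_r`, the variable names, and all numbers are hypothetical and invented for illustration; they are not data from the thesis.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (numerator of r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (denominator terms of r).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly totals, invented for illustration only:
# tweets classified as indicating sickness, and calls received.
weekly_sick_tweets = [120, 135, 160, 150, 110, 95]
weekly_calls = [2400, 2600, 3100, 2900, 2300, 2000]

r = pearson_r(weekly_sick_tweets, weekly_calls)
print(f"Pearson r = {r:.3f}")
```

With only a handful of weekly points, as in a single spring season, the coefficient is sensitive to individual weeks, so a reported p-value is worth reading alongside it.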