Tracking the outbreak of diseases Using Twitter: A Machine Learning Approach


Bibliographic Details
Main Author: Bohlin, Erik
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för informationsteknologi 2012
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-180183
Description
Summary: In this project I investigated the correlation between mentions of illness on Twitter and the number of calls to the Swedish medical information services (Sjukvårdsupplysningen). The project considered only tweets located in Sweden and written in Swedish. To fulfill the aim of the project, I trained an SVM classifier on 20,000 tweets manually labeled as either indicating sickness or not indicating sickness. The resulting classifier was then applied to roughly half a million tweets collected during the spring of 2012, and the results were compared with data from the Swedish medical information services. I was able to show a Pearson correlation of 0.8707051 (p = 0.00225) against weekly values from the medical information services. I also fitted an ETS model to the Twitter data to try to predict future values; however, I have not been able to evaluate the accuracy of these predictions.
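
The correlation step described in the summary can be illustrated with a minimal Python sketch. It is not the author's code: the function name `pearson` and the two weekly series are hypothetical, and the numbers are toy values chosen only to show the computation, not data from the thesis. It assumes the classifier's output has already been aggregated into one count of sickness-indicating tweets per week, aligned with one call count per week.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy, made-up weekly values (assumption: NOT taken from the thesis).
weekly_sick_tweets = [120, 150, 170, 160, 140]
weekly_calls = [900, 1100, 1250, 1150, 1000]

r = pearson(weekly_sick_tweets, weekly_calls)
print(f"Pearson r = {r:.4f}")
```

A value of r near 1 would mean that weeks with more sickness-indicating tweets also tend to have more calls. The sketch does not compute the p-value reported in the summary, which additionally depends on the number of weekly observations.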