Comparing Methods to Collect and Geolocate Tweets in Great Britain

In the era of Big Data, the Internet has become one of the main data sources: Data can be collected for relatively low costs and can be used for a wide range of purposes. To be able to timely support solid decisions in any field, it is essential to increase data production efficiency, data accuracy,...


Bibliographic Details
Main Authors: Stephan Schlosser, Daniele Toninelli, Michela Cameletti
Format: Article
Language: English
Published: MDPI AG 2021-01-01
Series: Journal of Open Innovation: Technology, Market and Complexity
Subjects: Twitter; geographical coverage; social media; big data; geolocation; spatial data collection
Online Access: https://www.mdpi.com/2199-8531/7/1/44
id doaj-ad7dda3cd752411285e8230f905289b2
record_format Article
spelling doaj-ad7dda3cd752411285e8230f905289b2
2021-01-26T00:03:57Z
eng
MDPI AG
Journal of Open Innovation: Technology, Market and Complexity
2199-8531
2021-01-01
7 44 44
10.3390/joitmc7010044
Comparing Methods to Collect and Geolocate Tweets in Great Britain
Stephan Schlosser (0)
Daniele Toninelli (1)
Michela Cameletti (2)
(0) Center of Methods in Social Sciences, University of Göttingen, 37073 Göttingen, Germany
(1) Department of Economics, University of Bergamo, 24127 Bergamo, Italy
(2) Department of Economics, University of Bergamo, 24127 Bergamo, Italy
In the era of Big Data, the Internet has become one of the main data sources: Data can be collected for relatively low costs and can be used for a wide range of purposes. To be able to timely support solid decisions in any field, it is essential to increase data production efficiency, data accuracy, and reliability. In this framework, our paper aims at identifying an optimized and flexible method to collect and, at the same time, geolocate social media information over a whole country. In particular, the target of this paper is to compare three alternative methods to collect data from the social media Twitter. This is achieved considering four main comparison criteria: Collection time, dataset size, pre-processing phase load, and geographic distribution. Our findings regarding Great Britain identify one of these methods as the best option, since it is able to collect both the highest number of tweets per hour and the highest percentage of unique tweets per hour. Furthermore, this method reduces the computational effort needed to pre-process the collected tweets (e.g., showing the lowest collection times and the lowest number of duplicates within the geographical areas) and enhances the territorial coverage (if compared to the population distribution). At the same time, the effort required to set up this method is feasible and less prone to the arbitrary decisions of the researcher.
https://www.mdpi.com/2199-8531/7/1/44
Twitter
geographical coverage
social media
big data
geolocation
spatial data collection
collection DOAJ
language English
format Article
sources DOAJ
author Stephan Schlosser
Daniele Toninelli
Michela Cameletti
spellingShingle Stephan Schlosser
Daniele Toninelli
Michela Cameletti
Comparing Methods to Collect and Geolocate Tweets in Great Britain
Journal of Open Innovation: Technology, Market and Complexity
Twitter
geographical coverage
social media
big data
geolocation
spatial data collection
author_facet Stephan Schlosser
Daniele Toninelli
Michela Cameletti
author_sort Stephan Schlosser
title Comparing Methods to Collect and Geolocate Tweets in Great Britain
title_short Comparing Methods to Collect and Geolocate Tweets in Great Britain
title_full Comparing Methods to Collect and Geolocate Tweets in Great Britain
title_fullStr Comparing Methods to Collect and Geolocate Tweets in Great Britain
title_full_unstemmed Comparing Methods to Collect and Geolocate Tweets in Great Britain
title_sort comparing methods to collect and geolocate tweets in great britain
publisher MDPI AG
series Journal of Open Innovation: Technology, Market and Complexity
issn 2199-8531
publishDate 2021-01-01
description In the era of Big Data, the Internet has become one of the main data sources: Data can be collected for relatively low costs and can be used for a wide range of purposes. To be able to timely support solid decisions in any field, it is essential to increase data production efficiency, data accuracy, and reliability. In this framework, our paper aims at identifying an optimized and flexible method to collect and, at the same time, geolocate social media information over a whole country. In particular, the target of this paper is to compare three alternative methods to collect data from the social media Twitter. This is achieved considering four main comparison criteria: Collection time, dataset size, pre-processing phase load, and geographic distribution. Our findings regarding Great Britain identify one of these methods as the best option, since it is able to collect both the highest number of tweets per hour and the highest percentage of unique tweets per hour. Furthermore, this method reduces the computational effort needed to pre-process the collected tweets (e.g., showing the lowest collection times and the lowest number of duplicates within the geographical areas) and enhances the territorial coverage (if compared to the population distribution). At the same time, the effort required to set up this method is feasible and less prone to the arbitrary decisions of the researcher.
topic Twitter
geographical coverage
social media
big data
geolocation
spatial data collection
url https://www.mdpi.com/2199-8531/7/1/44
work_keys_str_mv AT stephanschlosser comparingmethodstocollectandgeolocatetweetsingreatbritain
AT danieletoninelli comparingmethodstocollectandgeolocatetweetsingreatbritain
AT michelacameletti comparingmethodstocollectandgeolocatetweetsingreatbritain
_version_ 1724323617285603328