Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data

Bibliographic Details
Main Authors: Pablo de Pedraza, Stefano Visintin, Kea Tijdens, Gábor Kismihók
Format: Article
Language: English
Published: Sciendo, 2019-09-01
Series: IZA Journal of Labor Economics
Subjects (JEL codes): J23, J63, C22, C80
Online Access: https://doi.org/10.2478/izajole-2019-0004

Description
This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.

Keywords: web crawling, statistical inference, time series, vacancies, labor demand, data collection
ISSN: 2193-8997
Citation: Volume 8, Issue 1, pages 103-116
Source: DOAJ

Author Affiliations
Pablo de Pedraza: University of Amsterdam and European Commission, Joint Research Centre (JRC), Unit I.1, Modelling, Indicators & Impact Evaluation, Via E. Fermi 2749, TP 361, Ispra (VA), I-21027, Italy
Stefano Visintin: University of Amsterdam/AIAS and Universidad Camilo José Cela, Facultad de Tecnología y Ciencia, Urb. Villafranca del Castillo, Calle Castillo de Alarcón, 49, 28692, Villanueva de la Cañada, Madrid, Spain
Kea Tijdens: University of Amsterdam/AIAS, Postbus 94025, 1090 GA Amsterdam, The Netherlands
Gábor Kismihók: Leibniz Information Centre for Science and Technology, Welfengarten 1 B, 30167 Hannover, Germany