Using Amazon Mechanical Turk for linguistic research

Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One...

Bibliographic Details
Main Authors: Schnoebelen Tyler, Kuperman Victor
Format: Article
Language: English
Published: Drustvo Psihologa Srbije 2010-01-01
Series: Psihologija
Subjects: crowdsourcing; Amazon Mechanical Turk; web experiments; predictability; semantic similarity
Online Access: http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf
id doaj-8556f9dffec0475b955b07dcebd91ff9
record_format Article
spelling doaj-8556f9dffec0475b955b07dcebd91ff9 2020-11-25T02:40:43Z eng Drustvo Psihologa Srbije Psihologija 0048-5705 2010-01-01 43 4 441 464 10.2298/PSI1004441S
collection DOAJ
language English
format Article
sources DOAJ
author Schnoebelen Tyler
Kuperman Victor
title Using Amazon Mechanical Turk for linguistic research
publisher Drustvo Psihologa Srbije
series Psihologija
issn 0048-5705
publishDate 2010-01-01
description Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One set of experiments measured the predictability of words in sentences using the Cloze sentence completion task (Taylor, 1953). The correlation between traditional and Turk Cloze scores is high (rho=0.823) and both data sets perform similarly against alternative measures of contextual predictability. Five other experiments on the semantic relatedness of verbs and phrasal verbs (how much is “lift” part of “lift up”?) manipulate the presence of the sentence context and the composition of the experimental list. The results indicate that Turk data correlate well between experiments and with data from traditional methods (rho up to 0.9), and they show high inter-rater consistency and agreement. We conclude that Mechanical Turk is a reliable source of data for complex linguistic tasks in heavy use by psycholinguists. The paper provides suggestions for best practices in data collection and scrubbing.
topic crowdsourcing
Amazon Mechanical Turk
web experiments
predictability
semantic similarity
url http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf