Using Amazon Mechanical Turk for linguistic research

Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One...

Bibliographic Details
Main Authors:	Schnoebelen Tyler, Kuperman Victor
Format:	Article
Language:	English
Published:	Drustvo Psihologa Srbije 2010-01-01
Series:	Psihologija
Subjects:	crowdsourcing, Amazon Mechanical Turk, web experiments, predictability, semantic similarity
Online Access:	http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf

id	doaj-8556f9dffec0475b955b07dcebd91ff9
record_format	Article
spelling	doaj-8556f9dffec0475b955b07dcebd91ff9 | 2020-11-25T02:40:43Z | eng | Drustvo Psihologa Srbije | Psihologija | 0048-5705 | 2010-01-01 | 43 | 4 | 441 | 464 | 10.2298/PSI1004441S | Using Amazon Mechanical Turk for linguistic research | Schnoebelen Tyler | Kuperman Victor | Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One set of experiments measured the predictability of words in sentences using the Cloze sentence completion task (Taylor, 1953). The correlation between traditional and Turk Cloze scores is high (rho=0.823) and both data sets perform similarly against alternative measures of contextual predictability. Five other experiments on the semantic relatedness of verbs and phrasal verbs (how much is “lift” part of “lift up”) manipulate the presence of the sentence context and the composition of the experimental list. The results indicate that Turk data correlate well between experiments and with data from traditional methods (rho up to 0.9), and they show high inter-rater consistency and agreement. We conclude that Mechanical Turk is a reliable source of data for complex linguistic tasks in heavy use by psycholinguists. The paper provides suggestions for best practices in data collection and scrubbing. | http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf | crowdsourcing | Amazon Mechanical Turk | web experiments | predictability | semantic similarity
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Schnoebelen Tyler, Kuperman Victor
spellingShingle	Schnoebelen Tyler Kuperman Victor Using Amazon Mechanical Turk for linguistic research Psihologija crowdsourcing Amazon Mechanical Turk web experiments predictability semantic similarity
author_facet	Schnoebelen Tyler, Kuperman Victor
author_sort	Schnoebelen Tyler
title	Using Amazon Mechanical Turk for linguistic research
title_sort	using amazon mechanical turk for linguistic research
publisher	Drustvo Psihologa Srbije
series	Psihologija
issn	0048-5705
publishDate	2010-01-01
description	Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One set of experiments measured the predictability of words in sentences using the Cloze sentence completion task (Taylor, 1953). The correlation between traditional and Turk Cloze scores is high (rho=0.823) and both data sets perform similarly against alternative measures of contextual predictability. Five other experiments on the semantic relatedness of verbs and phrasal verbs (how much is “lift” part of “lift up”) manipulate the presence of the sentence context and the composition of the experimental list. The results indicate that Turk data correlate well between experiments and with data from traditional methods (rho up to 0.9), and they show high inter-rater consistency and agreement. We conclude that Mechanical Turk is a reliable source of data for complex linguistic tasks in heavy use by psycholinguists. The paper provides suggestions for best practices in data collection and scrubbing.
topic	crowdsourcing, Amazon Mechanical Turk, web experiments, predictability, semantic similarity
url	http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf
work_keys_str_mv	AT schnoebelentyler usingamazonmechanicalturkforlinguisticresearch AT kupermanvictor usingamazonmechanicalturkforlinguisticresearch
_version_	1724780115622100992

Similar Items