Using Amazon Mechanical Turk for linguistic research
Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One...
Main Authors: | Schnoebelen Tyler, Kuperman Victor |
---|---|
Format: | Article |
Language: | English |
Published: | Drustvo Psihologa Srbije, 2010-01-01 |
Series: | Psihologija |
Subjects: | crowdsourcing; Amazon Mechanical Turk; web experiments; predictability; semantic similarity |
Online Access: | http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf |
id |
doaj-8556f9dffec0475b955b07dcebd91ff9 |
---|---|
spelling |
doaj-8556f9dffec0475b955b07dcebd91ff9; 2020-11-25T02:40:43Z; eng; Drustvo Psihologa Srbije; Psihologija; ISSN 0048-5705; 2010-01-01; vol. 43, no. 4, pp. 441-464; doi:10.2298/PSI1004441S; Using Amazon Mechanical Turk for linguistic research; Schnoebelen Tyler; Kuperman Victor |
collection |
DOAJ |
author |
Schnoebelen Tyler, Kuperman Victor |
title |
Using Amazon Mechanical Turk for linguistic research |
issn |
0048-5705 |
description |
Amazon’s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One set of experiments measured the predictability of words in sentences using the Cloze sentence completion task (Taylor, 1953). The correlation between traditional and Turk Cloze scores is high (rho=0.823), and both data sets perform similarly against alternative measures of contextual predictability. Five other experiments on the semantic relatedness of verbs and phrasal verbs (how much is “lift” part of “lift up”) manipulate the presence of the sentence context and the composition of the experimental list. The results indicate that Turk data correlate well between experiments and with data from traditional methods (rho up to 0.9), and they show high inter-rater consistency and agreement. We conclude that Mechanical Turk is a reliable source of data for complex linguistic tasks in heavy use by psycholinguists. The paper provides suggestions for best practices in data collection and scrubbing. |
topic |
crowdsourcing; Amazon Mechanical Turk; web experiments; predictability; semantic similarity |
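The abstract reports rank correlations (rho) between Cloze scores gathered by traditional methods and via Mechanical Turk. The following is a minimal sketch of how such a comparison could be computed, assuming that rho denotes Spearman's rank correlation and that a Cloze score is the share of participants who produce a given word. The function names and all data values are hypothetical illustrations, not the authors' code or results.

```python
from collections import Counter
from math import sqrt


def cloze_probabilities(responses):
    """Map each response word to the share of participants who produced it."""
    counts = Counter(word.strip().lower() for word in responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up completions for one sentence frame (hypothetical data).
lab_responses = ["bread", "bread", "toast", "bread"]
print(cloze_probabilities(lab_responses))

# Made-up Cloze scores for the same five target words from two sources.
lab_scores = [0.90, 0.10, 0.45, 0.70, 0.30]
turk_scores = [0.85, 0.20, 0.40, 0.75, 0.15]
print(round(spearman_rho(lab_scores, turk_scores), 3))
```

Ranking with averaged ties keeps the statistic well defined when several words share the same Cloze score, which is common when many targets are never produced.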