Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk
Main Author: | Mainzer, Jacob Emil |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK 2011 |
Subjects: | Computer Science |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1322708732 |
id | ndltd-OhioLink-oai-etd.ohiolink.edu-osu1322708732 |
---|---|
record_format | oai_dc |
spelling | ndltd-OhioLink-oai-etd.ohiolink.edu-osu1322708732 2021-08-03T06:04:18Z Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk Mainzer, Jacob Emil Computer Science Supervised learning algorithms often require large amounts of labeled data. Creating this data can be time-consuming and expensive. Recent work has used untrained annotators on Mechanical Turk to quickly and cheaply create data for NLP tasks such as word sense disambiguation, word similarity, machine translation, and PP attachment. In this experiment, we test whether untrained annotators can accurately perform the task of POS tagging. We design a Java applet, called the Interactive Tagging Guide (ITG), to help untrained annotators tag words quickly and accurately using the Penn Treebank tagset. We test this applet on a small corpus using Mechanical Turk, an online marketplace where users earn small payments for completing short tasks. Our results demonstrate that, given the proper assistance, untrained annotators are able to tag parts of speech with approximately 90% accuracy. Furthermore, we analyze the performance of expert annotators using the ITG and find that they perform at nearly the same level as the untrained annotators. 2011 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1322708732 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection | NDLTD |
language | English |
sources | NDLTD |
topic | Computer Science |
spellingShingle | Computer Science Mainzer, Jacob Emil Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk |
author | Mainzer, Jacob Emil |
author_facet | Mainzer, Jacob Emil |
author_sort | Mainzer, Jacob Emil |
title | Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk |
title_short | Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk |
title_full | Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk |
title_fullStr | Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk |
title_full_unstemmed | Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk |
title_sort | labeling parts of speech using untrained annotators on mechanical turk |
publisher | The Ohio State University / OhioLINK |
publishDate | 2011 |
url | http://rave.ohiolink.edu/etdc/view?acc_num=osu1322708732 |
work_keys_str_mv | AT mainzerjacobemil labelingpartsofspeechusinguntrainedannotatorsonmechanicalturk |
_version_ | 1719430377311830016 |