Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk

Bibliographic Details
Main Author: Mainzer, Jacob Emil
Language: English
Published: The Ohio State University / OhioLINK 2011
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=osu1322708732
Subject: Computer Science
Summary: Supervised learning algorithms often require large amounts of labeled data. Creating this data can be time-consuming and expensive. Recent work has used untrained annotators on Mechanical Turk to quickly and cheaply create data for NLP tasks such as word sense disambiguation, word similarity, machine translation, and PP attachment. In this experiment, we test whether untrained annotators can accurately perform the task of POS tagging. We design a Java Applet, called the Interactive Tagging Guide (ITG), to help untrained annotators tag words accurately and quickly using the Penn Treebank tagset. We test this Applet on a small corpus using Mechanical Turk, an online marketplace where users earn small payments for completing short tasks. Our results demonstrate that, given the proper assistance, untrained annotators are able to tag parts of speech with approximately 90% accuracy. Furthermore, we analyze the performance of expert annotators using the ITG and find that they perform at nearly the same level as the untrained annotators.
Rights: Unrestricted. This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.