Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets

This article investigates the integration of machine learning in the political claim annotation workflow with the goal to partially automate the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment in which the annotati...


Bibliographic Details
Main Authors: Sebastian Haunss, Jonas Kuhn, Sebastian Padó, Andre Blessing, Nico Blokker, Erenay Dayanik, Gabriella Lapesa
Format: Article
Language: English
Published: Cogitatio 2020-06-01
Series: Politics and Governance
Subjects: annotation; automation; discourse networks; machine learning; migration discourse
Online Access: https://www.cogitatiopress.com/politicsandgovernance/article/view/2591
id doaj-4f1e6e48f02240dca1342170400447da
record_format Article
spelling doaj-4f1e6e48f02240dca1342170400447da; 2020-11-25T03:22:06Z; eng; Cogitatio; Politics and Governance; ISSN 2183-2463; 2020-06-01; vol. 8, no. 2, pp. 326-339; DOI 10.17645/pag.v8i2.2591; 1461
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets
Sebastian Haunss (Research Center on Inequality and Social Policy, University of Bremen, Germany)
Jonas Kuhn (Institute for Natural Language Processing, University of Stuttgart, Germany)
Sebastian Padó (Institute for Natural Language Processing, University of Stuttgart, Germany)
Andre Blessing (Institute for Natural Language Processing, University of Stuttgart, Germany)
Nico Blokker (Research Center on Inequality and Social Policy, University of Bremen, Germany)
Erenay Dayanik (Institute for Natural Language Processing, University of Stuttgart, Germany)
Gabriella Lapesa (Institute for Natural Language Processing, University of Stuttgart, Germany)
collection DOAJ
language English
format Article
sources DOAJ
author Sebastian Haunss
Jonas Kuhn
Sebastian Padó
Andre Blessing
Nico Blokker
Erenay Dayanik
Gabriella Lapesa
title Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets
publisher Cogitatio
series Politics and Governance
issn 2183-2463
publishDate 2020-06-01
description This article investigates the integration of machine learning in the political claim annotation workflow with the goal to partially automate the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment in which the annotation quality of annotators with and without machine learning based annotation support is compared. The design and setting aim to measure and evaluate: a) annotation speed; b) annotation quality; and c) applicability to the use case of discourse network generation. While the results indicate only slight increases in terms of annotation speed, the authors find a moderate boost in annotation quality. Additionally, with the help of manual annotation of the actors and filtering out of the false positives, the machine learning based annotation suggestions allow the authors to fully recover the core network of the discourse as extracted from the articles annotated during the experiment. This is due to the redundancy which is naturally present in the annotated texts. Thus, assuming a research focus not on the complete network but the network core, an AI-based annotation can provide reliable information about discourse networks with much less human intervention than compared to the traditional manual approach.
topic annotation
automation
discourse networks
machine learning
migration discourse
url https://www.cogitatiopress.com/politicsandgovernance/article/view/2591