Annotated Corpus for Citation Context Analysis

We present a corpus of 85 scientific articles annotated with 2092 citations analyzed using context analysis. Three coders annotated the corpus independently and reached high inter-annotator agreement, which supports the reliability and reproducibility of the annotation. W...


Bibliographic Details
Main Authors: Myriam Hernández-Álvarez, José Gómez Soriano, Patricio Martínez-Barco
Format: Article
Language: English
Published: Escuela Politécnica Nacional (EPN) 2016-05-01
Series: Latin-American Journal of Computing
Subjects:
Online Access: http://lajc.epn.edu.ec/index.php/LAJC/article/view/102
id doaj-974366b84628414bb4de51dd0833e4b7
record_format Article
spelling doaj-974366b84628414bb4de51dd0833e4b7; 2020-11-25T03:17:11Z; eng; Escuela Politécnica Nacional (EPN); Latin-American Journal of Computing; 1390-9266; 1390-9134; 2016-05-01; 313542; Annotated Corpus for Citation Context Analysis; Myriam Hernández-Álvarez (0), José Gómez Soriano (1), Patricio Martínez-Barco (2); Escuela Politécnica Nacional, Universidad de Alicante, Universidad de Alicante; http://lajc.epn.edu.ec/index.php/LAJC/article/view/102; Corpus, annotation, methodology, machine learning, function, polarity, aspects, schema, keywords, labels, classification
collection DOAJ
language English
format Article
sources DOAJ
author Myriam Hernández-Álvarez
José Gómez Soriano
Patricio Martínez-Barco
title Annotated Corpus for Citation Context Analysis
publisher Escuela Politécnica Nacional (EPN)
series Latin-American Journal of Computing
issn 1390-9266
1390-9134
publishDate 2016-05-01
description We present a corpus of 85 scientific articles annotated with 2092 citations analyzed using context analysis. Three coders annotated the corpus independently and reached high inter-annotator agreement, which supports the reliability and reproducibility of the annotation. We applied this corpus to classify citations according to qualitative criteria, using a medium-granularity categorization scheme enriched with annotated keywords and labels to obtain high granularity. The annotation schema handles three dimensions: PURPOSE, POLARITY, and ASPECTS. Citation purpose defines the function classification: use, critique, comparison, and background, with more specific classes established using keywords: Based on, Supply; Useful; Contrast; Acknowledge, Corroboration, Debate; Weakness and Hedges. Citation aspects complement the citation characterization: concept, method, data, tool, task, among others. Polarity has three levels: Positive, Negative, and Neutral. We developed the schema and annotated the corpus with a focus on applications for citation influence assessment, but we suggest that applications such as summary generation and information retrieval could also use this annotated corpus, because the scheme is organized into clearly defined general dimensions.
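The three annotation dimensions described above could be represented as a small record type. The following is only a minimal sketch in Python, not the authors' annotation format: the class and field names (CitationAnnotation, context, aspects, keywords) are hypothetical, the example sentence and its labels are invented for illustration, and the abstract does not state how each keyword maps to a purpose class, so the keywords are kept as a flat set here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Purpose(Enum):
    """The four purpose (function) classes named in the abstract."""
    USE = "use"
    CRITIQUE = "critique"
    COMPARISON = "comparison"
    BACKGROUND = "background"


class Polarity(Enum):
    """The three polarity levels named in the abstract."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# Keywords listed in the abstract. Their grouping by purpose class is not
# given there, so this sketch assumes a flat, ungrouped set.
KEYWORDS = frozenset({
    "Based on", "Supply", "Useful", "Contrast", "Acknowledge",
    "Corroboration", "Debate", "Weakness", "Hedges",
})

# Aspects listed in the abstract ("among others"), so this set is open-ended
# and is not used for validation.
KNOWN_ASPECTS = frozenset({"concept", "method", "data", "tool", "task"})


@dataclass(frozen=True)
class CitationAnnotation:
    """One annotated citation context (hypothetical record layout)."""
    context: str
    purpose: Purpose
    polarity: Polarity
    aspects: Tuple[str, ...] = ()
    keywords: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject keywords that the abstract does not list.
        unknown = set(self.keywords) - KEYWORDS
        if unknown:
            raise ValueError(f"unknown keywords: {sorted(unknown)}")


# Invented example sentence and labels, for illustration only.
example = CitationAnnotation(
    context="We build on the parser introduced in [12].",
    purpose=Purpose.USE,
    polarity=Polarity.POSITIVE,
    aspects=("tool",),
    keywords=("Based on",),
)
```

Keeping purpose and polarity as closed enumerations while leaving aspects as free strings mirrors the abstract's wording, which fixes the first two dimensions but lists aspects only "among others".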
topic Corpus
annotation
methodology
machine learning
function
polarity
aspects
schema
keywords
labels
classification
url http://lajc.epn.edu.ec/index.php/LAJC/article/view/102