Annotated Corpus for Citation Context Analysis
In this paper, we present a corpus of 85 scientific articles annotated with 2092 citations analyzed using context analysis. We obtained high inter-annotator agreement and can therefore ensure the reliability and reproducibility of the annotation, which three coders performed independently. W...
Main Authors: | Myriam Hernández-Álvarez, José Gómez Soriano, Patricio Martínez-Barco |
---|---|
Format: | Article |
Language: | English |
Published: | Escuela Politécnica Nacional (EPN), 2016-05-01 |
Series: | Latin-American Journal of Computing |
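Subjects: | Corpus, annotation, methodology, machine-learning, function, polarity, aspects, schema, keywords, labels, classification |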
Online Access: | http://lajc.epn.edu.ec/index.php/LAJC/article/view/102 |
id | doaj-974366b84628414bb4de51dd0833e4b7 |
---|---|
record_format | Article |
spelling | doaj-974366b84628414bb4de51dd0833e4b7 2020-11-25T03:17:11Z eng Escuela Politécnica Nacional (EPN) Latin-American Journal of Computing 1390-9266 1390-9134 2016-05-01 313542 Annotated Corpus for Citation Context Analysis Myriam Hernández-Álvarez 0 José Gómez Soriano 1 Patricio Martínez-Barco 2 Escuela Politécnica Nacional Universidad de Alicante Universidad de Alicante In this paper, we present a corpus of 85 scientific articles annotated with 2092 citations analyzed using context analysis. We obtained high inter-annotator agreement and can therefore ensure the reliability and reproducibility of the annotation, which three coders performed independently. We applied this corpus to classify citations according to qualitative criteria, using a medium-granularity categorization scheme enriched with annotated keywords and labels to obtain high granularity. The annotation schema handles three dimensions: PURPOSE: POLARITY: ASPECTS. Citation purpose defines the function classification: use, critique, comparison, and background, with more specific classes established using keywords: Based on, Supply; Useful; Contrast; Acknowledge, Corroboration, Debate; Weakness and Hedges. Citation aspects complement the citation characterization: concept, method, data, tool, task, among others. Polarity has three levels: Positive, Negative, and Neutral. We developed the schema and annotated the corpus focusing on applications for citation influence assessment, but we suggest that applications such as summary generation and information retrieval could also use this annotated corpus, because the scheme is organized into clearly defined general dimensions. http://lajc.epn.edu.ec/index.php/LAJC/article/view/102 Corpus annotation methodology machine-learning function polarity aspects schema keywords labels classification |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Myriam Hernández-Álvarez José Gómez Soriano Patricio Martínez-Barco |
spellingShingle | Myriam Hernández-Álvarez José Gómez Soriano Patricio Martínez-Barco Annotated Corpus for Citation Context Analysis Latin-American Journal of Computing Corpus annotation methodology machine-learning function polarity aspects schema keywords labels classification |
author_facet | Myriam Hernández-Álvarez José Gómez Soriano Patricio Martínez-Barco |
author_sort | Myriam Hernández-Álvarez |
title | Annotated Corpus for Citation Context Analysis |
title_short | Annotated Corpus for Citation Context Analysis |
title_full | Annotated Corpus for Citation Context Analysis |
title_fullStr | Annotated Corpus for Citation Context Analysis |
title_full_unstemmed | Annotated Corpus for Citation Context Analysis |
title_sort | annotated corpus for citation context analysis |
publisher | Escuela Politécnica Nacional (EPN) |
series | Latin-American Journal of Computing |
issn | 1390-9266 1390-9134 |
publishDate | 2016-05-01 |
description | In this paper, we present a corpus of 85 scientific articles annotated with 2092 citations analyzed using context analysis. We obtained high inter-annotator agreement and can therefore ensure the reliability and reproducibility of the annotation, which three coders performed independently. We applied this corpus to classify citations according to qualitative criteria, using a medium-granularity categorization scheme enriched with annotated keywords and labels to obtain high granularity. The annotation schema handles three dimensions: PURPOSE: POLARITY: ASPECTS. Citation purpose defines the function classification: use, critique, comparison, and background, with more specific classes established using keywords: Based on, Supply; Useful; Contrast; Acknowledge, Corroboration, Debate; Weakness and Hedges. Citation aspects complement the citation characterization: concept, method, data, tool, task, among others. Polarity has three levels: Positive, Negative, and Neutral. We developed the schema and annotated the corpus focusing on applications for citation influence assessment, but we suggest that applications such as summary generation and information retrieval could also use this annotated corpus, because the scheme is organized into clearly defined general dimensions. |
topic | Corpus annotation methodology machine-learning function polarity aspects schema keywords labels classification |
url | http://lajc.epn.edu.ec/index.php/LAJC/article/view/102 |
work_keys_str_mv | AT myriamhernandezalvarez annotatedcorpusforcitationcontextanalysis AT josegomezsoriano annotatedcorpusforcitationcontextanalysis AT patriciomartinezbarco annotatedcorpusforcitationcontextanalysis |
_version_ | 1724632870510657536 |
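
The three annotation dimensions described in the abstract (purpose, polarity, aspects) can be pictured as a small data structure. The following is a minimal sketch only, not the authors' annotation format: the class and field names, the `label()` string layout, and the sample citation sentence are all hypothetical. The category values and keywords are taken from the abstract; because the abstract does not state which keyword refines which purpose, no such mapping is encoded, and because the aspect list ends with "among others", aspects are kept as free strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Purpose(Enum):
    """The four citation functions named in the abstract."""
    USE = "use"
    CRITIQUE = "critique"
    COMPARISON = "comparison"
    BACKGROUND = "background"


class Polarity(Enum):
    """The three polarity levels named in the abstract."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# Keywords listed in the abstract. Which keyword belongs to which
# purpose is not stated there, so this is a flat set (assumption).
KEYWORDS = frozenset({
    "Based on", "Supply", "Useful", "Contrast", "Acknowledge",
    "Corroboration", "Debate", "Weakness", "Hedges",
})


@dataclass(frozen=True)
class CitationAnnotation:
    """One annotated citation (hypothetical record layout)."""
    context: str                    # sentence(s) surrounding the citation
    purpose: Purpose
    polarity: Polarity
    aspects: Tuple[str, ...] = ()   # e.g. "concept", "method", "data"; open-ended
    keywords: Tuple[str, ...] = ()  # finer-grained labels from KEYWORDS

    def __post_init__(self):
        # Reject keywords that the abstract does not list.
        unknown = set(self.keywords) - KEYWORDS
        if unknown:
            raise ValueError(f"unknown keywords: {sorted(unknown)}")

    def label(self) -> str:
        # Hypothetical compact label in PURPOSE:POLARITY:ASPECTS order.
        aspects = "+".join(self.aspects) or "unspecified"
        return f"{self.purpose.name}:{self.polarity.name}:{aspects}"


# Invented example sentence, for illustration only.
example = CitationAnnotation(
    context="We build on the parser of [12].",
    purpose=Purpose.USE,
    polarity=Polarity.POSITIVE,
    aspects=("method",),
    keywords=("Based on",),
)
print(example.label())  # USE:POSITIVE:method
```

Keeping purpose and polarity as closed enumerations while leaving aspects open mirrors the abstract's wording, in which the first two dimensions have fixed levels and the aspect list is explicitly incomplete.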