ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts

Bibliographic Details
Main Authors: Elizabeth T. Hobbs, Stephen M. Goralski, Ashley Mitchell, Andrew Simpson, Dorjan Leka, Emmanuel Kotey, Matt Sekira, James B. Munro, Suvarna Nadendla, Rebecca Jackson, Aitor Gonzalez-Aguirre, Martin Krallinger, Michelle Giglio, Ivan Erill
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-07-01
Series: Frontiers in Research Metrics and Analytics
Online Access: https://www.frontiersin.org/articles/10.3389/frma.2021.674205/full
Affiliations:
Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States (Elizabeth T. Hobbs, Stephen M. Goralski, Ashley Mitchell, Andrew Simpson, Dorjan Leka, Emmanuel Kotey, Matt Sekira, Ivan Erill)
Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States (James B. Munro, Suvarna Nadendla, Rebecca Jackson, Michelle Giglio)
Barcelona Supercomputing Center (BSC), Barcelona, Spain (Aitor Gonzalez-Aguirre, Martin Krallinger)
Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain (Martin Krallinger)
ISSN: 2504-0537
Volume: 6
Article Number: 674205
DOI: 10.3389/frma.2021.674205
Keywords: evidence; annotation; corpus; text- and data mining; literature; biocuration
Source: DOAJ

Description

Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.