ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statem...
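Main Authors: | Elizabeth T. Hobbs, Stephen M. Goralski, Ashley Mitchell, Andrew Simpson, Dorjan Leka, Emmanuel Kotey, Matt Sekira, James B. Munro, Suvarna Nadendla, Rebecca Jackson, Aitor Gonzalez-Aguirre, Martin Krallinger, Michelle Giglio, Ivan Erill |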
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-07-01 |
Series: | Frontiers in Research Metrics and Analytics |
Subjects: | evidence, annotation, corpus, text- and data mining, literature, biocuration |
Online Access: | https://www.frontiersin.org/articles/10.3389/frma.2021.674205/full |
id |
doaj-6563012a49b34ad48518e463b4bcd94c |
---|---|
record_format |
Article |
spelling |
doaj-6563012a49b34ad48518e463b4bcd94c; 2021-07-13T07:21:40Z; eng; Frontiers Media S.A.; Frontiers in Research Metrics and Analytics; 2504-0537; 2021-07-01; 6; 10.3389/frma.2021.674205; 674205; ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts; Elizabeth T. Hobbs [0], Stephen M. Goralski [1], Ashley Mitchell [2], Andrew Simpson [3], Dorjan Leka [4], Emmanuel Kotey [5], Matt Sekira [6], James B. Munro [7], Suvarna Nadendla [8], Rebecca Jackson [9], Aitor Gonzalez-Aguirre [10], Martin Krallinger [11], Martin Krallinger [12], Michelle Giglio [13], Ivan Erill [14]; [0-6, 14] Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States; [7-9, 13] Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States; [10-11] Barcelona Supercomputing Center (BSC), Barcelona, Spain; [12] Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain; Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.; https://www.frontiersin.org/articles/10.3389/frma.2021.674205/full; evidence, annotation, corpus, text- and data mining, literature, biocuration |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Elizabeth T. Hobbs, Stephen M. Goralski, Ashley Mitchell, Andrew Simpson, Dorjan Leka, Emmanuel Kotey, Matt Sekira, James B. Munro, Suvarna Nadendla, Rebecca Jackson, Aitor Gonzalez-Aguirre, Martin Krallinger, Michelle Giglio, Ivan Erill |
spellingShingle |
Elizabeth T. Hobbs, Stephen M. Goralski, Ashley Mitchell, Andrew Simpson, Dorjan Leka, Emmanuel Kotey, Matt Sekira, James B. Munro, Suvarna Nadendla, Rebecca Jackson, Aitor Gonzalez-Aguirre, Martin Krallinger, Michelle Giglio, Ivan Erill; ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts; Frontiers in Research Metrics and Analytics; evidence, annotation, corpus, text- and data mining, literature, biocuration |
author_facet |
Elizabeth T. Hobbs, Stephen M. Goralski, Ashley Mitchell, Andrew Simpson, Dorjan Leka, Emmanuel Kotey, Matt Sekira, James B. Munro, Suvarna Nadendla, Rebecca Jackson, Aitor Gonzalez-Aguirre, Martin Krallinger, Michelle Giglio, Ivan Erill |
author_sort |
Elizabeth T. Hobbs |
title |
ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_short |
ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_full |
ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_fullStr |
ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_full_unstemmed |
ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_sort |
eco-collectf: a corpus of annotated evidence-based assertions in biomedical manuscripts |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Research Metrics and Analytics |
issn |
2504-0537 |
publishDate |
2021-07-01 |
description |
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology. |
topic |
evidence, annotation, corpus, text- and data mining, literature, biocuration |
url |
https://www.frontiersin.org/articles/10.3389/frma.2021.674205/full |
work_keys_str_mv |
AT elizabeththobbs ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT stephenmgoralski ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT ashleymitchell ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT andrewsimpson ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT dorjanleka ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT emmanuelkotey ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT mattsekira ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT jamesbmunro ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT suvarnanadendla ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT rebeccajackson ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT aitorgonzalezaguirre ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT martinkrallinger ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT martinkrallinger ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT michellegiglio ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT ivanerill ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts |
_version_ |
1721306124499550208 |