Automated recognition of brain region mentions in neuroscience literature

The ability to computationally extract mentions of neuroanatomical regions from the literature would assist linking to other entities within and outside of an article. Examples include extracting reports of connectivity or region-specific gene expression. To facilitate text mining...

Bibliographic Details
Main Authors: Leon French, Suzanne Lane, Lydia Xu, Paul Pavlidis
Format: Article
Language: English
Published: Frontiers Media S.A. 2009-09-01
Series: Frontiers in Neuroinformatics
Subjects: Neuroanatomy; Natural Language Processing; text mining; conditional random field; corpus
Online Access: http://journal.frontiersin.org/Journal/10.3389/neuro.11.029.2009/full
id doaj-3bdaa00844a74eed964bd3e224e57ae7
record_format Article
spelling doaj-3bdaa00844a74eed964bd3e224e57ae7; 2020-11-24T23:01:34Z; eng; Frontiers Media S.A.; Frontiers in Neuroinformatics; 1662-5196; 2009-09-01; 3; 10.3389/neuro.11.029.2009; 745; Automated recognition of brain region mentions in neuroscience literature; Leon French, Suzanne Lane, Lydia Xu, Paul Pavlidis (University of British Columbia); http://journal.frontiersin.org/Journal/10.3389/neuro.11.029.2009/full; Neuroanatomy; Natural Language Processing; text mining; conditional random field; corpus
collection DOAJ
language English
format Article
sources DOAJ
author Leon French
Suzanne Lane
Lydia Xu
Paul Pavlidis
title Automated recognition of brain region mentions in neuroscience literature
publisher Frontiers Media S.A.
series Frontiers in Neuroinformatics
issn 1662-5196
publishDate 2009-09-01
description The ability to computationally extract mentions of neuroanatomical regions from the literature would assist in linking them to other entities within and outside of an article. Examples include extracting reports of connectivity or region-specific gene expression. To facilitate text mining of neuroscience literature, we have created a corpus of manually annotated brain region mentions. The corpus contains 1,377 abstracts with 18,242 brain region annotations. Interannotator agreement was evaluated for a subset of the documents and was 90.7% and 96.7% for strict and lenient matching, respectively. We observed a large vocabulary of over 6,000 unique brain region terms and 17,000 words. For automatic extraction of brain region mentions, we evaluated simple dictionary methods and complex natural language processing techniques. The dictionary methods based on neuroanatomical lexicons recalled 36% of the mentions with 57% precision. The best performance was achieved using a conditional random field (CRF) with a rich feature set. Features were based on morphological, lexical, syntactic and contextual information. The CRF recalled 76% of mentions at 81% precision; when partial matches are counted, recall and precision increase to 86% and 92%, respectively. We suspect that a large share of the errors is due to coordinating conjunctions, previously unseen words and brain regions of less commonly studied organisms. We found context windows, lemmatization and abbreviation expansion to be the most informative techniques. The corpus is freely available at http://www.chibi.ubc.ca/WhiteText/.
topic Neuroanatomy
Natural Language Processing
text mining
conditional random field
corpus
url http://journal.frontiersin.org/Journal/10.3389/neuro.11.029.2009/full
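
The description above reports precision and recall under strict and lenient (partial) matching for a dictionary baseline. The following Python sketch illustrates how such scores might be computed. It is not the authors' implementation: the two-term lexicon, the example sentence, the helper names (`dictionary_spans`, `score`, `span_of`), and the reading of "lenient" as any character overlap between a predicted and an annotated span are all assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def dictionary_spans(text: str, lexicon: Set[str]) -> List[Span]:
    """Find case-insensitive whole-word lexicon matches, preferring longer terms."""
    spans: List[Span] = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip a match that overlaps a span already accepted.
            if all(m.end() <= s or m.start() >= e for s, e in spans):
                spans.append((m.start(), m.end()))
    return sorted(spans)


def score(predicted: List[Span], gold: List[Span],
          lenient: bool = False) -> Tuple[float, float]:
    """Return (precision, recall) under strict or lenient span matching."""
    def hit(a: Span, b: Span) -> bool:
        if lenient:
            # Assumed definition: any character overlap counts as a match.
            return a[0] < b[1] and b[0] < a[1]
        return a == b  # strict: both boundaries must agree

    matched_pred = sum(any(hit(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(hit(p, g) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall


def span_of(text: str, phrase: str) -> Span:
    """Offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Hypothetical example data, not taken from the corpus.
text = "Projections from the dorsal hippocampus reach the amygdala."
lexicon = {"hippocampus", "amygdala"}
gold = [span_of(text, "dorsal hippocampus"), span_of(text, "amygdala")]
predicted = dictionary_spans(text, lexicon)

print(score(predicted, gold))                # strict matching
print(score(predicted, gold, lenient=True))  # lenient matching
```

In this toy case the lexicon finds "hippocampus" but the annotation covers "dorsal hippocampus", so the mention counts only under lenient matching. This is the kind of boundary disagreement that separates the strict and partial-match figures quoted in the description.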