Large scale application of neural network based semantic role labeling for automated relation extraction from biomedical texts.

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific...

Bibliographic Details
Main Authors: Thorsten Barnickel, Jason Weston, Ronan Collobert, Hans-Werner Mewes, Volker Stümpflen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2712690?pdf=render
id doaj-e00ebc70e9c548a597ec3f637edeff70
record_format Article
spelling doaj-e00ebc70e9c548a597ec3f637edeff70; 2020-11-25T02:27:38Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2009-01-01; 4; 7; e6393; 10.1371/journal.pone.0006393; Large scale application of neural network based semantic role labeling for automated relation extraction from biomedical texts.; Thorsten Barnickel; Jason Weston; Ronan Collobert; Hans-Werner Mewes; Volker Stümpflen; http://europepmc.org/articles/PMC2712690?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Thorsten Barnickel
Jason Weston
Ronan Collobert
Hans-Werner Mewes
Volker Stümpflen
title Large scale application of neural network based semantic role labeling for automated relation extraction from biomedical texts.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2009-01-01
description To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA ("Semantic Extraction using a Neural Network Architecture"), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches.
url http://europepmc.org/articles/PMC2712690?pdf=render