Large scale application of neural network based semantic role labeling for automated relation extraction from biomedical texts.


Bibliographic Details
Main Authors:	Thorsten Barnickel, Jason Weston, Ronan Collobert, Hans-Werner Mewes, Volker Stümpflen
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2009-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC2712690?pdf=render

ISSN:	1932-6203
Citation:	PLoS ONE 4(7): e6393
DOI:	10.1371/journal.pone.0006393

Abstract

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence-based approaches can process large text corpora such as MEDLINE in acceptable time but cannot extract specific types of semantic relations. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive, and the generated trees are difficult to interpret. Several natural language processing (NLP) approaches exist for the biomedical domain, but they focus specifically on detecting a limited set of relation types. Systems biology needs generic approaches that detect a multitude of relation types and can also process large text corpora, yet very few systems meet both requirements.

We introduce the use of SENNA ("Semantic Extraction using a Neural Network Architecture"), a fast and accurate neural-network-based Semantic Role Labeling (SRL) program, for the large-scale extraction of semantic relations from the biomedical literature. A comparison of the processing times of SENNA and of other SRL systems and syntactic parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank)-conforming SRL program currently available. Using a 100-node cluster, SENNA tagged 89 million biomedical sentences within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences, yielding precision/recall values of 0.71/0.43.

We show that both the accuracy and the processing speed of the proposed semantic relation extraction approach are sufficient for its large-scale application to biomedical text. The approach is highly generalizable with respect to the supported relation types and appears especially suited to general-purpose, broad-scale text mining systems. It bridges the gap between fast co-occurrence-based approaches, which lack semantic relations, and highly specialized, computationally demanding NLP approaches.
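The abstract does not describe how relations are derived from SRL output, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It shows one way a PropBank-style frame (a predicate plus CoNLL-style argument labels such as A0 for the agent and A1 for the patient) could be turned into a relation triple, and it works out two figures implied by the numbers above. The names `SrlFrame`, `RELATION_VERBS`, `frame_to_relation`, the verb-to-relation table, the use of predicate lemmas, and the rule that both A0 and A1 must be present are all hypothetical. The F1 score is derived here as the harmonic mean of the reported precision and recall; it is not a figure given in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical container for one SRL frame: a predicate (assumed already
# lemmatized) plus its labeled arguments, e.g. {"A0": ..., "A1": ...}.
@dataclass
class SrlFrame:
    predicate: str
    args: dict[str, str]


# Hypothetical predicate-to-relation table; the real verb list used in the
# paper is not given in the abstract.
RELATION_VERBS = {"inhibit": "inhibition", "activate": "activation", "bind": "binding"}


def frame_to_relation(frame: SrlFrame) -> Optional[tuple[str, str, str]]:
    """Return (agent, relation, patient) if the predicate is in the table and
    both A0 and A1 are present; otherwise return None (assumed rule)."""
    relation = RELATION_VERBS.get(frame.predicate)
    if relation is None or "A0" not in frame.args or "A1" not in frame.args:
        return None
    return (frame.args["A0"], relation, frame.args["A1"])


def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def sentences_per_node_second(sentences: int, nodes: int, days: float) -> float:
    # Average throughput, assuming the work was spread evenly over all nodes.
    return sentences / (nodes * days * 24 * 3600)


if __name__ == "__main__":
    frame = SrlFrame("inhibit", {"A0": "protein X", "A1": "protein Y"})
    print(frame_to_relation(frame))  # ('protein X', 'inhibition', 'protein Y')
    print(round(f1(0.71, 0.43), 2))  # 0.54
    print(round(sentences_per_node_second(89_000_000, 100, 3), 1))  # 3.4
```

With the reported precision/recall of 0.71/0.43, the harmonic mean is about 0.54. Because the 89 million sentences were tagged within three days on 100 nodes, the average rate was at least roughly 3.4 sentences per node per second, assuming an even load.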
