Dependency parsing of biomedical text with BERT

Abstract. Background: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks center...


Bibliographic Details
Main Authors: Jenna Kanerva, Filip Ginter, Sampo Pyysalo
Format: Article
Language: English
Published: BMC, 2020-12-01
Series: BMC Bioinformatics
Subjects: Parsing, Deep learning, CRAFT
Online Access: https://doi.org/10.1186/s12859-020-03905-8
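
The article cataloged here concerns UD-style dependency parsing, which is conventionally evaluated with unlabeled and labeled attachment scores (UAS/LAS) over CoNLL-U output. As a minimal illustrative sketch (not code from the paper; the `score` helper and the example sentence are invented for illustration), the snippet below computes both metrics from aligned gold and predicted CoNLL-U token lines, using the standard CoNLL-U column layout (HEAD in column 7, DEPREL in column 8):

```python
def score(gold_lines, pred_lines):
    """Compute (UAS, LAS) from aligned CoNLL-U token lines.

    UAS counts tokens whose predicted head matches gold; LAS additionally
    requires the dependency relation label to match.
    """
    total = uas = las = 0
    for g, p in zip(gold_lines, pred_lines):
        gcols, pcols = g.split("\t"), p.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1"),
        # which are not scored in standard UD evaluation.
        if "-" in gcols[0] or "." in gcols[0]:
            continue
        total += 1
        if gcols[6] == pcols[6]:        # HEAD column
            uas += 1
            if gcols[7] == pcols[7]:    # DEPREL column
                las += 1
    return uas / total, las / total

# Toy two-token sentence; the predicted parse has the right heads but
# one wrong relation label, so UAS = 1.0 and LAS = 0.5.
gold = [
    "1\tCells\t_\t_\t_\t_\t2\tnsubj\t_\t_",
    "2\tdivide\t_\t_\t_\t_\t0\troot\t_\t_",
]
pred = [
    "1\tCells\t_\t_\t_\t_\t2\tobj\t_\t_",
    "2\tdivide\t_\t_\t_\t_\t0\troot\t_\t_",
]
print(score(gold, pred))  # (1.0, 0.5)
```

The official UD shared-task scorers are more elaborate (they realign tokenizations before scoring), but the head/label comparison above is the core of the LAS metric reported in such evaluations.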
id doaj-980e92cad093490ab14fb2d2b804c910
record_format Article
spelling doaj-980e92cad093490ab14fb2d2b804c910 (updated 2021-01-03T12:21:18Z); eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2020-12-01; vol. 21, Suppl. 23, pp. 1-12; doi:10.1186/s12859-020-03905-8; Dependency parsing of biomedical text with BERT; Jenna Kanerva, Filip Ginter, Sampo Pyysalo (TurkuNLP Group, University of Turku); topics: Parsing, Deep learning, CRAFT; abstract as given in the description field.
collection DOAJ
language English
format Article
sources DOAJ
author Jenna Kanerva
Filip Ginter
Sampo Pyysalo
spellingShingle Jenna Kanerva
Filip Ginter
Sampo Pyysalo
Dependency parsing of biomedical text with BERT
BMC Bioinformatics
Parsing
Deep learning
CRAFT
author_facet Jenna Kanerva
Filip Ginter
Sampo Pyysalo
author_sort Jenna Kanerva
title Dependency parsing of biomedical text with BERT
title_short Dependency parsing of biomedical text with BERT
title_full Dependency parsing of biomedical text with BERT
title_fullStr Dependency parsing of biomedical text with BERT
title_full_unstemmed Dependency parsing of biomedical text with BERT
title_sort dependency parsing of biomedical text with bert
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-12-01
description Abstract. Background: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been relatively little study of parsing texts from specialized domains such as biomedicine. Methods: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify parsers on the task data. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing. Results: We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization with a deep transfer learning model pre-trained on in-domain texts is key to maximizing the performance of the parsing methods.
topic Parsing
Deep learning
CRAFT
url https://doi.org/10.1186/s12859-020-03905-8
work_keys_str_mv AT jennakanerva dependencyparsingofbiomedicaltextwithbert
AT filipginter dependencyparsingofbiomedicaltextwithbert
AT sampopyysalo dependencyparsingofbiomedicaltextwithbert
_version_ 1724350363216117760