CBAG: Conditional biomedical abstract generation.

Biomedical research papers often combine disjoint concepts in novel ways, such as when describing a newly discovered relationship between an understudied gene and an important disease. These concepts are often explicitly encoded as metadata keywords, such as the author-provided terms included with...


Bibliographic Details
Main Authors: Justin Sybrandt, Ilya Safro
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0253905
id doaj-74f0df3208804bc1b0c9d625465a318b
record_format Article
spelling doaj-74f0df3208804bc1b0c9d625465a318b; 2021-07-22T04:30:28Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2021-01-01; 16; 7; e0253905; 10.1371/journal.pone.0253905; CBAG: Conditional biomedical abstract generation.; Justin Sybrandt; Ilya Safro; https://doi.org/10.1371/journal.pone.0253905
collection DOAJ
language English
format Article
sources DOAJ
author Justin Sybrandt
Ilya Safro
title CBAG: Conditional biomedical abstract generation.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2021-01-01
description Biomedical research papers often combine disjoint concepts in novel ways, such as when describing a newly discovered relationship between an understudied gene and an important disease. These concepts are often explicitly encoded as metadata keywords, such as the author-provided terms included with many documents in the MEDLINE database. While substantial recent work has addressed text generation in a more general context, applications such as scientific writing assistants or hypothesis generation systems could benefit from the capacity to select the specific set of concepts that underpin a generated biomedical text. We propose a conditional language model following the transformer architecture. This model uses the "encoder stack" to encode concepts that a user wishes to discuss in the generated text. The "decoder stack" then follows the masked self-attention pattern to perform text generation, using both the prior tokens and the encoded condition. We demonstrate that this approach provides significant control while still producing reasonable biomedical text.
url https://doi.org/10.1371/journal.pone.0253905
_version_ 1721292193970257920