TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery

Abstract Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine c...

Bibliographic Details
Main Authors: Guillermo Serrano Nájera, David Narganes Carlón, Daniel J. Crowther
Format: Article
Language: English
Published: Nature Publishing Group 2021-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-94897-9
id doaj-5772d4e7752d418392e61c47ac3c588d
record_format Article
spelling doaj-5772d4e7752d418392e61c47ac3c588d
2021-08-08T11:23:15Z
eng
Nature Publishing Group
Scientific Reports
2045-2322
2021-08-01
111118
10.1038/s41598-021-94897-9
TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery
Guillermo Serrano Nájera, David Narganes Carlón, Daniel J. Crowther
Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee
Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee
Exscientia Ltd, Dundee One
https://doi.org/10.1038/s41598-021-94897-9
collection DOAJ
language English
format Article
sources DOAJ
author Guillermo Serrano Nájera
David Narganes Carlón
Daniel J. Crowther
title TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2021-08-01
description Abstract Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage of human genes by omic-scale research make it difficult to maintain meaningful overviews from which to identify promising new trends. Here we propose an automated yet flexible pipeline that identifies trends in the scientific corpus which align with the specific interests of a researcher and facilitate an initial prioritisation of opportunities. Using a procedure based on co-citation networks and machine learning, genes and diseases are first parsed from PubMed articles using a novel named entity recognition system together with publication date and supporting information. Then recurrent neural networks are trained to predict the publication dynamics of all human genes. For a user-defined therapeutic focus, genes generating more publications or citations are identified as high-interest targets. We also used topic detection routines to help understand why a gene is trendy and implement a system to propose the most prominent review articles for a potential target. This TrendyGenes pipeline detects emerging targets and pathways and provides a new way to explore the literature for individual researchers, pharmaceutical companies and funding agencies.
url https://doi.org/10.1038/s41598-021-94897-9