Semantic text mining in early drug discovery for type 2 diabetes.

Bibliographic Details
Main Authors: Lena K Hansson, Rasmus Borup Hansen, Sune Pletscher-Frankild, Rudolfs Berzins, Daniel Hvidberg Hansen, Dennis Madsen, Sten B Christensen, Malene Revsbech Christiansen, Ulrika Boulund, Xenia Asbæk Wolf, Sonny Kim Kjærulff, Martijn van de Bunt, Søren Tulin, Thomas Skøt Jensen, Rasmus Wernersson, Jan Nygaard Jensen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE, vol. 15, no. 6, e0233956
Online Access: https://doi.org/10.1371/journal.pone.0233956
id doaj-97287a9d68b2414aad61ea290a973c33
collection DOAJ
issn 1932-6203
description
Background
Surveying the scientific literature is an important part of early drug discovery, and with the ever-increasing number of biomedical publications it is imperative to focus on the most interesting articles. Here we present a project that highlights new understanding (e.g. recently discovered modes of action) and identifies potential drug targets via a novel, data-driven text-mining approach that scores type 2 diabetes (T2D) relevance. We focused on monitoring trends and jumps in T2D relevance so that we would learn of important breakthroughs in a timely manner.

Methods
We extracted over 7 million n-grams from PubMed abstracts and then clustered the roughly 240,000 n-grams linked to T2D into almost 50,000 T2D-relevant 'semantic concepts'. To score papers, we weighted the concepts based on their co-mentioning with core T2D proteins. A protein's T2D relevance was determined by combining the scores of the papers mentioning it in the five preceding years. Each week, all proteins were ranked according to their T2D relevance. Furthermore, the historical distribution of week-to-week rank changes was used to calculate the significance of each protein's change in T2D-relevance rank.

Results
We show that T2D-relevant papers, even those not mentioning T2D explicitly, were prioritised through their relevant semantic concepts. Well-known T2D proteins were therefore enriched among the top-scoring proteins. Our 'high jumpers' identified important past developments in the understanding of how certain key proteins relate to T2D, indicating that our method will make us aware of future breakthroughs. In summary, this project made it easier to keep up with current T2D research by repeatedly feeding short lists of potential novel targets into our early drug discovery pipeline.
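The Methods paragraph describes a scoring-and-ranking pipeline: weight concepts, score papers, sum evidence per protein over a five-year window, rank weekly, and test rank jumps against history. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The abstract does not give exact formulas, so the concept weight (share of papers containing a concept that also mention a core T2D protein), summation as the way of "combining" paper scores, the alphabetical tie-break, and the empirical one-sided p-value for rank jumps are all hypothetical choices. The names `Paper`, `concept_weights`, `protein_relevance`, `rank_jumps` and `jump_p_value` are illustrative only, and n-gram extraction plus clustering into semantic concepts are assumed to have happened upstream.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Paper:
    pub_date: date
    concepts: set[str]  # semantic concepts in the abstract (assumed precomputed)
    proteins: set[str]  # proteins mentioned in the abstract (assumed precomputed)


def concept_weights(papers: list[Paper], core_proteins: set[str]) -> dict[str, float]:
    """Hypothetical weighting: fraction of papers containing a concept
    that also mention at least one core T2D protein."""
    total: dict[str, int] = defaultdict(int)
    with_core: dict[str, int] = defaultdict(int)
    for p in papers:
        has_core = bool(p.proteins & core_proteins)
        for c in p.concepts:
            total[c] += 1
            with_core[c] += int(has_core)
    return {c: with_core[c] / total[c] for c in total}


def paper_score(paper: Paper, weights: dict[str, float]) -> float:
    # Assumption: a paper's score is the sum of its concept weights.
    return sum(weights.get(c, 0.0) for c in paper.concepts)


def protein_relevance(papers: list[Paper], weights: dict[str, float],
                      as_of: date, years: int = 5) -> dict[str, float]:
    """Sum the scores of papers that mention each protein and were published
    in the `years` years before `as_of` (window ignores leap days)."""
    start = as_of - timedelta(days=365 * years)
    relevance: dict[str, float] = defaultdict(float)
    for p in papers:
        if start <= p.pub_date < as_of:
            s = paper_score(p, weights)
            for prot in p.proteins:
                relevance[prot] += s
    return relevance


def rank(relevance: dict[str, float]) -> dict[str, int]:
    # Rank 1 = most T2D-relevant; ties broken alphabetically (assumption).
    ordered = sorted(relevance, key=lambda prot: (-relevance[prot], prot))
    return {prot: i + 1 for i, prot in enumerate(ordered)}


def rank_jumps(prev_ranks: dict[str, int], curr_ranks: dict[str, int]) -> dict[str, int]:
    """Week-to-week rank change; positive means the protein climbed."""
    return {prot: prev_ranks[prot] - curr_ranks[prot]
            for prot in curr_ranks if prot in prev_ranks}


def jump_p_value(jump: int, historical_jumps: list[int]) -> float:
    """Empirical one-sided significance: fraction of historical
    week-to-week jumps at least as large as `jump`."""
    if not historical_jumps:
        return 1.0
    return sum(h >= jump for h in historical_jumps) / len(historical_jumps)
```

In use, one would recompute `protein_relevance` and `rank` each week, pass consecutive rankings to `rank_jumps`, and flag proteins whose `jump_p_value` against the accumulated history of jumps falls below a chosen threshold as candidate "high jumpers". Using an empirical distribution rather than a parametric one mirrors the abstract's reliance on the historical distribution of rank changes, without assuming any particular shape for it.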