Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds

Abstract

Background: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that defi...

Bibliographic Details
Main Authors: Southan Christopher, Boppana Kiran, Jagarlapudi Sarma ARP, Muresan Sorel
Format: Article
Language: English
Published: BMC 2011-05-01
Series: Journal of Cheminformatics
Online Access: http://www.jcheminf.com/content/3/1/14
id doaj-e96c4fe85be64e5798337cc7028b44de
record_format Article
spelling doaj-e96c4fe85be64e5798337cc7028b44de 2020-11-25T01:30:36Z eng BMC Journal of Cheminformatics 1758-2946 2011-05-01 3 1 14 10.1186/1758-2946-3-14 http://www.jcheminf.com/content/3/1/14
collection DOAJ
language English
format Article
sources DOAJ
author Southan Christopher
Boppana Kiran
Jagarlapudi Sarma ARP
Muresan Sorel
title Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds
publisher BMC
series Journal of Cheminformatics
issn 1758-2946
publishDate 2011-05-01
description Abstract
Background: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years.
Results: From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis.
Conclusions: The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.
url http://www.jcheminf.com/content/3/1/14
_version_ 1725091226832601088