A general approach for retrosynthetic molecular core analysis

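Abstract: Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for chemical space and structure–activity relationships analysis. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension to ASBS, we herein introduce the development of a general conceptual framework that considers all putative cores of molecules in a compound data set, thus softening the often applied “single molecule–single scaffold” correspondence. A putative core is here defined as any substructure of a molecule complying with two basic rules: (a) the size of the core is a significant proportion of the whole molecule size and (b) the substructure can be reached from the original molecule through a succession of retrosynthesis rules. Thereafter, a bipartite network consisting of molecules and cores can be constructed for a database of chemical structures. Compounds linked to the same cores are considered analogs. We present case studies illustrating the potential of the general framework. The applications range from inter- and intra-core diversity analysis of compound data sets, structure–property relationships, and identification of analog series and ASBS. The molecule–core network herein presented is a general methodology with multiple applications in scaffold analysis. New statistical methods are envisioned that will be able to draw quantitative conclusions from these data. The code to use the method presented in this work is freely available as an additional file. Follow-up applications include analog searching and core structure–property relationships analyses.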
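
The record's description defines a putative core by two rules (it covers a significant proportion of the molecule, and it is reachable through retrosynthesis rules) and then links molecules and cores in a bipartite network, treating compounds that share a core as analogs. The following is a minimal sketch of that idea, not the authors' released code. It assumes the retrosynthetic fragmentation (for example with RECAP rules) has already been done elsewhere; the molecule and core labels, the heavy-atom counts, and the `min_fraction` threshold are all hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy input: molecule -> (heavy-atom count, {core label: heavy-atom count}).
# The cores stand in for substructures that a retrosynthetic fragmentation
# (e.g. RECAP rules) would produce; none of these values come from the paper.
FRAGMENTS = {
    "mol_A": (20, {"core_1": 16, "core_2": 12, "core_3": 5}),
    "mol_B": (18, {"core_1": 16, "core_4": 6}),
    "mol_C": (15, {"core_2": 12}),
}


def putative_cores(mol_size, fragments, min_fraction):
    """Rule (a): keep fragments covering at least min_fraction of the molecule."""
    return {core for core, size in fragments.items()
            if size >= min_fraction * mol_size}


def build_network(data, min_fraction=0.5):
    """Build both sides of the molecule-core bipartite network.

    min_fraction is an assumed threshold for "a significant proportion".
    """
    mol_to_cores = {}
    core_to_mols = defaultdict(set)
    for mol, (size, fragments) in data.items():
        cores = putative_cores(size, fragments, min_fraction)
        mol_to_cores[mol] = cores
        for core in cores:
            core_to_mols[core].add(mol)
    return mol_to_cores, dict(core_to_mols)


def analog_pairs(core_to_mols):
    """Molecules linked to the same core are considered analogs."""
    pairs = set()
    for mols in core_to_mols.values():
        ordered = sorted(mols)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


if __name__ == "__main__":
    mol_to_cores, core_to_mols = build_network(FRAGMENTS)
    print(mol_to_cores)
    print(sorted(analog_pairs(core_to_mols)))
```

In this toy data, `mol_A` keeps two putative cores and is therefore an analog of both `mol_B` and `mol_C`, which illustrates how the framework relaxes the "single molecule–single scaffold" correspondence.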

Bibliographic Details
Main Authors: J. Jesús Naveja, B. Angélica Pilón-Jiménez, Jürgen Bajorath, José L. Medina-Franco
Format: Article
Language: English
Published: BMC 2019-09-01
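Series: Journal of Cheminformatics
Subjects: Analog series-based scaffold; Analog searching; Core structure–property relationships (CSPR); RECAP; Scaffold; Virtual screening
Online Access: http://link.springer.com/article/10.1186/s13321-019-0380-5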
id doaj-218a3b8f0e2e4b6c9273e84e90f43cbf
record_format Article
spelling doaj-218a3b8f0e2e4b6c9273e84e90f43cbf
2020-11-25T03:47:25Z
eng
BMC
Journal of Cheminformatics
1758-2946
2019-09-01
Volume 11, issue 1, pages 1-19
10.1186/s13321-019-0380-5
A general approach for retrosynthetic molecular core analysis
J. Jesús Naveja (0): PECEM, School of Medicine, Universidad Nacional Autónoma de México
B. Angélica Pilón-Jiménez (1): Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México
Jürgen Bajorath (2): Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität
José L. Medina-Franco (3): Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México
collection DOAJ
language English
format Article
sources DOAJ
author J. Jesús Naveja
B. Angélica Pilón-Jiménez
Jürgen Bajorath
José L. Medina-Franco
author_sort J. Jesús Naveja
title A general approach for retrosynthetic molecular core analysis
title_sort general approach for retrosynthetic molecular core analysis
publisher BMC
series Journal of Cheminformatics
issn 1758-2946
publishDate 2019-09-01
description Abstract Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for chemical space and structure–activity relationships analysis. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension to ASBS, we herein introduce the development of a general conceptual framework that considers all putative cores of molecules in a compound data set, thus softening the often applied “single molecule–single scaffold” correspondence. A putative core is here defined as any substructure of a molecule complying with two basic rules: (a) the size of the core is a significant proportion of the whole molecule size and (b) the substructure can be reached from the original molecule through a succession of retrosynthesis rules. Thereafter, a bipartite network consisting of molecules and cores can be constructed for a database of chemical structures. Compounds linked to the same cores are considered analogs. We present case studies illustrating the potential of the general framework. The applications range from inter- and intra-core diversity analysis of compound data sets, structure–property relationships, and identification of analog series and ASBS. The molecule–core network herein presented is a general methodology with multiple applications in scaffold analysis. New statistical methods are envisioned that will be able to draw quantitative conclusions from these data. The code to use the method presented in this work is freely available as an additional file. Follow-up applications include analog searching and core structure–property relationships analyses.
topic Analog series-based scaffold
Analog searching
Core structure–property relationships (CSPR)
RECAP
Scaffold
Virtual screening
url http://link.springer.com/article/10.1186/s13321-019-0380-5