GEN: highly efficient SMILES explorer using autodidactic generative examination networks


Bibliographic Details
Main Authors: Ruud van Deursen, Peter Ertl, Igor V. Tetko, Guillaume Godin
Format: Article
Language: English
Published: BMC 2020-04-01
Series: Journal of Cheminformatics
Subjects:
Autonomous learning
GEN
GAN
RNN
LSTM
GRU
Online Access: http://link.springer.com/article/10.1186/s13321-020-00425-8
id doaj-f54d513a1a4247adb72fac9319098bb9
record_format Article
doi 10.1186/s13321-020-00425-8
affiliation Ruud van Deursen: Firmenich SA, Research and Development
affiliation Peter Ertl: Novartis Institutes for BioMedical Research, Novartis Campus
affiliation Igor V. Tetko: Institute of Structural Biology, Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH)
affiliation Guillaume Godin: Firmenich SA, Research and Development
collection DOAJ
language English
format Article
sources DOAJ
author Ruud van Deursen
Peter Ertl
Igor V. Tetko
Guillaume Godin
title GEN: highly efficient SMILES explorer using autodidactic generative examination networks
publisher BMC
series Journal of Cheminformatics
issn 1758-2946
publishDate 2020-04-01
description Abstract Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism, measuring the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95–98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85–90%) while generating SMILES with strong conservation of the property space (95–99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
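The abstract describes an examination mechanism that stops training early via online statistical quality control on the percentage of valid generated SMILES. The paper's exact stopping rule is not given in this record, so the following is a minimal hypothetical sketch: `examination_stable` and its `window`, `target`, and `tol` parameters are illustrative assumptions, not the authors' settings.

```python
def examination_stable(validity_history, window=5, target=0.95, tol=0.01):
    """Illustrative SQC-style check: training is considered stable once the
    last `window` epochs each produced at least `target` valid SMILES and
    the spread of those validity fractions stays within `tol`."""
    if len(validity_history) < window:
        return False
    recent = validity_history[-window:]
    return min(recent) >= target and (max(recent) - min(recent)) <= tol

# Toy training loop with pre-set validity fractions per epoch (stand-ins for
# the fraction of sampled SMILES that an actual parser would accept): stop at
# the earliest epoch that passes the examination.
history = []
for frac in [0.42, 0.71, 0.88, 0.94, 0.96, 0.961, 0.963, 0.962, 0.960]:
    history.append(frac)
    if examination_stable(history):
        break

print(len(history))  # epoch at which the examination would halt training
```

In a real GEN-style setup the per-epoch fraction would come from parsing a sample of generated strings with a cheminformatics toolkit (e.g. RDKit), and the model weights from the earliest stable epoch would be kept.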
topic Autonomous learning
GEN
GAN
RNN
LSTM
GRU
url http://link.springer.com/article/10.1186/s13321-020-00425-8