Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Bibliographic Details
Main Authors: Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alán Aspuru-Guzik, Alex Zhavoronkov
Format: Article
Language: English
Published: Frontiers Media S.A., 2020-12-01
Series: Frontiers in Pharmacology
Subjects: generative models; drug discovery; deep learning; benchmark; distribution learning
Online Access: https://www.frontiersin.org/articles/10.3389/fphar.2020.565644/full
id doaj-20c7f554a2f7442e8fa7b4cbcaa41469
record_format Article
doi 10.3389/fphar.2020.565644
volume 11
article_number 565644
affiliation Daniil Polykovskiy: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
Alexander Zhebrak: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
Benjamin Sanchez-Lengeling: Chemistry and Chemical Biology Department, Harvard University, Cambridge, MA, United States
Sergey Golovanov: Neuromation OU, Tallinn, Estonia
Oktai Tatanov: Neuromation OU, Tallinn, Estonia
Stanislav Belyaev: Neuromation OU, Tallinn, Estonia
Rauf Kurbanov: Neuromation OU, Tallinn, Estonia
Aleksey Artamonov: Neuromation OU, Tallinn, Estonia
Vladimir Aladinskiy: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
Mark Veselov: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
Artur Kadurin: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
Simon Johansson: Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
Hongming Chen: Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
Sergey Nikolenko: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong; Neuromation OU, Tallinn, Estonia; Computer Science Department, National Research University Higher School of Economics, St. Petersburg, Russia
Alán Aspuru-Guzik: Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; CIFAR AI Chair, Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, ON, Canada
Alex Zhavoronkov: Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
collection DOAJ
language English
format Article
sources DOAJ
author Daniil Polykovskiy
Alexander Zhebrak
Benjamin Sanchez-Lengeling
Sergey Golovanov
Oktai Tatanov
Stanislav Belyaev
Rauf Kurbanov
Aleksey Artamonov
Vladimir Aladinskiy
Mark Veselov
Artur Kadurin
Simon Johansson
Hongming Chen
Sergey Nikolenko
Alán Aspuru-Guzik
Alex Zhavoronkov
title Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
publisher Frontiers Media S.A.
series Frontiers in Pharmacology
issn 1663-9812
publishDate 2020-12-01
description Generative models are becoming a tool of choice for exploring the molecular space. These models learn from a large training dataset and produce novel molecular structures with similar properties. Generated structures can be used for virtual screening or for training semi-supervised predictive models in downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize the training and comparison of molecular generative models. MOSES provides training and testing datasets and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses.
topic generative models
drug discovery
deep learning
benchmark
distribution learning
url https://www.frontiersin.org/articles/10.3389/fphar.2020.565644/full