A merged molecular representation learning for molecular properties prediction with a web-based service
Abstract: Deep learning has brought a dramatic development in molecular property prediction that is crucial in the field of drug discovery using various representations such as fingerprints, SMILES, and graphs. In particular, SMILES is used in various deep learning models via character-based approach...
Main Authors: | Hyunseob Kim, Jeongcheol Lee, Sunil Ahn, Jongsuk Ruth Lee |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Publishing Group, 2021-05-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-021-90259-7 |
id |
doaj-2993cc3aff2243a397e8e03b895d819d |
---|---|
record_format |
Article |
spelling |
doaj-2993cc3aff2243a397e8e03b895d819d; 2021-05-30T11:40:03Z; eng; Nature Publishing Group; Scientific Reports; 2045-2322; 2021-05-01; 11119; 10.1038/s41598-021-90259-7; A merged molecular representation learning for molecular properties prediction with a web-based service; Hyunseob Kim (0); Jeongcheol Lee (1); Sunil Ahn (2); Jongsuk Ruth Lee (3); Center for Computational Science Platform, Korea Institute of Science and Technology Information; Center for Computational Science Platform, Korea Institute of Science and Technology Information; Center for Computational Science Platform, Korea Institute of Science and Technology Information; Center for Computational Science Platform, Korea Institute of Science and Technology Information; Abstract: Deep learning has brought a dramatic development in molecular property prediction that is crucial in the field of drug discovery using various representations such as fingerprints, SMILES, and graphs. In particular, SMILES is used in various deep learning models via character-based approaches. However, SMILES has a limitation in that it is hard to reflect chemical properties. In this paper, we propose a new self-supervised method to learn SMILES and chemical contexts of molecules simultaneously in pre-training the Transformer. The key of our model is learning structures with adjacency matrix embedding and learning logics that can infer descriptors via Quantitative Estimation of Drug-likeness prediction in pre-training. As a result, our method improves the generalization of the data and achieves the best average performance by benchmarking downstream tasks. Moreover, we develop a web-based fine-tuning service to utilize our model on various tasks.; https://doi.org/10.1038/s41598-021-90259-7 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hyunseob Kim Jeongcheol Lee Sunil Ahn Jongsuk Ruth Lee |
author_sort |
Hyunseob Kim |
title |
A merged molecular representation learning for molecular properties prediction with a web-based service |
title_sort |
merged molecular representation learning for molecular properties prediction with a web-based service |
publisher |
Nature Publishing Group |
series |
Scientific Reports |
issn |
2045-2322 |
publishDate |
2021-05-01 |
description |
Abstract: Deep learning has brought a dramatic development in molecular property prediction that is crucial in the field of drug discovery using various representations such as fingerprints, SMILES, and graphs. In particular, SMILES is used in various deep learning models via character-based approaches. However, SMILES has a limitation in that it is hard to reflect chemical properties. In this paper, we propose a new self-supervised method to learn SMILES and chemical contexts of molecules simultaneously in pre-training the Transformer. The key of our model is learning structures with adjacency matrix embedding and learning logics that can infer descriptors via Quantitative Estimation of Drug-likeness prediction in pre-training. As a result, our method improves the generalization of the data and achieves the best average performance by benchmarking downstream tasks. Moreover, we develop a web-based fine-tuning service to utilize our model on various tasks. |
url |
https://doi.org/10.1038/s41598-021-90259-7 |
_version_ |
1721420015323840512 |