Compressed graph representation for scalable molecular graph generation

Bibliographic Details
Main Authors: Youngchun Kwon, Dongseon Lee, Youn-Suk Choi, Kyoham Shin, Seokho Kang
Format: Article
Language: English
Published: BMC 2020-09-01
Series: Journal of Cheminformatics
Online Access: http://link.springer.com/article/10.1186/s13321-020-00463-2
Record ID: doaj-08e4171195b04feabe251a39ec08e2b1
Source: DOAJ
ISSN: 1758-2946
Volume: 12
DOI: 10.1186/s13321-020-00463-2
Affiliations: Youngchun Kwon, Dongseon Lee, Youn-Suk Choi (Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd.); Kyoham Shin, Seokho Kang (Department of Industrial Engineering, Sungkyunkwan University)
Keywords: Molecular graph generation; Compressed graph representation; Graph variational autoencoder; Deep learning

Abstract: Recently, deep learning has been successfully applied to molecular graph generation. Nevertheless, mitigating the computational complexity, which increases with the number of nodes in a graph, has been a major challenge. This has hindered the application of deep learning-based molecular graph generation to large molecules with many heavy atoms. In this study, we present a molecular graph compression method to alleviate the complexity while maintaining the capability of generating chemically valid and diverse molecular graphs. We designate six small substructural patterns that are prevalent between two atoms in real-world molecules. These relevant substructures in a molecular graph are then converted to edges by regarding them as additional edge features along with the bond types. This reduces the number of nodes significantly without any information loss. Consequently, a generative model can be constructed in a more efficient and scalable manner with large molecules on a compressed graph representation. We demonstrate the effectiveness of the proposed method for molecules with up to 88 heavy atoms using the GuacaMol benchmark.
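The core idea in the abstract, folding a small substructure that sits between two atoms into an edge feature so the graph has fewer nodes but can still be restored exactly, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not list the six substructural patterns, so the sketch assumes a single hypothetical pattern: an O or S atom bonded to exactly two other atoms. It tracks only element symbols and bond-type strings (no charges or hydrogens). The names MolGraph, BRIDGE_ELEMENTS, compress and decompress are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical pattern set (an assumption, not the paper's six patterns):
# an O or S atom bonded to exactly two other atoms may be folded into an edge.
BRIDGE_ELEMENTS = {"O", "S"}


@dataclass
class MolGraph:
    atoms: dict = field(default_factory=dict)  # node id -> element symbol
    bonds: dict = field(default_factory=dict)  # frozenset({u, v}) -> edge label

    def neighbours(self, n):
        """Return the ids of all nodes sharing an edge with node n."""
        return [next(iter(e - {n})) for e in self.bonds if n in e]


def compress(g: MolGraph) -> MolGraph:
    """Fold each eligible bridging atom into an edge feature, losslessly."""
    out = MolGraph(dict(g.atoms), dict(g.bonds))
    for b in sorted(g.atoms):
        if out.atoms.get(b) not in BRIDGE_ELEMENTS:
            continue
        nbrs = out.neighbours(b)
        if len(nbrs) != 2:
            continue
        u, v = sorted(nbrs)
        e_u, e_v = frozenset({u, b}), frozenset({b, v})
        # Fold only plain bonds (no nesting) and never overwrite an existing u-v bond.
        if not (isinstance(out.bonds[e_u], str) and isinstance(out.bonds[e_v], str)):
            continue
        if frozenset({u, v}) in out.bonds:
            continue
        # The new edge label stores the removed atom and both bond types,
        # which is everything decompress() needs to restore them.
        label = ("bridge", b, out.atoms.pop(b), out.bonds.pop(e_u), out.bonds.pop(e_v))
        out.bonds[frozenset({u, v})] = label
    return out


def decompress(g: MolGraph) -> MolGraph:
    """Invert compress() by expanding every bridge edge back into an atom."""
    out = MolGraph(dict(g.atoms), {})
    for edge, label in g.bonds.items():
        if isinstance(label, tuple) and label[0] == "bridge":
            _, b, element, bond_ub, bond_bv = label
            u, v = sorted(edge)
            out.atoms[b] = element
            out.bonds[frozenset({u, b})] = bond_ub
            out.bonds[frozenset({b, v})] = bond_bv
        else:
            out.bonds[edge] = label
    return out


# Usage: a C-C-O-C-C chain (heavy-atom skeleton of diethyl ether).
ether = MolGraph(
    atoms={0: "C", 1: "C", 2: "O", 3: "C", 4: "C"},
    bonds={frozenset({0, 1}): "single", frozenset({1, 2}): "single",
           frozenset({2, 3}): "single", frozenset({3, 4}): "single"},
)
small = compress(ether)
print(len(ether.atoms), "->", len(small.atoms))  # 5 -> 4
assert decompress(small) == ether  # no information lost
```

Storing the original node id inside the edge label keeps the round trip exact in this toy setting. A generative model working on the compressed graph would more plausibly predict only a pattern category as part of the edge feature, with node identities reassigned when the graph is expanded.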