Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening.

This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes.

Bibliographic Details
Main Authors: Zixuan Cang, Lin Mu, Guo-Wei Wei
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC5774846?pdf=render
id doaj-9c7a06ee9889405c83bb533a46aab93c
record_format Article
spelling doaj-9c7a06ee9889405c83bb533a46aab93c 2020-11-25T01:57:42Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2018-01-01 14 1 e1005929 10.1371/journal.pcbi.1005929 Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. Zixuan Cang Lin Mu Guo-Wei Wei This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination. http://europepmc.org/articles/PMC5774846?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Zixuan Cang
Lin Mu
Guo-Wei Wei
title Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-01-01
description This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
url http://europepmc.org/articles/PMC5774846?pdf=render