The rcdk and cluster R packages applied to drug candidate selection
Abstract The aim of this article is to show how the power of statistics and cheminformatics can be combined, in R, using two packages: rcdk and cluster. We describe the role of clustering methods for identifying similar structures in a group of 23 molecules according to their fingerprints. The most...
Main Authors: | Adrian Voicu, Narcis Duteanu, Mirela Voicu, Daliborca Vlad, Victor Dumitrascu |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-01-01 |
Series: | Journal of Cheminformatics |
Subjects: | Cytostatic, Molecular fingerprint, Rcdk, Clusters |
Online Access: | https://doi.org/10.1186/s13321-019-0405-0 |
id |
doaj-ab7f1c9ea5cd402f9b7780bb2109fc7f |
---|---|
record_format |
Article |
spelling |
doaj-ab7f1c9ea5cd402f9b7780bb2109fc7f; 2021-01-24T12:40:17Z; eng; BMC; Journal of Cheminformatics; 1758-2946; 2020-01-01; 12118; 10.1186/s13321-019-0405-0; The rcdk and cluster R packages applied to drug candidate selection; Adrian Voicu [0], Narcis Duteanu [1], Mirela Voicu [2], Daliborca Vlad [3], Victor Dumitrascu [4]; [0] Department of Medical Informatics and Biostatistics, Victor Babes University of Medicine and Pharmacy; [1] Dep. CAICAM, Politehnica University of Timisoara; [2] Department of Pharmacology-Clinical Pharmacy, Victor Babes University of Medicine and Pharmacy; [3] Department of Pharmacology, Victor Babes University of Medicine and Pharmacy; [4] Department of Pharmacology, Victor Babes University of Medicine and Pharmacy; Abstract The aim of this article is to show how the power of statistics and cheminformatics can be combined, in R, using two packages: rcdk and cluster. We describe the role of clustering methods for identifying similar structures in a group of 23 molecules according to their fingerprints. The most commonly used method is to group the molecules using a “score” obtained by measuring the average distance between them. This score reflects the similarity/non-similarity between compounds and helps us identify active or potentially toxic substances through predictive studies. Clustering is the process by which the common characteristics of a particular class of compounds are identified. For clustering applications, we generally measure the molecular fingerprint similarity with the Tanimoto coefficient. Based on the molecular fingerprints, we calculated the molecular distances between the methotrexate molecule and the other 23 molecules in the group, and organized them into a matrix. According to the molecular distances and Ward's method, the molecules were grouped into 3 clusters. We can presume structural similarity between the compounds from their locations in the cluster map. Because only 5 molecules were included in the methotrexate cluster, we considered that they might have similar properties and might be further tested as potential drug candidates.; https://doi.org/10.1186/s13321-019-0405-0; Cytostatic; Molecular fingerprint; Rcdk; Clusters |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Adrian Voicu Narcis Duteanu Mirela Voicu Daliborca Vlad Victor Dumitrascu |
title |
The rcdk and cluster R packages applied to drug candidate selection |
publisher |
BMC |
series |
Journal of Cheminformatics |
issn |
1758-2946 |
publishDate |
2020-01-01 |
description |
Abstract The aim of this article is to show how the power of statistics and cheminformatics can be combined, in R, using two packages: rcdk and cluster. We describe the role of clustering methods for identifying similar structures in a group of 23 molecules according to their fingerprints. The most commonly used method is to group the molecules using a “score” obtained by measuring the average distance between them. This score reflects the similarity/non-similarity between compounds and helps us identify active or potentially toxic substances through predictive studies. Clustering is the process by which the common characteristics of a particular class of compounds are identified. For clustering applications, we generally measure the molecular fingerprint similarity with the Tanimoto coefficient. Based on the molecular fingerprints, we calculated the molecular distances between the methotrexate molecule and the other 23 molecules in the group, and organized them into a matrix. According to the molecular distances and Ward's method, the molecules were grouped into 3 clusters. We can presume structural similarity between the compounds from their locations in the cluster map. Because only 5 molecules were included in the methotrexate cluster, we considered that they might have similar properties and might be further tested as potential drug candidates. |
topic |
Cytostatic Molecular fingerprint Rcdk Clusters |
url |
https://doi.org/10.1186/s13321-019-0405-0 |
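
As an illustration of the fingerprint comparison described in the abstract, the following minimal Python sketch computes Tanimoto coefficients between fingerprints and arranges the corresponding distances in a matrix. It is not the authors' R code. The fingerprints are made-up sets of on-bit indices, the molecule names `mol_A` to `mol_C` and the helper names `tanimoto` and `distance_matrix` are hypothetical, and taking the distance as 1 minus the Tanimoto coefficient is an assumption of this sketch.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |a & b| / |a | b| for fingerprints
    stored as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints
        # are treated as identical.
        return 1.0
    return len(a & b) / union


def distance_matrix(fps: dict) -> dict:
    """Symmetric matrix (nested dict) of distances 1 - Tanimoto.

    Assumption: distance is defined as 1 minus similarity.
    """
    names = list(fps)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = 1.0 - tanimoto(fps[n], fps[m])
        dist[n][m] = d
        dist[m][n] = d
    return dist


# Made-up toy fingerprints (hypothetical data, not from the article).
fingerprints = {
    "mol_A": frozenset({1, 2, 3, 4}),
    "mol_B": frozenset({1, 2, 3, 5}),
    "mol_C": frozenset({7, 8, 9}),
}

dist = distance_matrix(fingerprints)

# Print the matrix row by row.
for name, row in dist.items():
    print(name, [round(row[other], 3) for other in fingerprints])
```

A matrix of this kind is what a hierarchical clustering routine using Ward's method would take as input to group the molecules, as the abstract describes.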