The rcdk and cluster R packages applied to drug candidate selection

Bibliographic Details
Main Authors: Adrian Voicu, Narcis Duteanu, Mirela Voicu, Daliborca Vlad, Victor Dumitrascu
Format: Article
Language: English
Published: BMC 2020-01-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-019-0405-0
Record ID: doaj-ab7f1c9ea5cd402f9b7780bb2109fc7f
Affiliations: Adrian Voicu (Department of Medical Informatics and Biostatistics, Victor Babes University of Medicine and Pharmacy); Narcis Duteanu (Dep. CAICAM, Politehnica University of Timisoara); Mirela Voicu (Department of Pharmacology-Clinical Pharmacy, Victor Babes University of Medicine and Pharmacy); Daliborca Vlad (Department of Pharmacology, Victor Babes University of Medicine and Pharmacy); Victor Dumitrascu (Department of Pharmacology, Victor Babes University of Medicine and Pharmacy)
ISSN: 1758-2946
Abstract: The aim of this article is to show how the power of statistics and cheminformatics can be combined in R using two packages: rcdk and cluster. We describe the role of clustering methods in identifying similar structures in a group of 23 molecules according to their fingerprints. The most commonly used method is to group the molecules using a “score” obtained by measuring the average distance between them. This score reflects the similarity or dissimilarity between compounds and helps us identify active or potentially toxic substances through predictive studies. Clustering is the process by which the common characteristics of a particular class of compounds are identified. For clustering applications, we generally measure molecular fingerprint similarity with the Tanimoto coefficient. Based on the molecular fingerprints, we calculated the molecular distances between the methotrexate molecule and the other 23 molecules in the group, and organized them into a matrix. Using these molecular distances and Ward’s method, the molecules were grouped into 3 clusters. We can presume a correspondence between the structural similarity of the compounds and their locations in the cluster map. Because only 5 molecules were included in the methotrexate cluster, we considered that they might have similar properties and could be tested further as potential drug candidates.
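
The record contains no code, so the following is only a rough, dependency-free Python illustration of the pipeline the abstract describes. The steps are fingerprint similarity via the Tanimoto coefficient T(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|), a pairwise distance matrix, and Ward's hierarchical clustering cut into a fixed number of groups. It is not the authors' R code and does not use rcdk or cluster. Several choices here are assumptions: fingerprints are hypothetical sets of on-bit positions rather than rcdk fingerprint objects, distance is taken as 1 - Tanimoto, and Ward's method is implemented with the Lance-Williams update on squared distances, the same convention as R's `hclust(method = "ward.D2")`. Strictly, Ward's criterion assumes Euclidean distances, so applying it to Tanimoto distances is a common heuristic rather than an exact variance minimization.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_distance_matrix(fps: list[set[int]]) -> list[list[float]]:
    """Symmetric matrix of pairwise distances 1 - Tanimoto."""
    n = len(fps)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - tanimoto(fps[i], fps[j])
    return d


def ward_clusters(d: list[list[float]], k: int) -> list[list[int]]:
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Uses the Lance-Williams update on squared distances, the same convention
    as R's hclust(method = "ward.D2"). Assumes 1 <= k.
    """
    clusters = {i: [i] for i in range(len(d))}  # cluster id -> member indices
    # Squared dissimilarities between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): d[i][j] ** 2 for i, j in combinations(range(len(d)), 2)}
    next_id = len(d)  # new ids are always larger than existing ones
    while len(clusters) > k:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        d_ab = dist[(a, b)]
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        updated = {}
        for c, members in clusters.items():
            nc = len(members)
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            # Lance-Williams coefficients for Ward's method
            updated[(c, next_id)] = (
                (na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab
            ) / (na + nb + nc)
        # Forget every pair involving the two merged clusters, then add the new ones.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        dist.update(updated)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())


if __name__ == "__main__":
    # Hypothetical toy fingerprints, not the 23 molecules from the article.
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
    print(ward_clusters(tanimoto_distance_matrix(fps), k=3))
```

Run as a script, this prints `[[0, 1, 2], [3, 4], [5]]`: the three fingerprints sharing bits 1 to 3 form one group, the pair sharing bits 10 and 11 forms another, and the unrelated fingerprint stays alone. Distances to a reference compound such as methotrexate can be read from the matching row of the distance matrix.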
Keywords: Cytostatic, Molecular fingerprint, Rcdk, Clusters