The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

Abstract Background The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminfor...

Bibliographic Details
Main Authors: Egon L. Willighagen, John W. Mayfield, Jonathan Alvarsson, Arvid Berg, Lars Carlsson, Nina Jeliazkova, Stefan Kuhn, Tomáš Pluskal, Miquel Rojas-Chertó, Ola Spjuth, Gilleain Torrance, Chris T. Evelo, Rajarshi Guha, Christoph Steinbeck
Format: Article
Language: English
Published: BMC 2017-06-01
Series: Journal of Cheminformatics
Subjects: Java; Cheminformatics; Bioinformatics; Metabolomics; Depiction
Online Access: http://link.springer.com/article/10.1186/s13321-017-0220-4
id doaj-04d9c1e269724925b7979d21a9dd3f92
record_format Article
spelling doaj-04d9c1e269724925b7979d21a9dd3f92
2020-11-25T00:42:45Z
eng
BMC
Journal of Cheminformatics
1758-2946
2017-06-01
9 (1): 1-19
10.1186/s13321-017-0220-4
The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching
Egon L. Willighagen (Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University)
John W. Mayfield (NextMove Software Ltd)
Jonathan Alvarsson (Department of Pharmaceutical Biosciences, Uppsala University)
Arvid Berg (Department of Pharmaceutical Biosciences, Uppsala University)
Lars Carlsson (AstraZeneca, Innovative Medicines & Early Development, Quantitative Biology)
Nina Jeliazkova (Ideaconsult Ltd)
Stefan Kuhn (Department of Informatics, University of Leicester)
Tomáš Pluskal (Whitehead Institute for Biomedical Research)
Miquel Rojas-Chertó (Química Clínica Aplicada)
Ola Spjuth (Department of Pharmaceutical Biosciences, Uppsala University)
Gilleain Torrance
Chris T. Evelo (Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University)
Rajarshi Guha (National Center for Advancing Translational Sciences)
Christoph Steinbeck (Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University)
Abstract. Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software. Graphical abstract: CDK 2.0 provides new features and improved performance.
http://link.springer.com/article/10.1186/s13321-017-0220-4
Java
Cheminformatics
Bioinformatics
Metabolomics
Depiction
collection DOAJ
language English
format Article
sources DOAJ
author Egon L. Willighagen
John W. Mayfield
Jonathan Alvarsson
Arvid Berg
Lars Carlsson
Nina Jeliazkova
Stefan Kuhn
Tomáš Pluskal
Miquel Rojas-Chertó
Ola Spjuth
Gilleain Torrance
Chris T. Evelo
Rajarshi Guha
Christoph Steinbeck
author_sort Egon L. Willighagen
title The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching
title_sort chemistry development kit (cdk) v2.0: atom typing, depiction, molecular formulas, and substructure searching
publisher BMC
series Journal of Cheminformatics
issn 1758-2946
publishDate 2017-06-01
description Abstract. Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software. Graphical abstract: CDK 2.0 provides new features and improved performance.
topic Java
Cheminformatics
Bioinformatics
Metabolomics
Depiction
url http://link.springer.com/article/10.1186/s13321-017-0220-4
_version_ 1725280469138800640