Semantic similarity for automatic classification of chemical compounds.

With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which st...

Bibliographic Details
Main Authors: João D Ferreira, Francisco M Couto
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-09-01
Series: PLoS Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/20885779/pdf/?tool=EBI
id doaj-96bb4f1feea84c7abc362d62a8be458e
spelling doaj-96bb4f1feea84c7abc362d62a8be458e; 2021-04-21T15:30:52Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2010-09-01; 6; 9; 10.1371/journal.pcbi.1000937; Semantic similarity for automatic classification of chemical compounds.; João D Ferreira; Francisco M Couto
collection DOAJ
issn 1553-734X
1553-7358
description With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which states that biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach to the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods. Our approach was assessed based on the Matthews Correlation Coefficient for the prediction, and achieved values of 0.810 when used as a prediction of blood-brain barrier permeability, 0.694 for P-glycoprotein substrate, and 0.673 for estrogen receptor binding activity. These results expose a significant improvement over the currently existing methods, whose best performances were 0.628, 0.591, and 0.647 respectively. It was demonstrated that the integration of semantic similarity is a feasible and effective way to improve existing chemical compound classification systems. Among other possible uses, this tool helps the study of the evolution of metabolic pathways, the study of the correlation of metabolic networks with properties of those networks, or the improvement of ontologies that represent chemical information.
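
The abstract reports its results as Matthews Correlation Coefficient (MCC) values. As a reminder of what that metric measures, the following is a minimal sketch of the standard MCC formula for a binary classifier, computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions; they are not taken from the article, and this is not the authors' evaluation code.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives, and false negatives. The result lies in [-1, 1]:
    1 is perfect agreement, 0 is no better than chance, -1 is total
    disagreement.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero,
    # since the coefficient is otherwise undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the call.
example_score = matthews_corrcoef(tp=45, tn=40, fp=5, fn=10)
print(f"MCC = {example_score:.3f}")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of very different sizes.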