IntelliGO: a new vector-based semantic similarity measure including annotation origin

Abstract. Background: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It h...

Bibliographic Details
Main Authors: Devignes Marie-Dominique, Napoli Amedeo, Poch Olivier, Smail-Tabbone Malika, Benabderrahmane Sidahmed
Format: Article
Language: English
Published: BMC 2010-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/588
id doaj-68cd02dcd12441d091eae8b8adf7a1b9
record_format Article
spelling doaj-68cd02dcd12441d091eae8b8adf7a1b9 2020-11-24T21:06:54Z eng BMC BMC Bioinformatics 1471-2105 2010-12-01 11 1 588 10.1186/1471-2105-11-588 IntelliGO: a new vector-based semantic similarity measure including annotation origin Devignes Marie-Dominique; Napoli Amedeo; Poch Olivier; Smail-Tabbone Malika; Benabderrahmane Sidahmed http://www.biomedcentral.com/1471-2105/11/588
collection DOAJ
language English
format Article
sources DOAJ
author Devignes Marie-Dominique
Napoli Amedeo
Poch Olivier
Smail-Tabbone Malika
Benabderrahmane Sidahmed
author_sort Devignes Marie-Dominique
title IntelliGO: a new vector-based semantic similarity measure including annotation origin
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2010-12-01
description Abstract
Background: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes).
Results: We present here a new semantic similarity measure called IntelliGO which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pair-wise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure allows the influence of weights assigned to evidence codes to be checked. Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with the sequence similarity, Pfam, and Enzyme classifications when compared to previously published measures.
Conclusions: The IntelliGO similarity measure provides a customizable and comprehensive method for quantifying gene similarity based on GO annotations. It also displays a robust set-discriminating power which suggests it will be useful for functional clustering.
Availability: An on-line version of the IntelliGO similarity measure is available at: http://bioinfo.loria.fr/Members/benabdsi/intelligo_project/
url http://www.biomedcentral.com/1471-2105/11/588
_version_ 1716764365628112896
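
Illustrative note on the abstract: the Results paragraph describes gene vectors whose coefficients combine a term's information content with an evidence-code weight, compared through a generalized cosine in which GO terms are not treated as orthogonal. The Python sketch below shows only that general idea. It is not the authors' implementation: the term identifiers, the evidence-code weights, the information-content values and the term-to-term similarity table are all hypothetical, and the record does not state how the article derives them from the GO graph.

```python
import math

# Hypothetical evidence-code weights (illustrative values, not from the article).
EVIDENCE_WEIGHT = {"EXP": 1.0, "IDA": 1.0, "ISS": 0.6, "IEA": 0.4}

# Hypothetical information-content values for three made-up terms.
INFO_CONTENT = {"GO:A": 2.0, "GO:B": 1.0, "GO:C": 1.5}

# Hypothetical similarity between distinct terms; in a real setting this
# would be computed from the GO graph rather than written by hand.
PAIR_SIM = {
    frozenset(("GO:A", "GO:B")): 0.5,
    frozenset(("GO:B", "GO:C")): 0.2,
}


def term_sim(s, t):
    """Similarity of two basis terms: 1 for identical terms, else table lookup."""
    if s == t:
        return 1.0
    return PAIR_SIM.get(frozenset((s, t)), 0.0)


def gene_vector(annotations):
    """Turn (term, evidence_code) pairs into a {term: coefficient} vector.

    Each coefficient is evidence weight * information content; when a term
    is annotated more than once, the largest coefficient is kept.
    """
    vec = {}
    for term, code in annotations:
        coeff = EVIDENCE_WEIGHT[code] * INFO_CONTENT[term]
        vec[term] = max(vec.get(term, 0.0), coeff)
    return vec


def generalized_dot(u, v):
    """Dot product over a non-orthogonal basis: sum of a_i * b_j * sim(i, j)."""
    return sum(a * b * term_sim(s, t) for s, a in u.items() for t, b in v.items())


def generalized_cosine(u, v):
    """Generalized dot product normalized by the generalized vector norms."""
    denom = math.sqrt(generalized_dot(u, u) * generalized_dot(v, v))
    return generalized_dot(u, v) / denom if denom else 0.0


gene1 = gene_vector([("GO:A", "IDA")])
gene2 = gene_vector([("GO:B", "IEA"), ("GO:C", "ISS")])
score = generalized_cosine(gene1, gene2)
```

In this sketch, changing an entry of EVIDENCE_WEIGHT changes the coefficients and therefore the score, which is one way to picture how the influence of evidence-code weights could be checked. The result stays within the usual cosine bounds only if the term-similarity table behaves like a valid inner product, which this toy table is not guaranteed to do in general.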