Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-ba...

Bibliographic Details
Main Authors: Xiaomei Wu, Erli Pang, Kui Lin, Zhen-Ming Pei
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3669204?pdf=render
id doaj-45d188542b244b16b2c8f238beb11551
record_format Article
spelling doaj-45d188542b244b16b2c8f238beb11551 | 2020-11-25T01:20:00Z | eng | Public Library of Science (PLoS) | PLoS ONE | ISSN 1932-6203 | 2013-01-01 | vol. 8, no. 5, e66745 | doi:10.1371/journal.pone.0066745
collection DOAJ
language English
format Article
sources DOAJ
author Xiaomei Wu
Erli Pang
Kui Lin
Zhen-Ming Pei
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2013-01-01
description BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply consider terms at the same level in the ontology to be equally specific nodes, revealing the weaknesses that could be complemented using information content (IC). RESULTS AND CONCLUSIONS: Here, we used the IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false. HRSS values were divided into four different levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure, simGIC, are superior in correlation with sequence and Pfam similarities. Because different measures are best suited for different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. The algorithm HRSS was implemented in the C programming language, which is freely available from http://cmb.bnu.edu.cn/hrss.
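The description compares two pairwise strategies, the maximum and the best-match average, for turning term-level similarities into a gene-product score. The record does not define them, so the following is only a minimal sketch under one commonly used definition: given a matrix of term-term similarities between the GO terms of two gene products, the maximum takes the single highest entry, and the best-match average takes the mean of the row-wise and column-wise best matches. The function names and the toy matrix are hypothetical, and the similarity values are placeholders rather than HRSS output.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def max_strategy(sim: Matrix) -> float:
    """Return the highest term-term similarity over all term pairs."""
    return max(max(row) for row in sim)


def best_match_average(sim: Matrix) -> float:
    """Average the best match of every term in both directions.

    Rows are the terms of gene product A, columns the terms of gene
    product B. This is one common variant (mean of the two directional
    means); other variants pool all best matches before averaging.
    """
    row_best = [max(row) for row in sim]        # best partner for each A term
    col_best = [max(col) for col in zip(*sim)]  # best partner for each B term
    mean_rows = sum(row_best) / len(row_best)
    mean_cols = sum(col_best) / len(col_best)
    return 0.5 * (mean_rows + mean_cols)


# Hypothetical similarities: 2 terms for product A, 3 terms for product B.
toy = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.2],
]

if __name__ == "__main__":
    print("maximum:", max_strategy(toy))
    print("best-match average:", round(best_match_average(toy), 3))
```

In this toy case a single strongly matching term pair drives the maximum, while the best-match average is pulled down by terms that have no close counterpart. That difference is in line with the description's finding that the two strategies suit different tasks.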
url http://europepmc.org/articles/PMC3669204?pdf=render