Indirect two-sided relative ranking: a robust similarity measure for gene expression data

Bibliographic Details
Main Authors: Licamele, Louis; Getoor, Lise
Format: Article
Language: English
Published: BMC 2010-03-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/137
DOI: 10.1186/1471-2105-11-137
ISSN: 1471-2105
Citation: BMC Bioinformatics 2010, 11:137

Abstract

Background
There is a large amount of gene expression data in the public domain, generated under a wide variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights.

Results
In this paper we present a new method for comparing gene expression profiles that is robust to variations in experimental conditions; we refer to it as indirect two-sided relative ranking. It extends the current best approach, which compares the correlations of the up- and down-regulated genes, by adding a comparison based on the correlations of rankings across the entire database. Because the method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries.

Conclusions
We demonstrate the benefit of the proposed indirect method on several datasets. We first evaluate its ability to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). The indirect method significantly improves upon the previous state-of-the-art method, with improvements in recall at rank 10 of 97.03% and 49.44% on the two datasets, respectively. Next, we show that the indirect method improves classification accuracy on several additional datasets, covering the classification of cancer subtypes, the prediction of drug sensitivity/resistance, and the classification of (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method on each of these datasets is statistically significant.