On the effects of large-scale transcriptomics datasets on gene functional analyses

The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. In this thesis we show that using such large collections could ham...

Bibliographic Details
Main Author: Bhat, Prajwal
Published: Royal Holloway, University of London 2012
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553731
id ndltd-bl.uk-oai-ethos.bl.uk-553731
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-553731 2015-03-20T04:31:18Z 572.865 Royal Holloway, University of London http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553731 http://repository.royalholloway.ac.uk/items/f63c7106-2a4f-6437-210a-2f8829b7f61a/9/ Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 572.865
description The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. In this thesis we show that using such large collections can hamper GBA functional analysis for genes whose expression is condition-specific. In these cases a smaller set of condition-related experiments should be used instead, but identifying such functionally relevant experiments in large collections from literature knowledge alone is impractical. The study begins by discussing the basic principles underlying the definition of gene function and the use of large microarray collections for GBA-based gene function analyses. We examine the effects of condition-specific gene expression on GBA analyses from both a mathematical and a biological perspective. We show that calculating correlation over large microarray collections can mask the effectiveness of the GBA principle, and we suggest that using only those experiments that are relevant to the biological function under analysis can significantly improve GBA-based gene functional analyses. We then present a semi-supervised algorithm that selects functionally relevant experiments from large collections of transcriptomics experiments. The algorithm can select experiments relevant to a given GO term, MIPS FunCat term or KEGG pathway. We test the algorithm extensively on large dataset collections for yeast and Arabidopsis. We demonstrate that: (i) using the selected experiments yields a statistically significant improvement both in correlation between genes in the functional category of interest and in GBA-based function predictions; (ii) the effectiveness of the selected experiments increases with annotation specificity; (iii) the algorithm can be successfully applied to GBA-based pathway reconstruction.
We conclude by discussing the potential applications of our technique and outline several developments that could be implemented in the future to improve the efficiency of the experiment selection procedure.
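The masking effect described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's algorithm and uses no data from it: it assumes two hypothetical genes that share a signal in only 40 of 1000 synthetic experiments and are independent noise elsewhere, then compares the Pearson correlation over the whole collection with the correlation over the relevant subset. The function names, noise levels and experiment counts are all illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_experiments=1000, n_relevant=40, seed=0):
    """Return (r_all, r_relevant) for two synthetic, hypothetical genes.

    Assumption: the first n_relevant experiments are the condition under
    which both genes follow a shared signal; in every other experiment
    each gene is independent noise.
    """
    rng = random.Random(seed)
    gene_a, gene_b = [], []
    for i in range(n_experiments):
        if i < n_relevant:
            # Condition-specific co-expression: shared signal, small noise.
            signal = rng.gauss(0.0, 1.0)
            gene_a.append(signal + rng.gauss(0.0, 0.2))
            gene_b.append(signal + rng.gauss(0.0, 0.2))
        else:
            # Unrelated conditions: the two genes vary independently.
            gene_a.append(rng.gauss(0.0, 1.0))
            gene_b.append(rng.gauss(0.0, 1.0))
    r_all = pearson(gene_a, gene_b)
    r_relevant = pearson(gene_a[:n_relevant], gene_b[:n_relevant])
    return r_all, r_relevant


if __name__ == "__main__":
    r_all, r_relevant = simulate()
    print(f"all experiments:           r = {r_all:.2f}")
    print(f"relevant experiments only: r = {r_relevant:.2f}")
```

Under these assumptions the correlation over the full collection stays close to zero while the correlation over the relevant subset is high, which is the situation in which selecting condition-related experiments would be expected to help a GBA analysis.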
author Bhat, Prajwal
title On the effects of large-scale transcriptomics datasets on gene functional analyses
publisher Royal Holloway, University of London
publishDate 2012
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553731