A stable and robust method to identify modules of functionally coherent genes

Bibliographic Details
Main Author: Takhar, Mandeep Kaur
Language: English
Published: University of British Columbia 2014
Online Access: http://hdl.handle.net/2429/50675
Description
Summary: Complex cellular functions are carried out by the coordinated activity of networks of genes and gene products. Understanding the mechanisms of disease and disease pathogenesis requires an understanding of these complex interactions. Microarrays make it possible to explore large-scale cellular networks by measuring the expression of thousands of genes simultaneously. The purpose of our project is to develop a stable and robust method that can identify, from such gene expression data, modules of genes that share a common functional role. These modules can serve as a first step in systems-scale analyses that extract valuable information from gene expression studies. Our method constructs modules by identifying genes that are co-expressed across many diseases. We use peripheral blood microarray samples from patients having one of several diseases and cluster the genes in each disease group separately. We then identify genes that cluster together across all disease groups to construct our modules. We first use our method to construct baseline peripheral blood modules relevant to the lung, using 5 groups of peripheral blood microarray samples that were collected as controls for separate studies. An enrichment analysis using gene sets from a number of pathway and ontology databases reveals the biological significance of our modules. We then use these baseline modules in an enrichment analysis of a list of genes that were differentially expressed in a COPD case vs. control study, and identify the modules that are enriched in that list. Although a similar approach has been used to identify modules of genes that are coordinately expressed across multiple conditions, we show that our method is an improvement because it is robust to the order in which the different disease datasets are presented to the algorithm.
We also apply our procedure to 3 different datasets: a COPD dataset, a COPD normal dataset and a lung tissue dataset. We then assess the stability of our method by performing a resampling experiment on our module construction procedure, and find that it repeatedly produces modules with high concordance, as measured by Jaccard distance.
Science, Faculty of; Computer Science, Department of; Graduate
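The summary describes clustering genes within each disease group, keeping genes that cluster together across all groups, and comparing modules by Jaccard distance. The sketch below is one simple reading of those steps, not the thesis's actual algorithm: it assumes each group's clustering is already available as a gene-to-label mapping, and it treats two genes as belonging to the same module only if they share a cluster in every group. The names `build_modules`, `jaccard_distance`, `min_size`, and the toy gene labels are all hypothetical.

```python
from collections import defaultdict


def build_modules(clusterings, min_size=2):
    """Group genes that share a cluster in every disease group.

    clusterings: list of dicts mapping gene -> cluster label, one dict per
    disease group (assumed input format, not taken from the thesis).
    """
    # Only genes measured in every group can be compared across groups.
    common = set(clusterings[0])
    for clustering in clusterings[1:]:
        common &= set(clustering)

    # Genes with identical label tuples co-cluster in every group.
    by_signature = defaultdict(set)
    for gene in common:
        signature = tuple(clustering[gene] for clustering in clusterings)
        by_signature[signature].add(gene)

    modules = [frozenset(m) for m in by_signature.values() if len(m) >= min_size]
    # Sort for a deterministic result regardless of input order.
    return sorted(modules, key=lambda m: sorted(m))


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B|; two empty sets are at distance 0."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy clusterings for three hypothetical disease groups.
group_a = {"G1": 0, "G2": 0, "G3": 1, "G4": 1}
group_b = {"G1": 5, "G2": 5, "G3": 5, "G4": 7}
group_c = {"G1": 2, "G2": 2, "G3": 3, "G4": 3}

modules = build_modules([group_a, group_b, group_c])
reordered = build_modules([group_c, group_a, group_b])
```

In this toy example only G1 and G2 share a cluster in all three groups, so they form the single module, and presenting the groups in a different order yields the same result because the grouping depends only on which genes agree everywhere. A Jaccard distance near 0 between modules from two resampled runs would indicate high concordance.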