A network-based analysis of the cellular and genetic etiology of disease

Bibliographic Details
Main Author: Cornish, Alexander John
Other Authors: Sternberg, Michael
Published: Imperial College London 2016
Subjects: 570
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.754669
Description
Summary: Thousands of disease-associated loci have been identified in genome-wide association studies (GWAS). These loci can span multiple genes, and identifying which, if any, of these genes are causal can be challenging. Multiple methods have been developed to identify causal genes, some of which use networks of physical interactions between proteins. However, the performance of many of these methods may be limited by their failure to use data specific to the tissues and cell types that manifest each disease. Furthermore, many network-based approaches may be biased towards better-studied genes. To use data specific to a disease-manifesting cell type to identify disease-associated genes, it is first necessary to identify the disease-manifesting cell types. In this thesis, I report the development of the GSC (Gene Set Compactness) and GSO (Gene Set Overexpression) methods, which I use to identify associations between 352 diseases and 73 cell types. The GSC method identifies these associations using cell-type-specific protein-protein interaction (PPI) networks, which I generate by integrating PPI and gene expression data. Using text mining, I demonstrate that these methods identify a large number of well-characterised disease-cell-type associations, as well as associations that warrant further investigation. I also describe the development of ALPACA (Analysing Loci using Phenotypic And Cellular Associations), which identifies disease-associated genes using cell-type-specific PPI networks and phenotype data from humans and mice. I demonstrate that, by taking a permutation-based approach, ALPACA avoids being biased towards better-studied genes. Furthermore, I demonstrate that using cell-type-specific networks, instead of generic networks, improves method performance. As the amount of available tissue- and cell-type-specific data continues to increase, methods that integrate these data will become increasingly important in understanding disease etiology.
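The summary describes two ideas without giving their mechanics: building a cell-type-specific PPI network by combining interaction data with expression data, and asking whether a disease's genes sit unusually close together in that network, judged against random gene sets. The sketch below illustrates the general shape of that reasoning under loudly stated assumptions. It is not the thesis's GSC or ALPACA implementation. The function names (`cell_type_network`, `compactness`, `permutation_p_value`), the expression `threshold`, the use of mean shortest-path distance as the compactness score, and the uniform random-node null model are all hypothetical choices made for illustration.

```python
import random
from collections import deque


def cell_type_network(ppi_edges, expression, threshold=1.0):
    """Hypothetical cell-type-specific network: keep an interaction only if
    both proteins are expressed at or above `threshold` (an assumed cut-off)."""
    expressed = {gene for gene, level in expression.items() if level >= threshold}
    adj = {gene: set() for gene in expressed}
    for a, b in ppi_edges:
        if a in expressed and b in expressed:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def compactness(adj, genes):
    """Mean pairwise shortest-path distance among the genes present in the
    network (lower means more compact). A stand-in score, assumed here, not
    necessarily the GSC measure. Unreachable pairs get a penalty of len(adj),
    which exceeds any finite path length in the network."""
    present = [g for g in genes if g in adj]
    if len(present) < 2:
        raise ValueError("need at least two genes present in the network")
    penalty = len(adj)
    total, pairs = 0, 0
    for i, a in enumerate(present):
        dist = bfs_distances(adj, a)
        for b in present[i + 1:]:
            total += dist.get(b, penalty)
            pairs += 1
    return total / pairs


def permutation_p_value(adj, genes, n_perm=1000, seed=0):
    """Compare observed compactness with random gene sets of the same size,
    drawn uniformly from the network's nodes. This simple null ignores node
    degree, so it does not correct for bias towards well-studied genes."""
    rng = random.Random(seed)
    present = [g for g in genes if g in adj]
    observed = compactness(adj, present)
    nodes = sorted(adj)
    hits = sum(
        compactness(adj, rng.sample(nodes, len(present))) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy usage: A, B and C form a triangle; X is not expressed in this cell type,
# so its interaction with A is dropped from the cell-type-specific network.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "G"), ("G", "H"), ("X", "A")]
expression = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 2,
              "F": 2, "G": 2, "H": 2, "X": 0.1}
net = cell_type_network(edges, expression)
observed, p = permutation_p_value(net, ["A", "B", "C"])
print(observed, p)  # observed is 1.0: every pair in the triangle is adjacent
```

Uniform sampling keeps the example short. A degree-aware null model would be one plausible way to address the study-bias concern raised in the summary, but this sketch does not attempt it.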