Automated Prediction of Human Disease Genes

The completion of the Human Genome Project has led to a flood of new genetic data that has proved surprisingly hard to interpret. Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes based on the observation that similar mutational phenotypes arise...

Bibliographic Details
Main Author: Blom, Martin
Format: Others
Language: en_US
Published: 2013
Subjects: Bioinformatics; Systems biology
Online Access: http://hdl.handle.net/2152/19529
id ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-19529
record_format oai_dc
spelling ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-19529
2015-09-20T17:13:50Z
Automated Prediction of Human Disease Genes
Blom, Martin
Bioinformatics
Systems biology
text
2013-02-21T21:49:26Z
2012-12
2012-12-07
December 2012
2013-02-21T21:49:27Z
application/pdf
http://hdl.handle.net/2152/19529
en_US
collection NDLTD
language en_US
format Others
sources NDLTD
topic Bioinformatics
Systems biology
spellingShingle Bioinformatics
Systems biology
Blom, Martin
Automated Prediction of Human Disease Genes
description The completion of the Human Genome Project has led to a flood of new genetic data that has proved surprisingly hard to interpret. Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes based on the observation that similar mutational phenotypes arise from functionally related genes. However, GBA has been shown to work poorly in genome-wide association studies (GWAS), where many genes are somewhat implicated but few are known with very high certainty. In the first part of this work, I resolve this by explicitly modeling the uncertainty of the associations and incorporating the uncertainty of the seed set into the GBA framework. I demonstrate a significant boost in the power to detect validated candidate genes for Crohn’s disease and type 2 diabetes by comparing the predictions from my method to results from follow-up meta-analyses, with incorporation of the network serving to highlight the JAK-STAT pathway and associated adaptors GRB2/SHC1 in Crohn’s disease and BACH2 in type 2 diabetes. Consideration of the network during GWAS thus conveys some of the benefits of enrolling more participants in the study. More generally, I demonstrate that a functional network of human genes provides a valuable statistical framework for prioritizing candidate disease genes in GWAS-based studies. Furthermore, functional gene networks are not the only kind of information that can be used to predict gene-phenotype associations. In the second part of this thesis, I show that gene-phenotype associations in model species, including species as distantly related to humans as E. coli, are another valuable source of information that can be mined using methods similar to those used in recommender systems. Finally, in the last part of this thesis, I present a machine learning formalism that combines the functional gene network and model species phenotype information. I show by cross-validation that this approach outperforms state-of-the-art methods for gene-phenotype association prediction.
author Blom, Martin
author_facet Blom, Martin
author_sort Blom, Martin
title Automated Prediction of Human Disease Genes
title_short Automated Prediction of Human Disease Genes
title_full Automated Prediction of Human Disease Genes
title_fullStr Automated Prediction of Human Disease Genes
title_full_unstemmed Automated Prediction of Human Disease Genes
title_sort automated prediction of human disease genes
publishDate 2013
url http://hdl.handle.net/2152/19529
work_keys_str_mv AT blommartin automatedpredictionofhumandiseasegenes
_version_ 1716823006005690368