Automated Prediction of Human Disease Genes

The completion of the Human Genome Project has led to a flood of new genetic data that has proved surprisingly hard to interpret. Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes, based on the observation that similar mutational phenotypes arise...

Bibliographic Details
Main Author:	Blom, Martin
Format:	Others
Language:	en_US
Published:	2013
Subjects:	Bioinformatics; Systems biology
Online Access:	http://hdl.handle.net/2152/19529

id	ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-19529
record_format	oai_dc
spelling	ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-19529; 2015-09-20T17:13:50Z; Automated Prediction of Human Disease Genes; Blom, Martin; Bioinformatics; Systems biology; text; 2013-02-21T21:49:26Z; 2012-12; 2012-12-07; December 2012; 2013-02-21T21:49:27Z; application/pdf; http://hdl.handle.net/2152/19529; en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
topic	Bioinformatics; Systems biology
spellingShingle	Bioinformatics Systems biology Blom, Martin Automated Prediction of Human Disease Genes
description	The completion of the Human Genome Project has led to a flood of new genetic data that has proved surprisingly hard to interpret. Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes, based on the observation that similar mutational phenotypes arise from functionally related genes. However, GBA has been shown to work poorly in genome-wide association studies (GWAS), where many genes are somewhat implicated but few are known with very high certainty. In the first part of this work, I resolve this by explicitly modeling the uncertainty of the associations and incorporating the uncertainty of the seed set into the GBA framework. I demonstrate a significant boost in the power to detect validated candidate genes for Crohn’s disease and type 2 diabetes by comparing the predictions from my method to results from follow-up meta-analyses, with incorporation of the network serving to highlight the JAK-STAT pathway and associated adaptors GRB2/SHC1 in Crohn’s disease and BACH2 in type 2 diabetes. Consideration of the network during GWAS thus conveys some of the benefits of enrolling more participants in the study. More generally, we demonstrate that a functional network of human genes provides a valuable statistical framework for prioritizing candidate disease genes in GWAS-based studies. Furthermore, functional gene networks are not the only kind of information that can be used to predict gene-phenotype associations. In the second part of this thesis, I show that gene-phenotype associations in model species, including species as distantly related to humans as E. coli, are another valuable source of information that can be mined using methods similar to those used in recommender systems. Finally, in the last part of this thesis, I present a machine learning formalism that combines the functional gene network and model species phenotype information. Using cross-validation, I show that this approach outperforms state-of-the-art methods for gene-phenotype association prediction.
author	Blom, Martin
author_facet	Blom, Martin
author_sort	Blom, Martin
title	Automated Prediction of Human Disease Genes
title_sort	automated prediction of human disease genes
publishDate	2013
url	http://hdl.handle.net/2152/19529
work_keys_str_mv	AT blommartin automatedpredictionofhumandiseasegenes
_version_	1716823006005690368
