Statistical HLA type imputation from large and heterogeneous datasets

An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases, to certain types of cancer and the likelihood of adverse drug reactions. I present and evaluate two models for the accurate...

Bibliographic Details
Main Author: Dilthey, Alexander Tilo
Other Authors: McVean, Gil
Published: University of Oxford 2012
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572468
id ndltd-bl.uk-oai-ethos.bl.uk-572468
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-572468
2015-03-20T04:37:14Z
Statistical HLA type imputation from large and heterogeneous datasets
Dilthey, Alexander Tilo
McVean, Gil
2012
An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases, to certain types of cancer and the likelihood of adverse drug reactions. I present and evaluate two models for the accurate statistical determination of HLA types for single-population and multi-population studies, based on SNP genotypes. Importantly, SNP genotypes are already available for many studies, so that the application of the statistical methods presented here does not incur any extra cost besides computing time. HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), enabling the processing of large reference panels and improving call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates >=88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1) at 4-digit HLA type resolution. HLA*IMP:02 is specifically designed to deal with multi-population heterogeneous reference panels and based on a new algorithm to construct haplotype graph models that takes into account haplotype estimate uncertainty, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It works as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without setting a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (>=91% for all loci, which is achieved at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can (to an extent) detect errors in the reference panel and is highly tolerant of missing data. I demonstrate that a good match between imputation and reference panels in terms of principal components and reference panel size are essential determinants of high imputation accuracy under HLA*IMP:02.
610.0796
Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics
University of Oxford
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572468
http://ora.ox.ac.uk/objects/uuid:1bca18bf-b9d5-4777-b58e-a0dca4c9dbea
Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 610.0796
Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics
spellingShingle 610.0796
Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics
Dilthey, Alexander Tilo
Statistical HLA type imputation from large and heterogeneous datasets
description An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases, to certain types of cancer and the likelihood of adverse drug reactions. I present and evaluate two models for the accurate statistical determination of HLA types for single-population and multi-population studies, based on SNP genotypes. Importantly, SNP genotypes are already available for many studies, so that the application of the statistical methods presented here does not incur any extra cost besides computing time. HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), enabling the processing of large reference panels and improving call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates >=88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1) at 4-digit HLA type resolution. HLA*IMP:02 is specifically designed to deal with multi-population heterogeneous reference panels and based on a new algorithm to construct haplotype graph models that takes into account haplotype estimate uncertainty, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It works as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without setting a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (>=91% for all loci, which is achieved at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can (to an extent) detect errors in the reference panel and is highly tolerant of missing data. I demonstrate that a good match between imputation and reference panels in terms of principal components and reference panel size are essential determinants of high imputation accuracy under HLA*IMP:02.
author2 McVean, Gil
author_facet McVean, Gil
Dilthey, Alexander Tilo
author Dilthey, Alexander Tilo
author_sort Dilthey, Alexander Tilo
title Statistical HLA type imputation from large and heterogeneous datasets
title_short Statistical HLA type imputation from large and heterogeneous datasets
title_full Statistical HLA type imputation from large and heterogeneous datasets
title_fullStr Statistical HLA type imputation from large and heterogeneous datasets
title_full_unstemmed Statistical HLA type imputation from large and heterogeneous datasets
title_sort statistical hla type imputation from large and heterogeneous datasets
publisher University of Oxford
publishDate 2012
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572468
work_keys_str_mv AT diltheyalexandertilo statisticalhlatypeimputationfromlargeandheterogeneousdatasets
_version_ 1716785802995826688