Statistical HLA type imputation from large and heterogeneous datasets
An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases, to certain types of cancer and the likelihood of adverse drug reactions. I present and evaluate two models for the accurate...
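Main Author: | Dilthey, Alexander Tilo |
---|---|
Other Authors: | McVean, Gil |
Published: | University of Oxford 2012 |
Subjects: | 610.0796 Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572468 |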
id | ndltd-bl.uk-oai-ethos.bl.uk-572468 |
---|---|
record_format | oai_dc |
spelling | ndltd-bl.uk-oai-ethos.bl.uk-572468 2015-03-20T04:37:14Z Statistical HLA type imputation from large and heterogeneous datasets Dilthey, Alexander Tilo McVean, Gil 2012 An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases, to certain types of cancer and the likelihood of adverse drug reactions. I present and evaluate two models for the accurate statistical determination of HLA types for single-population and multi-population studies, based on SNP genotypes. Importantly, SNP genotypes are already available for many studies, so that the application of the statistical methods presented here does not incur any extra cost besides computing time. HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), enabling the processing of large reference panels and improving call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates >=88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1) at 4-digit HLA type resolution. HLA*IMP:02 is specifically designed to deal with multi-population heterogeneous reference panels and based on a new algorithm to construct haplotype graph models that takes into account haplotype estimate uncertainty, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It works as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without setting a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (>=91% for all loci, which is achieved at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can (to an extent) detect errors in the reference panel and is highly tolerant of missing data. I demonstrate that a good match between imputation and reference panels in terms of principal components and reference panel size are essential determinants of high imputation accuracy under HLA*IMP:02. 610.0796 Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics University of Oxford http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572468 http://ora.ox.ac.uk/objects/uuid:1bca18bf-b9d5-4777-b58e-a0dca4c9dbea Electronic Thesis or Dissertation |
collection | NDLTD |
sources | NDLTD |
topic | 610.0796 Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics |
spellingShingle | 610.0796 Genetics (life sciences) : Bioinformatics (life sciences) : Immunodiagnostics : Immunology : Mathematical genetics and bioinformatics (statistics) : Statistics (see also social sciences) : Human Leukocyte Antigen : major histocompatibility complex : imputation : prediction : autoimmune : immunology : graph : population genetics Dilthey, Alexander Tilo Statistical HLA type imputation from large and heterogeneous datasets |
description | An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases and to certain types of cancer, as well as the likelihood of adverse drug reactions. I present and evaluate two models for the accurate statistical determination of HLA types from SNP genotypes, for single-population and multi-population studies. Importantly, SNP genotypes are already available for many studies, so applying the statistical methods presented here incurs no extra cost besides computing time. HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), enabling the processing of large reference panels and improving call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates >=88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1) at 4-digit HLA type resolution. HLA*IMP:02 is specifically designed to deal with multi-population heterogeneous reference panels and is based on a new algorithm for constructing haplotype graph models that takes into account haplotype estimate uncertainty, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It works as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without setting a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (>=91% at every locus, the minimum being at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can (to an extent) detect errors in the reference panel and is highly tolerant of missing data. I demonstrate that a good match between imputation and reference panels in terms of principal components and reference panel size are essential determinants of high imputation accuracy under HLA*IMP:02. |
author2 | McVean, Gil |
author_facet | McVean, Gil; Dilthey, Alexander Tilo |
author | Dilthey, Alexander Tilo |
author_sort | Dilthey, Alexander Tilo |
title | Statistical HLA type imputation from large and heterogeneous datasets |
title_short | Statistical HLA type imputation from large and heterogeneous datasets |
title_full | Statistical HLA type imputation from large and heterogeneous datasets |
title_fullStr | Statistical HLA type imputation from large and heterogeneous datasets |
title_full_unstemmed | Statistical HLA type imputation from large and heterogeneous datasets |
title_sort | statistical hla type imputation from large and heterogeneous datasets |
publisher | University of Oxford |
publishDate | 2012 |
url | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572468 |
work_keys_str_mv | AT diltheyalexandertilo statisticalhlatypeimputationfromlargeandheterogeneousdatasets |
_version_ | 1716785802995826688 |