Ancestry inference using reference labeled clusters of haplotypes

Background: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from refe...


Bibliographic Details
Main Authors: Ball, C.A. (Author), Byrnes, J.K. (Author), Hong, E.L. (Author), Noto, K. (Author), Schraiber, J.G. (Author), Sedghifar, A. (Author), Song, S. (Author), Turissini, D.A. (Author), Wang, Y. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Subjects:
HMM
Online Access: View Fulltext in Publisher
LEADER 02800nam a2200637Ia 4500
001 10.1186-s12859-021-04350-x
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Ancestry inference using reference labeled clusters of haplotypes 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04350-x 
520 3 |a Background: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. Results: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPU). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. Conclusions: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture. © 2021, The Author(s). 
650 0 4 |a Admixture 
650 0 4 |a Ancestry inference 
650 0 4 |a ARCHe 
650 0 4 |a Arches 
650 0 4 |a ARCHes 
650 0 4 |a article 
650 0 4 |a cohort analysis 
650 0 4 |a Genes 
650 0 4 |a Genetics, Population 
650 0 4 |a Genome, Human 
650 0 4 |a haplotype 
650 0 4 |a Haplotype diversity 
650 0 4 |a Haplotype modeling 
650 0 4 |a Haplotypes 
650 0 4 |a HMM 
650 0 4 |a human 
650 0 4 |a human genome 
650 0 4 |a Humans 
650 0 4 |a Local ancestry 
650 0 4 |a Polymorphism, Single Nucleotide 
650 0 4 |a population genetics 
650 0 4 |a RFMix 
650 0 4 |a running 
650 0 4 |a Running time 
650 0 4 |a simulation 
650 0 4 |a single nucleotide polymorphism 
700 1 |a Ball, C.A.  |e author 
700 1 |a Byrnes, J.K.  |e author 
700 1 |a Hong, E.L.  |e author 
700 1 |a Noto, K.  |e author 
700 1 |a Schraiber, J.G.  |e author 
700 1 |a Sedghifar, A.  |e author 
700 1 |a Song, S.  |e author 
700 1 |a Turissini, D.A.  |e author 
700 1 |a Wang, Y.  |e author 
773 |t BMC Bioinformatics