Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis.

We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization....

Bibliographic Details
Main Authors: Barbara E Engelhardt, Matthew Stephens
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-09-01
Series: PLoS Genetics
Online Access: http://europepmc.org/articles/PMC2940725?pdf=render
id doaj-548b785ef30a4fdbb9550bcb03252173
record_format Article
spelling doaj-548b785ef30a4fdbb9550bcb03252173 | 2020-11-25T00:02:21Z | eng | Public Library of Science (PLoS) | PLoS Genetics | 1553-7390, 1553-7404 | 2010-09-01 | 6(9): e1001117 | doi:10.1371/journal.pgen.1001117 | Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis. | Barbara E Engelhardt, Matthew Stephens
collection DOAJ
sources DOAJ
issn 1553-7390
1553-7404
description We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more "continuous," as in isolation-by-distance models.
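The matrix-factorization view in the description can be illustrated with a minimal sketch. It is not the authors' SFA implementation: it shows only the PCA-style case, where a centered genotype matrix is approximated by a product of two lower-rank matrices obtained from a truncated SVD. The function name `low_rank_factors`, the toy two-population data, and the choice of rank are assumptions made for illustration. Admixture-based models and SFA would keep the same product form but place different constraints or priors on the two factors.

```python
import numpy as np

def low_rank_factors(G, k):
    """Hypothetical helper: PCA-style factorization of a genotype matrix.

    Returns (Lam, F, Gc) where Gc is G with each SNP column centered and
    Gc is approximated by Lam @ F, with Lam of shape (n, k) and F of shape (k, p).
    """
    Gc = G - G.mean(axis=0, keepdims=True)            # center each SNP column
    U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    Lam = U[:, :k] * s[:k]                            # per-individual loadings
    F = Vt[:k, :]                                     # orthonormal factors
    return Lam, F, Gc

# Toy data (an assumption, not from the paper): two populations with
# different allele frequencies, genotypes coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
n_per_pop, p = 20, 50
freqs = rng.uniform(0.1, 0.9, size=(2, p))
G = np.vstack(
    [rng.binomial(2, freqs[i], size=(n_per_pop, p)) for i in range(2)]
).astype(float)

Lam, F, Gc = low_rank_factors(G, k=2)
rel_err = np.linalg.norm(Gc - Lam @ F) / np.linalg.norm(Gc)
print(f"relative reconstruction error at rank 2: {rel_err:.3f}")
```

In this sketch the rows of `F` are orthonormal, which is the constraint that characterizes PCA. Replacing that constraint with nonnegative loadings that sum to one per individual would move toward an admixture-style model, and a sparsity-inducing prior on the loadings would move toward the sparse factor analysis idea named in the description.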