Confidence intervals for population allele frequencies: the general case of sampling from a finite diploid population of any size.

The estimation of population allele frequencies using sample data forms a central component of studies in population genetics. These estimates can be used to test hypotheses on the evolutionary processes governing changes in genetic variation among populations. However, existing studies frequently d...


Bibliographic Details
Main Authors: Tak Fung, Kevin Keenan
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3897575?pdf=render
id doaj-e2cc5d5883e04898b3cb5bfec356d43b
record_format Article
spelling doaj-e2cc5d5883e04898b3cb5bfec356d43b; 2020-11-25T01:19:29Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2014-01-01; vol. 9, no. 1, e85925; doi:10.1371/journal.pone.0085925
collection DOAJ
language English
format Article
sources DOAJ
author Tak Fung
Kevin Keenan
title Confidence intervals for population allele frequencies: the general case of sampling from a finite diploid population of any size.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2014-01-01
description The estimation of population allele frequencies using sample data forms a central component of studies in population genetics. These estimates can be used to test hypotheses on the evolutionary processes governing changes in genetic variation among populations. However, existing studies frequently do not account for sampling uncertainty in these estimates, thus compromising their utility. Incorporation of this uncertainty has been hindered by the lack of a method for constructing confidence intervals containing the population allele frequencies, for the general case of sampling from a finite diploid population of any size. In this study, we address this important knowledge gap by presenting a rigorous mathematical method to construct such confidence intervals. For a range of scenarios, the method is used to demonstrate that for a particular allele, in order to obtain accurate estimates within 0.05 of the population allele frequency with high probability (≥ 95%), a sample size of > 30 is often required. This analysis is augmented by an application of the method to empirical sample allele frequency data for two populations of the checkerspot butterfly (Melitaea cinxia L.), occupying meadows in Finland. For each population, the method is used to derive ≥ 98.3% confidence intervals for the population frequencies of three alleles. These intervals are then used to construct two joint ≥ 95% confidence regions, one for the set of three frequencies for each population. These regions are then used to derive a ≥ 95% confidence interval for Jost's D, a measure of genetic differentiation between the two populations. Overall, the results demonstrate the practical utility of the method with respect to informing sampling design and accounting for sampling uncertainty in studies of population genetics, important for scientific hypothesis-testing and also for risk-based natural resource management.
url http://europepmc.org/articles/PMC3897575?pdf=render
work_keys_str_mv AT takfung confidenceintervalsforpopulationallelefrequenciesthegeneralcaseofsamplingfromafinitediploidpopulationofanysize
AT kevinkeenan confidenceintervalsforpopulationallelefrequenciesthegeneralcaseofsamplingfromafinitediploidpopulationofanysize
_version_ 1725137938874892288
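
Note on the confidence levels in the description: the abstract pairs three per-allele intervals at ≥ 98.3% with a joint region at ≥ 95%. These numbers are consistent with a Bonferroni-style split of the error rate (0.05 / 3 ≈ 0.0167), although the record itself does not say which argument the authors used, so treat that reading as an assumption. The short Python sketch below only illustrates this arithmetic; the function names are hypothetical and are not taken from the article.

```python
def per_interval_level(joint_level: float, n_intervals: int) -> float:
    """Per-interval confidence level under a Bonferroni split.

    Assumption (not stated in the record): the joint error rate
    alpha = 1 - joint_level is divided equally among the intervals.
    """
    alpha = 1.0 - joint_level
    return 1.0 - alpha / n_intervals


def joint_lower_bound(interval_level: float, n_intervals: int) -> float:
    """Bonferroni lower bound on joint coverage of n intervals,
    each holding with probability at least interval_level."""
    return max(0.0, 1.0 - n_intervals * (1.0 - interval_level))


if __name__ == "__main__":
    level = per_interval_level(0.95, 3)
    # Three alleles, joint level 95%: each interval needs about 98.33%.
    print(f"per-interval level: {level:.4f}")
    print(f"joint lower bound:  {joint_lower_bound(level, 3):.4f}")
```

The bound holds without assuming the three intervals are independent, which matters here because allele frequencies at one locus are constrained to sum to one.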