Bayes-optimal estimation of overlap between populations of fixed size.

Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population...

Bibliographic Details
Main Author: Daniel B Larremore
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-03-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6440621?pdf=render
id doaj-a911da2e3eed4a53a0e257ce5430528a
spelling doaj-a911da2e3eed4a53a0e257ce5430528a; 2020-11-25T01:46:02Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; ISSN 1553-734X, 1553-7358; 2019-03-01; 15(3): e1006898; doi:10.1371/journal.pcbi.1006898; Bayes-optimal estimation of overlap between populations of fixed size; Daniel B Larremore
collection DOAJ
issn 1553-734X, 1553-7358
description Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population's objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty.