Bayes-optimal estimation of overlap between populations of fixed size.

Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population...

Bibliographic Details
Main Author:	Daniel B Larremore
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2019-03-01
Series:	PLoS Computational Biology
Online Access:	http://europepmc.org/articles/PMC6440621?pdf=render

id	doaj-a911da2e3eed4a53a0e257ce5430528a
record_format	Article
spelling	doaj-a911da2e3eed4a53a0e257ce5430528a 2020-11-25T01:46:02Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2019-03-01 15 3 e1006898 10.1371/journal.pcbi.1006898 Bayes-optimal estimation of overlap between populations of fixed size. Daniel B Larremore Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population's objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty. http://europepmc.org/articles/PMC6440621?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Daniel B Larremore
spellingShingle	Daniel B Larremore Bayes-optimal estimation of overlap between populations of fixed size. PLoS Computational Biology
author_facet	Daniel B Larremore
author_sort	Daniel B Larremore
title	Bayes-optimal estimation of overlap between populations of fixed size.
title_short	Bayes-optimal estimation of overlap between populations of fixed size.
title_full	Bayes-optimal estimation of overlap between populations of fixed size.
title_fullStr	Bayes-optimal estimation of overlap between populations of fixed size.
title_full_unstemmed	Bayes-optimal estimation of overlap between populations of fixed size.
title_sort	bayes-optimal estimation of overlap between populations of fixed size.
publisher	Public Library of Science (PLoS)
series	PLoS Computational Biology
issn	1553-734X 1553-7358
publishDate	2019-03-01
description	Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population's objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty.
url	http://europepmc.org/articles/PMC6440621?pdf=render
work_keys_str_mv	AT danielblarremore bayesoptimalestimationofoverlapbetweenpopulationsoffixedsize
_version_	1725021067054940160
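
Illustrative sketch (not part of the catalog record): the description field above says the method infers overlap from incomplete samples when each population's total size is known. The Python below is a minimal sketch of one way such an inference could be set up, and is not necessarily the article's own formulation. It assumes each sample is drawn uniformly without replacement, so the observed shared count follows a two-stage hypergeometric model, and it assumes a uniform prior over the true overlap. All function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k marked items among `draws` drawn without replacement)."""
    if k < 0 or k > draws or k > successes or draws - k > population - successes:
        return 0.0
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


def overlap_posterior(n_shared_obs, n_a, n_b, size_a, size_b):
    """Posterior over the true overlap s = 0..min(size_a, size_b).

    Assumed model (an illustration, not taken from the record):
      k        ~ Hypergeom(size_a, s, n_a)   shared objects caught in sample A
      observed ~ Hypergeom(size_b, k, n_b)   of those, also caught in sample B
    with a uniform prior on s.
    """
    s_max = min(size_a, size_b)
    weights = []
    for s in range(s_max + 1):
        likelihood = 0.0
        # Marginalize over the unobserved count k of shared objects in sample A.
        for k in range(n_shared_obs, min(s, n_a) + 1):
            likelihood += (hypergeom_pmf(k, size_a, s, n_a)
                           * hypergeom_pmf(n_shared_obs, size_b, k, n_b))
        weights.append(likelihood)
    total = sum(weights)
    if total == 0:
        raise ValueError("observed counts are impossible under the assumed model")
    return [w / total for w in weights]


def posterior_mean(posterior):
    """Posterior mean, the Bayes-optimal point estimate under squared-error loss."""
    return sum(s * p for s, p in enumerate(posterior))


# Hypothetical numbers: two populations of 60 objects, 30 sampled from each,
# 5 objects seen in both samples.
post = overlap_posterior(n_shared_obs=5, n_a=30, n_b=30, size_a=60, size_b=60)
estimate = posterior_mean(post)
```

Because the whole posterior is returned, a credible interval can be read off its cumulative sum, and rerunning the function with different hypothetical sample sizes gives a rough sense of the tradeoff between sampling effort and uncertainty that the description mentions.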

Similar Items