Microbial comparative pan-genomics using binomial mixture models

Abstract. Background: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made,...

Bibliographic Details
Main Authors: Ussery David W, Almøy Trygve, Snipen Lars
Format: Article
Language: English
Published: BMC 2009-08-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/10/385
id doaj-0132a9839d7b4e36959107a1254d6b32
record_format Article
spelling doaj-0132a9839d7b4e36959107a1254d6b32 2020-11-25T00:32:48Z eng BMC BMC Genomics 1471-2164 2009-08-01 10 1 385 10.1186/1471-2164-10-385
collection DOAJ
language English
format Article
sources DOAJ
author Ussery David W
Almøy Trygve
Snipen Lars
spellingShingle Ussery David W
Almøy Trygve
Snipen Lars
Microbial comparative pan-genomics using binomial mixture models
BMC Genomics
author_facet Ussery David W
Almøy Trygve
Snipen Lars
author_sort Ussery David W
title Microbial comparative pan-genomics using binomial mixture models
title_short Microbial comparative pan-genomics using binomial mixture models
title_full Microbial comparative pan-genomics using binomial mixture models
title_fullStr Microbial comparative pan-genomics using binomial mixture models
title_full_unstemmed Microbial comparative pan-genomics using binomial mixture models
title_sort microbial comparative pan-genomics using binomial mixture models
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2009-08-01
description Abstract. Background: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. Results: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. Conclusion: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.
url http://www.biomedcentral.com/1471-2164/10/385
work_keys_str_mv AT usserydavidw microbialcomparativepangenomicsusingbinomialmixturemodels
AT almøytrygve microbialcomparativepangenomicsusingbinomialmixturemodels
AT snipenlars microbialcomparativepangenomicsusingbinomialmixturemodels
_version_ 1725318979226959872