Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.

Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and tr...

Bibliographic Details
Main Authors: Don Klinkenberg, Jantien A Backer, Xavier Didelot, Caroline Colijn, Jacco Wallinga
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-05-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1005495
id doaj-15a8173015f6455fa587bee9083e1670
record_format Article
spelling doaj-15a8173015f6455fa587bee9083e1670; 2021-04-21T15:43:07Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2017-05-01; vol. 13, no. 5, e1005495; 10.1371/journal.pcbi.1005495; Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.; Don Klinkenberg; Jantien A Backer; Xavier Didelot; Caroline Colijn; Jacco Wallinga; https://doi.org/10.1371/journal.pcbi.1005495
collection DOAJ
language English
format Article
sources DOAJ
author Don Klinkenberg
Jantien A Backer
Xavier Didelot
Caroline Colijn
Jacco Wallinga
author_sort Don Klinkenberg
title Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2017-05-01
description Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
url https://doi.org/10.1371/journal.pcbi.1005495