Accurate reconstruction of insertion-deletion histories by statistical phylogenetics.
The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for fi...
Main Authors: | Oscar Westesson, Gerton Lunter, Benedict Paten, Ian Holmes |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2012-01-01 |
Series: | PLoS ONE |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22536326/pdf/?tool=EBI |
id |
doaj-3eade5f5b5fa4ed2a190e9ca63d4a7c6 |
---|---|
record_format |
Article |
spelling |
doaj-3eade5f5b5fa4ed2a190e9ca63d4a7c6; 2021-03-03T20:29:30Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2012-01-01; 7(4): e34572; doi:10.1371/journal.pone.0034572; Accurate reconstruction of insertion-deletion histories by statistical phylogenetics; Oscar Westesson, Gerton Lunter, Benedict Paten, Ian Holmes |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Oscar Westesson Gerton Lunter Benedict Paten Ian Holmes |
title |
Accurate reconstruction of insertion-deletion histories by statistical phylogenetics. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2012-01-01 |
description |
The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes. |
url |
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22536326/pdf/?tool=EBI |