Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing
Abstract. Background: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, meta-genomics, and characterisation...
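Main Authors: | Vincenti Donatella, Rozera Gabriella, Abbate Isabella, Bruselles Alessandro, Prosperi Luciano, Prosperi Mattia CF, Solmone Maria, Capobianchi Maria, Ulivi Giovanni |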
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2011-01-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/12/5 |
id |
doaj-53b96299a1a34c189404e45c52be1174 |
---|---|
record_format |
Article |
spelling |
doaj-53b96299a1a34c189404e45c52be1174; 2020-11-24T21:58:24Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2011-01-01; 12; 1; 5; 10.1186/1471-2105-12-5; Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing; Vincenti Donatella, Rozera Gabriella, Abbate Isabella, Bruselles Alessandro, Prosperi Luciano, Prosperi Mattia CF, Solmone Maria, Capobianchi Maria, Ulivi Giovanni; Abstract. Background: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, meta-genomics, and characterisation of infectious pathogens, such as viral quasispecies. Although methodologies and software for whole genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies using NGS data remains a challenge. This application would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, and an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants. Results: The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at a low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients. Conclusions: The combinatorial analysis provided a description of the difficulty of reconstructing a quasispecies, given a determined amplicon partition and a measure of population diversity. The reconstruction algorithm showed good performance on both simulated and real data, even in the presence of sequencing errors.; http://www.biomedcentral.com/1471-2105/12/5 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Vincenti Donatella Rozera Gabriella Abbate Isabella Bruselles Alessandro Prosperi Luciano Prosperi Mattia CF Solmone Maria Capobianchi Maria Ulivi Giovanni |
spellingShingle |
Vincenti Donatella Rozera Gabriella Abbate Isabella Bruselles Alessandro Prosperi Luciano Prosperi Mattia CF Solmone Maria Capobianchi Maria Ulivi Giovanni Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing BMC Bioinformatics |
author_facet |
Vincenti Donatella Rozera Gabriella Abbate Isabella Bruselles Alessandro Prosperi Luciano Prosperi Mattia CF Solmone Maria Capobianchi Maria Ulivi Giovanni |
author_sort |
Vincenti Donatella |
title |
Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing |
title_short |
Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing |
title_full |
Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing |
title_fullStr |
Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing |
title_full_unstemmed |
Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing |
title_sort |
combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2011-01-01 |
description |
Abstract. Background: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, meta-genomics, and characterisation of infectious pathogens, such as viral quasispecies. Although methodologies and software for whole genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies using NGS data remains a challenge. This application would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, and an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants. Results: The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at a low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients. Conclusions: The combinatorial analysis provided a description of the difficulty of reconstructing a quasispecies, given a determined amplicon partition and a measure of population diversity. The reconstruction algorithm showed good performance on both simulated and real data, even in the presence of sequencing errors. |
url |
http://www.biomedcentral.com/1471-2105/12/5 |
work_keys_str_mv |
AT vincentidonatella combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT rozeragabriella combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT abbateisabella combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT brusellesalessandro combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT prosperiluciano combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT prosperimattiacf combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT solmonemaria combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT capobianchimaria combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing AT ulivigiovanni combinatorialanalysisandalgorithmsforquasispeciesreconstructionusingnextgenerationsequencing |
_version_ |
1725852075144773632 |