Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing

Bibliographic Details
Main Authors: Vincenti Donatella, Rozera Gabriella, Abbate Isabella, Bruselles Alessandro, Prosperi Luciano, Prosperi Mattia CF, Solmone Maria, Capobianchi Maria, Ulivi Giovanni
Format: Article
Language: English
Published: BMC 2011-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/5
DOI: 10.1186/1471-2105-12-5
ISSN: 1471-2105

Abstract

Background

Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has the potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, metagenomics, and the characterisation of infectious pathogens such as viral quasispecies. Although methodologies and software for whole-genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies from NGS data remains a challenge. Such a reconstruction would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, and an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants.

Results

The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients.

Conclusions

The combinatorial analysis described how difficult it is to reconstruct a quasispecies, given a particular amplicon partition and a measure of population diversity. The reconstruction algorithm performed well on both simulated and real data, even in the presence of sequencing errors.
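The abstract does not reproduce the formulae themselves. The Python sketch below only illustrates why in-silico recombinants arise once the reference is split into sliding windows, under explicit assumptions: error-free reads that span whole windows, variants aligned to the reference without indels, and a fixed window width and step. The names `sliding_windows` and `count_candidates` are hypothetical, and this is not the authors' algorithm. It collects the distinct fragments seen in each window and counts every full-length sequence that can be chained from fragments agreeing on the overlaps. Any candidate that is not a true variant is an in-silico recombinant.

```python
"""Hypothetical sketch: count the in-silico recombinants that can arise when
full-length haplotypes are chained together from overlapping amplicon windows.
Assumes error-free reads that span whole windows and indel-free alignment."""

from typing import Dict, List, Sequence, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) on the reference


def sliding_windows(genome_length: int, width: int, step: int) -> List[Window]:
    """Split [0, genome_length) into fixed-width sliding windows.

    If the last regular window stops short of the genome end, one extra window
    is placed flush with the end. step <= width ensures no position is skipped.
    """
    if not 0 < step <= width <= genome_length:
        raise ValueError("require 0 < step <= width <= genome_length")
    windows = [(s, s + width) for s in range(0, genome_length - width + 1, step)]
    if windows[-1][1] < genome_length:
        windows.append((genome_length - width, genome_length))
    return windows


def count_candidates(variants: Sequence[str], windows: List[Window]) -> int:
    """Count distinct full-length sequences obtainable by chaining, window by
    window, fragments that agree on each overlap (dynamic programming)."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be aligned to the same length")
    # Distinct fragments seen in each window, as error-free reads would show them.
    fragments = [sorted({v[s:e] for v in variants}) for s, e in windows]
    # chains[f]: number of distinct chains that end with fragment f.
    chains: Dict[str, int] = {f: 1 for f in fragments[0]}
    for i in range(1, len(windows)):
        prev_end, start = windows[i - 1][1], windows[i][0]
        overlap = max(0, prev_end - start)
        next_chains: Dict[str, int] = {}
        for b in fragments[i]:
            # The last `overlap` bases of a must equal the first `overlap` bases of b.
            next_chains[b] = sum(
                n for a, n in chains.items() if a[len(a) - overlap:] == b[:overlap]
            )
        chains = next_chains
    return sum(chains.values())


if __name__ == "__main__":
    # Two toy variants differing at 0-based positions 2 and 7.
    variants = ["ACGTACGT", "ACCTACGA"]
    for width, step in [(4, 2), (6, 2)]:
        windows = sliding_windows(len(variants[0]), width, step)
        candidates = count_candidates(variants, windows)
        recombinants = candidates - len(set(variants))
        print(f"width={width} step={step}: {candidates} candidates, "
              f"{recombinants} in-silico recombinants")
    # width=4 step=2: 4 candidates, 2 in-silico recombinants
    # width=6 step=2: 2 candidates, 0 in-silico recombinants
```

With 4-base windows, no single window spans both polymorphic sites. Linkage between them is lost, so two recombinants (ACGTACGA and ACCTACGT) are indistinguishable from the true variants. With 6-base windows, one window spans both sites and only the true variants remain. The sketch only enumerates candidates. It does not implement the multinomial-based selection that the abstract describes.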