Improved detection of artifactual viral minority variants in high-throughput sequencing data

High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina Hi...


Bibliographic Details
Main Authors: Matthijs Rudolf Albert Welkers, Marcel Jonges, Rienk Jeeninga, Marion P.G. Koopmans, Menno de Jong
Format: Article
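Language: English
Published: Frontiers Media S.A. 2015-01-01
Series: Frontiers in Microbiology
Subjects: Influenza Virus; high-throughput sequencing; error correction; minority variants; Illumina Hiseq2000
Online Access: http://journal.frontiersin.org/Journal/10.3389/fmicb.2014.00804/full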
id doaj-5f8d809afeff43fdb10aa812553ccdaa
record_format Article
spelling doaj-5f8d809afeff43fdb10aa812553ccdaa 2020-11-25T00:16:00Z eng Frontiers Media S.A. Frontiers in Microbiology 1664-302X 2015-01-01 5 10.3389/fmicb.2014.00804 121188
Matthijs Rudolf Albert Welkers (Academic Medical Center)
Marcel Jonges (National Institute for Public Health and the Environment; Erasmus Medical Center)
Rienk Jeeninga (Academic Medical Center)
Marion P.G. Koopmans (National Institute for Public Health and the Environment; Erasmus Medical Center)
Menno de Jong (Academic Medical Center)
collection DOAJ
language English
format Article
sources DOAJ
author Matthijs Rudolf Albert Welkers
Marcel Jonges
Rienk Jeeninga
Marion P.G. Koopmans
Menno de Jong
author_sort Matthijs Rudolf Albert Welkers
title Improved detection of artifactual viral minority variants in high-throughput sequencing data
publisher Frontiers Media S.A.
series Frontiers in Microbiology
issn 1664-302X
publishDate 2015-01-01
description High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to Illumina HiSeq 2000 library generation and the HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generating infectious virus through reverse genetics, followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequencing run. After ‘best practice’ quality control (QC), 1 minority variant with a frequency >0.5% was identified within the plasmid pool, while 84 and 139 were identified in the two RT-PCR-amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and an uneven distribution of mismatches over the length of the reads. We demonstrate that, by adding two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through SourceForge (https://sourceforge.net/projects/mva-ngs).
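The two technical characteristics named in the description (support from mostly one read orientation, and mismatches clustered in one part of the reads) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is distributed through the SourceForge project above; the `VariantCall` structure, the function names, and the thresholds (`max_bias`, `min_spread`, four position bins) are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantCall:
    """Read support for one candidate minority variant (hypothetical structure)."""
    forward_reads: int          # supporting reads in forward orientation
    reverse_reads: int          # supporting reads in reverse orientation
    read_positions: List[int]   # 0-based position of the mismatch within each supporting read
    read_length: int = 100      # assumed read length


def strand_bias(call: VariantCall) -> float:
    """Fraction of supporting reads in the dominant orientation (0.5 = balanced)."""
    total = call.forward_reads + call.reverse_reads
    if total == 0:
        return 1.0
    return max(call.forward_reads, call.reverse_reads) / total


def position_spread(call: VariantCall, bins: int = 4) -> float:
    """Fraction of equal-width read-length bins containing at least one mismatch."""
    if not call.read_positions:
        return 0.0
    occupied = {
        min(pos * bins // call.read_length, bins - 1)
        for pos in call.read_positions
    }
    return len(occupied) / bins


def looks_artifactual(call: VariantCall,
                      max_bias: float = 0.9,     # illustrative threshold, not from the paper
                      min_spread: float = 0.5    # illustrative threshold, not from the paper
                      ) -> bool:
    """Flag a variant seen mostly in one orientation or in one region of the reads."""
    return strand_bias(call) > max_bias or position_spread(call) < min_spread


balanced = VariantCall(50, 50, [5, 30, 55, 80])
one_strand = VariantCall(98, 2, [5, 30, 55, 80])
clustered = VariantCall(50, 50, [95, 96, 97, 98])
```

Under these assumed thresholds, `balanced` passes both checks, while `one_strand` and `clustered` would each be flagged by one of them.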
topic Influenza Virus
high-throughput sequencing
error correction
minority variants
Illumina Hiseq2000
url http://journal.frontiersin.org/Journal/10.3389/fmicb.2014.00804/full