Improved detection of artifactual viral minority variants in high-throughput sequencing data
High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina Hi...
Main Authors: | Matthijs Rudolf Albert Welkers, Marcel Jonges, Rienk Jeeninga, Marion P.G. Koopmans, Menno de Jong |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2015-01-01 |
Series: | Frontiers in Microbiology |
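Subjects: | Influenza Virus; high-throughput sequencing; error correction; minority variants; Illumina Hiseq2000 |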
Online Access: | http://journal.frontiersin.org/Journal/10.3389/fmicb.2014.00804/full |
id |
doaj-5f8d809afeff43fdb10aa812553ccdaa |
---|---|
record_format |
Article |
spelling |
doaj-5f8d809afeff43fdb10aa812553ccdaa; 2020-11-25T00:16:00Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2015-01-01; 5; 10.3389/fmicb.2014.00804; 121188; Improved detection of artifactual viral minority variants in high-throughput sequencing data; Matthijs Rudolf Albert Welkers (Academic Medical Center); Marcel Jonges (National Institute for Public Health and the Environment, Erasmus Medical Center); Rienk Jeeninga (Academic Medical Center); Marion P.G. Koopmans (National Institute for Public Health and the Environment, Erasmus Medical Center); Menno de Jong (Academic Medical Center); http://journal.frontiersin.org/Journal/10.3389/fmicb.2014.00804/full; Influenza Virus; high-throughput sequencing; error correction; minority variants; Illumina Hiseq2000 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Matthijs Rudolf Albert Welkers; Marcel Jonges; Rienk Jeeninga; Marion P.G. Koopmans; Menno de Jong |
author_sort |
Matthijs Rudolf Albert Welkers |
title |
Improved detection of artifactual viral minority variants in high-throughput sequencing data |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Microbiology |
issn |
1664-302X |
publishDate |
2015-01-01 |
description |
High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to Illumina HiSeq 2000 library generation and the HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generating infectious virus through reverse genetics, followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequencing run. After ‘best practice’ quality control (QC), 1 minority variant with a frequency >0.5% was identified in the plasmid pool, while 84 and 139 were identified in the two RT-PCR amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and an uneven distribution of mismatches over the length of the reads. We demonstrate that, with the addition of two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to 3 clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrates that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through SourceForge (https://sourceforge.net/projects/mva-ngs). |
topic |
Influenza Virus; high-throughput sequencing; error correction; minority variants; Illumina Hiseq2000 |
url |
http://journal.frontiersin.org/Journal/10.3389/fmicb.2014.00804/full |
_version_ |
1725385337570590720 |
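
The description names two technical characteristics of artifactual minority variants: predominant presence in a single read orientation, and an uneven distribution of mismatches over the length of the reads. The Python sketch below illustrates how checks of that kind might look. It is not the authors' implementation (that is the code published at the SourceForge link in the description); the `VariantSupport` type, the function names, the five-bin split, and both thresholds are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantSupport:
    """Hypothetical summary of the reads supporting one minority variant."""
    forward_reads: int         # supporting reads in forward orientation
    reverse_reads: int         # supporting reads in reverse orientation
    read_positions: List[int]  # 0-based offset of the mismatch within each read
    read_length: int           # length of the reads, in bases


def strand_fraction(v: VariantSupport) -> float:
    """Share of supporting reads that come from the dominant orientation."""
    total = v.forward_reads + v.reverse_reads
    if total == 0:
        return 0.0
    return max(v.forward_reads, v.reverse_reads) / total


def max_bin_share(v: VariantSupport, n_bins: int = 5) -> float:
    """Share of mismatches that fall in the fullest of n_bins equal read segments.

    Evenly spread mismatches give a value near 1 / n_bins; clustering at one
    part of the read pushes the value toward 1.
    """
    if not v.read_positions:
        return 0.0
    counts = [0] * n_bins
    for pos in v.read_positions:
        idx = min(pos * n_bins // v.read_length, n_bins - 1)
        counts[idx] += 1
    return max(counts) / len(v.read_positions)


def looks_artifactual(
    v: VariantSupport,
    max_strand_fraction: float = 0.9,  # assumed threshold, not from the paper
    max_position_share: float = 0.5,   # assumed threshold, not from the paper
) -> bool:
    """Flag a variant that shows either of the two artifact signatures."""
    return (
        strand_fraction(v) > max_strand_fraction
        or max_bin_share(v) > max_position_share
    )


if __name__ == "__main__":
    balanced = VariantSupport(50, 50, list(range(100)), 100)
    one_sided = VariantSupport(98, 2, list(range(100)), 100)
    clustered = VariantSupport(50, 50, [95] * 100, 100)
    for name, v in [("balanced", balanced), ("one_sided", one_sided), ("clustered", clustered)]:
        print(name, looks_artifactual(v))
```

Simple fractions are used here instead of a statistical test to keep the sketch short; a real pipeline would more likely tune or replace the thresholds based on control data such as the plasmid pool described above.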