VCF2CAPS-A high-throughput CAPS marker design from VCF files and its test-use on a genotyping-by-sequencing (GBS) dataset.

Next-generation sequencing (NGS) is a powerful tool for massive detection of DNA sequence variants such as single nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs) and insertions/deletions (indels). For routine screening of numerous samples, these variants are often converted in...

Bibliographic Details
Main Authors: Wojciech Wesołowski, Beata Domnicz, Joanna Augustynowicz, Marek Szklarczyk
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-05-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008980
id doaj-ed122b5b0a774d89ac97187f8fddb54e
record_format Article
spelling doaj-ed122b5b0a774d89ac97187f8fddb54e; 2021-06-19T05:32:57Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2021-05-01; 17; 5; e1008980; 10.1371/journal.pcbi.1008980; VCF2CAPS-A high-throughput CAPS marker design from VCF files and its test-use on a genotyping-by-sequencing (GBS) dataset.; Wojciech Wesołowski; Beata Domnicz; Joanna Augustynowicz; Marek Szklarczyk; https://doi.org/10.1371/journal.pcbi.1008980
collection DOAJ
language English
format Article
sources DOAJ
author Wojciech Wesołowski
Beata Domnicz
Joanna Augustynowicz
Marek Szklarczyk
author_sort Wojciech Wesołowski
title VCF2CAPS-A high-throughput CAPS marker design from VCF files and its test-use on a genotyping-by-sequencing (GBS) dataset.
title_sort vcf2caps-a high-throughput caps marker design from vcf files and its test-use on a genotyping-by-sequencing (gbs) dataset.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2021-05-01
description Next-generation sequencing (NGS) is a powerful tool for large-scale detection of DNA sequence variants such as single nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs) and insertions/deletions (indels). For routine screening of numerous samples, these variants are often converted into cleaved amplified polymorphic sequence (CAPS) markers, which are based on the presence versus absence of restriction sites within PCR products. Current computational tools for SNP-to-CAPS conversion are limited and usually infeasible for large datasets such as those generated with NGS. Moreover, no tool is available for large-scale conversion of MNPs and indels into CAPS markers. Here, we present VCF2CAPS, new software for identifying restriction endonucleases that recognize SNP/MNP/indel-containing sequences from NGS experiments. Additionally, the program contains filtration utilities not available in other SNP-to-CAPS converters: selection of markers with a single polymorphic cut site within a user-specified sequence length, and selection of markers that differentiate up to three user-defined groups of individuals from the analyzed population. The performance of VCF2CAPS was tested on a thoroughly analyzed dataset from a genotyping-by-sequencing (GBS) experiment. A selection of CAPS markers picked by the program was subjected to experimental verification. CAPS markers, also referred to as PCR-RFLPs, are among the basic tools used in plant, animal and human genetics. Our new software, VCF2CAPS, fills a gap in the current inventory of genetic software by providing high-throughput CAPS marker design from NGS data. The program should be of interest to geneticists involved in molecular diagnostics. In this paper we show a successful example application of VCF2CAPS, and we believe that its usefulness is guaranteed by the growing availability of NGS services.
url https://doi.org/10.1371/journal.pcbi.1008980
_version_ 1721371310661042176