Large-Scale Discovery of Gene-Enriched SNPs

Bibliographic Details
Main Authors: Michael A. Gore, Mark H. Wright, Elhan S. Ersoz, Pascal Bouffard, Edward S. Szekeres, Thomas P. Jarvie, Bonnie L. Hurwitz, Apurva Narechania, Timothy T. Harkins, George S. Grills, Doreen H. Ware, Edward S. Buckler
Format: Article
Language: English
Published: Wiley 2009-07-01
Series: The Plant Genome
Online Access:https://dl.sciencesocieties.org/publications/tpg/articles/2/2/121
ISSN: 1940-3372
Volume/Issue/Pages: 2(2), 121–133
DOI: 10.3835/plantgenome2009.01.0002

Abstract

Whole-genome association studies of complex traits in higher eukaryotes require a high density of single nucleotide polymorphism (SNP) markers with genome-wide coverage. To design high-throughput, multiplexed SNP genotyping assays, researchers must first discover large numbers of SNPs by extensively resequencing multiple individuals or lines. For SNP discovery with the short reads that next-generation DNA sequencing technologies produce, the highly repetitive and duplicated nature of large plant genomes presents additional challenges. Here, we describe a genomic library construction procedure that facilitates pyrosequencing of genic and low-copy regions in plant genomes, and a customized computational pipeline to analyze and assemble short reads (100–200 bp), identify allelic reference sequence comparisons, and call SNPs with a high degree of accuracy. With maize (Zea mays L.) as the test organism in a pilot experiment, these methods identified 126,683 putative SNPs between two maize inbred lines at an estimated false discovery rate (FDR) of 15.1%. We estimated rates of false SNP discovery using an internal control and validated these FDR estimates against an external SNP dataset generated by locus-specific PCR amplification and Sanger sequencing. These results show that the approach is widely applicable for efficiently and accurately detecting gene-enriched SNPs in large, complex plant genomes.
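The abstract reports an FDR that was estimated with an internal control and checked against an external Sanger-sequenced SNP set, but it does not give the calculation. As a hedged illustration only, and not the authors' pipeline, the sketch below computes an empirical FDR as the fraction of putative SNPs at externally genotyped positions whose allele pair the external data do not reproduce. The function name `empirical_fdr`, the `(contig, position)` keys, the per-line allele tuples and the toy data are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical data model: a SNP is keyed by (contig, position) and stores the
# alleles observed in the two inbred lines as (allele_line_1, allele_line_2).
SnpKey = Tuple[str, int]
Alleles = Tuple[str, str]


def empirical_fdr(calls: Dict[SnpKey, Alleles],
                  validated: Dict[SnpKey, Alleles]) -> float:
    """Fraction of putative SNPs at externally genotyped positions whose
    allele pair is not reproduced by the external (e.g. Sanger) data."""
    # Only calls at positions covered by the validation set can be judged.
    tested = [key for key in calls if key in validated]
    if not tested:
        raise ValueError("no overlap between SNP calls and validation set")
    # A call counts as false if the validation alleles differ, which includes
    # the case where both lines carry the same base (no SNP at that site).
    false_calls = sum(1 for key in tested if calls[key] != validated[key])
    return false_calls / len(tested)


# Toy example with made-up positions and alleles (not data from the study).
calls = {
    ("ctg1", 100): ("A", "G"),
    ("ctg1", 250): ("C", "T"),
    ("ctg2", 40): ("G", "T"),
    ("ctg3", 7): ("A", "C"),  # not covered by the validation set
}
validated = {
    ("ctg1", 100): ("A", "G"),
    ("ctg1", 250): ("C", "C"),  # monomorphic: the call at this site is false
    ("ctg2", 40): ("G", "T"),
}
print(round(empirical_fdr(calls, validated), 3))  # → 0.333
```

Comparing ordered allele pairs also flags calls whose alleles are assigned to the wrong line. An estimate of this kind is only as representative as the overlap between the call set and the validation loci.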