SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification

A major use of genetic data is parentage verification and identification as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels prov...


Bibliographic Details
Main Authors: Matthew C. McClure, John McCarthy, Paul Flynn, Jennifer C. McClure, Emma Dair, D. K. O'Connell, John F. Kearney
Format: Article
Language: English
Published: Frontiers Media S.A. 2018-03-01
Series: Frontiers in Genetics
Subjects: SNP
Online Access: http://journal.frontiersin.org/article/10.3389/fgene.2018.00084/full
id doaj-f467f917eed64c36851125f8b43256ef
record_format Article
spelling doaj-f467f917eed64c36851125f8b43256ef
2020-11-25T00:57:32Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2018-03-01
9
10.3389/fgene.2018.00084
310674
SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification
Matthew C. McClure (Irish Cattle Breeding Federation, Cork, Ireland)
John McCarthy (Irish Cattle Breeding Federation, Cork, Ireland)
Paul Flynn (Weatherbys Ireland, Kildare, Ireland)
Jennifer C. McClure (Irish Cattle Breeding Federation, Cork, Ireland)
Emma Dair (Irish Cattle Breeding Federation, Cork, Ireland)
D. K. O'Connell (Irish Cattle Breeding Federation, Cork, Ireland)
John F. Kearney (Irish Cattle Breeding Federation, Cork, Ireland)
collection DOAJ
language English
format Article
sources DOAJ
author Matthew C. McClure
John McCarthy
Paul Flynn
Jennifer C. McClure
Emma Dair
D. K. O'Connell
John F. Kearney
title SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2018-03-01
description A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels provide an increased level of parentage accuracy over microsatellite markers (MS), they can validate the wrong parent at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. Rapidly increasing numbers of cattle are being genotyped in Ireland, representing 61 B. taurus breeds from a wide range of farm types: beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd size. Using these data, the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities and determined that a minimum of 500 SNP are needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction ICBF uses 800 SNP (ICBF800) selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require sample and SNP quality control (QC). Most publications only deal with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines to deal with the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise. We divide the pipeline into two parts: a Genotype QC and an Animal QC pipeline. The Genotype QC identifies samples with low call rate, missing or mixed genotype classes (no BB genotype or ABTG alleles present), and low genotype frequencies. The Animal QC handles situations where the genotype might not belong to the listed individual by identifying: >1 non-matching genotype per animal, SNP duplicates, sex and breed prediction mismatches, parentage and progeny validation results, and other situations. The Animal QC pipeline makes use of the ICBF800 SNP set where appropriate to identify errors in a computationally efficient yet still highly accurate manner.
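The call rate and misconcordance-rate checks named in the description can be illustrated with a minimal sketch. This is not the ICBF pipeline: the genotype coding (0 = AA, 1 = AB, 2 = BB, None = no call), the definition of a conflict as opposing homozygotes between parent and offspring, and the function names are all assumptions made for illustration; only the ≤1% threshold comes from the abstract.

```python
from typing import Optional, Sequence

# Assumed coding: 0 = AA, 1 = AB, 2 = BB, None = no call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of SNP in a sample that received a genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def misconcordance_rate(parent: Sequence[Genotype],
                        offspring: Sequence[Genotype]) -> float:
    """Share of jointly called SNP where parent and offspring are
    opposing homozygotes (AA vs BB), which a true parent cannot produce."""
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must cover the same SNP set")
    compared = 0
    conflicts = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # skip SNP missing in either animal
        compared += 1
        if {p, o} == {0, 2}:
            conflicts += 1
    if compared == 0:
        raise ValueError("no SNP called in both animals")
    return conflicts / compared


def parent_validates(parent: Sequence[Genotype],
                     offspring: Sequence[Genotype],
                     max_rate: float = 0.01) -> bool:
    """True when the misconcordance rate is at or below the threshold
    (default 1%, the level quoted in the abstract)."""
    return misconcordance_rate(parent, offspring) <= max_rate
```

With only a few hundred SNP, an unrelated but similar animal may still fall under the 1% threshold by chance, which is the motivation the abstract gives for using a larger panel.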
topic SNP
quality control
parentage
parentage prediction
ISAG200
url http://journal.frontiersin.org/article/10.3389/fgene.2018.00084/full