Investigating the impact of reference assembly choice on genomic analyses in a cattle breed

Abstract. Background: Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exce...

Bibliographic Details
Main Authors: Audald Lloret-Villas, Meenu Bhati, Naveen Kumar Kadri, Ruedi Fries, Hubert Pausch
Format: Article
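Language: English
Published: BMC 2021-05-01
Series: BMC Genomics
Subjects: Reference genome comparison; Bovine; Alignment quality; Sequence variants; Functional annotation; Signatures of selection
Online Access: https://doi.org/10.1186/s12864-021-07554-w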
id doaj-98a85278c8034f1a94e944e69b11f5cc
record_format Article
spelling doaj-98a85278c8034f1a94e944e69b11f5cc
2021-05-23T11:24:32Z
eng
BMC
BMC Genomics, ISSN 1471-2164
2021-05-01, volume 22, issue 1, pages 1-17
10.1186/s12864-021-07554-w
Investigating the impact of reference assembly choice on genomic analyses in a cattle breed
Audald Lloret-Villas (Animal Genomics, ETH Zürich)
Meenu Bhati (Animal Genomics, ETH Zürich)
Naveen Kumar Kadri (Animal Genomics, ETH Zürich)
Ruedi Fries (Chair of Animal Breeding, TU München)
Hubert Pausch (Animal Genomics, ETH Zürich)
collection DOAJ
language English
format Article
sources DOAJ
author Audald Lloret-Villas
Meenu Bhati
Naveen Kumar Kadri
Ruedi Fries
Hubert Pausch
title Investigating the impact of reference assembly choice on genomic analyses in a cattle breed
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2021-05-01
description Abstract
Background: Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exceed the current bovine reference genome in quality and continuity have been assembled for different breeds of cattle. Using whole genome sequencing data of 161 Brown Swiss cattle, we compared the accuracy of read mapping and sequence variant genotyping as well as downstream genomic analyses between the bovine reference genome (ARS-UCD1.2) and a highly continuous Angus-based assembly (UOA_Angus_1).
Results: Read mapping accuracy did not differ notably between the ARS-UCD1.2 and UOA_Angus_1 assemblies. We discovered 22,744,517 and 22,559,675 high-quality variants from ARS-UCD1.2 and UOA_Angus_1, respectively. The concordance between sequence- and array-called genotypes was high and the number of variants deviating from Hardy-Weinberg proportions was low at segregating sites for both assemblies. More artefactual INDELs were genotyped from UOA_Angus_1 than ARS-UCD1.2 alignments. Using the composite likelihood ratio test, we detected 40 and 33 signatures of selection from ARS-UCD1.2 and UOA_Angus_1, respectively, but the overlap between both assemblies was low. Using the 161 sequenced Brown Swiss cattle as a reference panel, we imputed sequence variant genotypes into a mapping cohort of 30,499 cattle that had microarray-derived genotypes using a two-step imputation approach. The accuracy of imputation (Beagle R²) was very high (0.87) for both assemblies. Genome-wide association studies between imputed sequence variant genotypes and six dairy traits as well as stature produced almost identical results from both assemblies.
Conclusions: The ARS-UCD1.2 and UOA_Angus_1 assemblies are suitable for reference-guided genome analyses in Brown Swiss cattle. Although differences in read mapping and genotyping accuracy between both assemblies are negligible, the choice of the reference genome has a large impact on detecting signatures of selection that already reached fixation using the composite likelihood ratio test. We developed a workflow that can be adapted and reused to compare the impact of reference genomes on genome analyses in various breeds, populations and species.
topic Reference genome comparison
Bovine
Alignment quality
Sequence variants
Functional annotation
Signatures of selection
url https://doi.org/10.1186/s12864-021-07554-w
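
The abstract reports two per-variant quality checks: concordance between sequence- and array-called genotypes, and deviation from Hardy-Weinberg proportions. The Python sketch below illustrates how such checks could be computed. It is not the authors' workflow. The function names, the 0/1/2 genotype coding (alternate-allele count, `None` for a missing call) and the use of a one-degree-of-freedom chi-square test are assumptions made for illustration only.

```python
from math import erfc, sqrt


def genotype_concordance(seq_calls, array_calls):
    """Fraction of sites where both call sets agree.

    Genotypes are assumed to be coded 0/1/2; sites where either
    call is None (missing) are skipped.
    """
    pairs = [(s, a) for s, a in zip(seq_calls, array_calls)
             if s is not None and a is not None]
    if not pairs:
        raise ValueError("no sites with both genotypes called")
    return sum(s == a for s, a in pairs) / len(pairs)


def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square statistic (1 d.f.) and p-value for one biallelic site."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    # Skip classes with zero expectation (monomorphic sites give chi2 = 0).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

A site with genotype counts exactly in Hardy-Weinberg proportions (for example 25, 50 and 25) yields a statistic of zero, whereas a site with no heterozygotes at intermediate allele frequency yields a large statistic and would be flagged as a likely genotyping artefact.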