Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium

Genomic prediction is an effective way to estimate genomic breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs ca...

Bibliographic Details
Main Authors: Sohyoung Won, Jong-Eun Park, Ju-Hwan Son, Seung-Hwan Lee, Byeong Ho Park, Mina Park, Won-Chul Park, Han-Ha Chai, Heebal Kim, Jungjae Lee, Dajeong Lim
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-03-01
Series: Frontiers in Genetics
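Subjects: genomic prediction; haplotype; hierarchical clustering; linkage disequilibrium; best linear unbiased prediction; accuracy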
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.00134/full
id doaj-b592052947a44c67bf4ed2520e53ee71
record_format Article
spelling doaj-b592052947a44c67bf4ed2520e53ee71
2020-11-25T01:23:41Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2020-03-01
11
10.3389/fgene.2020.00134
487341
Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium
Sohyoung Won: Department of Agricultural Biotechnology and Research Institute of Population Genomics, Seoul National University, Seoul, South Korea
Jong-Eun Park: National Institute of Animal Science, RDA, Wanju, South Korea
Ju-Hwan Son: National Institute of Animal Science, RDA, Wanju, South Korea
Seung-Hwan Lee: Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea
Byeong Ho Park: National Institute of Animal Science, RDA, Wanju, South Korea
Mina Park: National Institute of Animal Science, RDA, Wanju, South Korea
Won-Chul Park: National Institute of Animal Science, RDA, Wanju, South Korea
Han-Ha Chai: National Institute of Animal Science, RDA, Wanju, South Korea
Heebal Kim: Department of Agricultural Biotechnology and Research Institute of Population Genomics, Seoul National University, Seoul, South Korea
Jungjae Lee: Jung P&C Institute, Inc., Yongin-si, South Korea
Dajeong Lim: National Institute of Animal Science, RDA, Wanju, South Korea
collection DOAJ
language English
format Article
sources DOAJ
author Sohyoung Won
Jong-Eun Park
Ju-Hwan Son
Seung-Hwan Lee
Byeong Ho Park
Mina Park
Won-Chul Park
Han-Ha Chai
Heebal Kim
Jungjae Lee
Dajeong Lim
title Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2020-03-01
description Genomic prediction is an effective way to estimate genomic breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the accuracy of genomic prediction, because a quantitative trait locus is more likely to be in strong linkage disequilibrium (LD) with a cluster of markers than with an individual marker. To make haplotypes efficient in genomic prediction, finding optimal ways to define haplotypes is essential. In this study, 770K or 50K SNP chip data were collected from a Hanwoo (Korean cattle) population consisting of 3,498 cattle. Using the SNP chip data, haplotypes were defined in three ways, based on 1) the number of SNPs included, 2) the length of the haplotype (bp), and 3) agglomerative hierarchical clustering based on LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes: 5, 10, 20, or 50 SNPs on average per haplotype. A linear mixed model that used haplotypes to calculate the covariance matrix was applied to test the prediction accuracy of each haplotype size. A conventional SNP-based linear mixed model was also tested as a baseline for evaluating the performance of the haplotype sets in genomic prediction. Carcass weight (CWT), eye muscle area (EMA), and backfat thickness (BFT) were used as the phenotypes. Haplotypes generally gave higher accuracy than the conventional SNP-based model for CWT and EMA, but little or no increase in accuracy for BFT. LD clustering-based haplotypes, specifically those with five SNPs, showed the highest prediction accuracy for CWT and EMA, whereas length-based haplotypes with five SNPs gave the highest accuracy for BFT. The maximum gain in accuracy was 1.3% from cross-validation and 4.6% from forward validation for EMA, suggesting that genomic prediction accuracy can be increased by using haplotypes. However, the improvement from using haplotypes may depend on the trait of interest. In addition, when the numbers of alleles generated by each haplotype-defining method were compared, clustering by LD generated the fewest alleles, thereby reducing computational costs. Therefore, finding optimal ways to define haplotypes and using the haplotype alleles as markers can improve the accuracy of genomic prediction.
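The first haplotype definition in the description (a fixed number of SNPs per haplotype) and the comparison of allele counts can be illustrated with a small sketch. This is not the authors' pipeline, which this record does not describe in detail. It assumes already phased genotypes given as strings of 0/1 alleles, and the function names (split_into_blocks, haplotype_allele_counts, r_squared) and the toy data are hypothetical. The r_squared helper shows the standard pairwise LD statistic that an LD-based clustering could use as its similarity measure; the clustering step itself is not implemented here.

```python
from collections import Counter


def split_into_blocks(n_snps, block_size):
    """Return (start, end) index pairs covering n_snps in windows of block_size SNPs."""
    return [(s, min(s + block_size, n_snps)) for s in range(0, n_snps, block_size)]


def haplotype_allele_counts(phased, blocks):
    """Count copies (0, 1 or 2) of each haplotype allele carried by each individual.

    phased: list of (hap_a, hap_b) pairs, each a string of '0'/'1' SNP alleles.
    Returns (columns, rows): columns lists (block_index, allele_string) pairs,
    rows holds one list of counts per individual, in column order.
    """
    columns = []
    for b, (start, end) in enumerate(blocks):
        # Every distinct SNP pattern observed in a block is one haplotype allele.
        alleles = sorted({hap[start:end] for pair in phased for hap in pair})
        columns.extend((b, allele) for allele in alleles)
    rows = []
    for pair in phased:
        counts = Counter(
            (b, hap[start:end])
            for hap in pair
            for b, (start, end) in enumerate(blocks)
        )
        rows.append([counts[col] for col in columns])
    return columns, rows


def r_squared(phased, i, j):
    """Pairwise LD (r^2) between SNPs i and j, computed from phased haplotypes."""
    haps = [hap for pair in phased for hap in pair]
    n = len(haps)
    p_i = sum(h[i] == "1" for h in haps) / n
    p_j = sum(h[j] == "1" for h in haps) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haps) / n
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0


# Toy data (hypothetical): 3 individuals, 4 SNPs, 2 SNPs per haplotype block.
phased = [("0011", "0011"), ("0110", "0011"), ("1100", "0110")]
blocks = split_into_blocks(4, 2)
columns, rows = haplotype_allele_counts(phased, blocks)
print(len(columns))  # total number of haplotype alleles across blocks: 6
```

Each row of counts sums to twice the number of blocks, since every individual carries two haplotype alleles per block. A matrix of this kind could stand in for the SNP genotype matrix when building the covariance matrix of a linear mixed model, and the number of columns is the allele count that drives the computational cost mentioned in the description.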
topic genomic prediction
haplotype
hierarchical clustering
linkage disequilibrium
best linear unbiased prediction
accuracy
url https://www.frontiersin.org/article/10.3389/fgene.2020.00134/full