Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium
Genomic prediction is an effective way to estimate genomic breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs ca...
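Main Authors: | Sohyoung Won, Jong-Eun Park, Ju-Hwan Son, Seung-Hwan Lee, Byeong Ho Park, Mina Park, Won-Chul Park, Han-Ha Chai, Heebal Kim, Jungjae Lee, Dajeong Lim |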
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2020-03-01 |
Series: | Frontiers in Genetics |
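Subjects: | genomic prediction; haplotype; hierarchical clustering; linkage disequilibrium; best linear unbiased prediction; accuracy |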
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2020.00134/full |
id |
doaj-b592052947a44c67bf4ed2520e53ee71 |
---|---|
record_format |
Article |
spelling |
doaj-b592052947a44c67bf4ed2520e53ee71; 2020-11-25T01:23:41Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2020-03-01; 11; 10.3389/fgene.2020.00134; 487341; Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium; Sohyoung Won (Department of Agricultural Biotechnology and Research Institute of Population Genomics, Seoul National University, Seoul, South Korea); Jong-Eun Park (National Institute of Animal Science, RDA, Wanju, South Korea); Ju-Hwan Son (National Institute of Animal Science, RDA, Wanju, South Korea); Seung-Hwan Lee (Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea); Byeong Ho Park (National Institute of Animal Science, RDA, Wanju, South Korea); Mina Park (National Institute of Animal Science, RDA, Wanju, South Korea); Won-Chul Park (National Institute of Animal Science, RDA, Wanju, South Korea); Han-Ha Chai (National Institute of Animal Science, RDA, Wanju, South Korea); Heebal Kim (Department of Agricultural Biotechnology and Research Institute of Population Genomics, Seoul National University, Seoul, South Korea); Jungjae Lee (Jung P&C Institute, Inc., Yongin-si, South Korea); Dajeong Lim (National Institute of Animal Science, RDA, Wanju, South Korea); https://www.frontiersin.org/article/10.3389/fgene.2020.00134/full; genomic prediction; haplotype; hierarchical clustering; linkage disequilibrium; best linear unbiased prediction; accuracy |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sohyoung Won; Jong-Eun Park; Ju-Hwan Son; Seung-Hwan Lee; Byeong Ho Park; Mina Park; Won-Chul Park; Han-Ha Chai; Heebal Kim; Jungjae Lee; Dajeong Lim |
author_sort |
Sohyoung Won |
title |
Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Genetics |
issn |
1664-8021 |
publishDate |
2020-03-01 |
description |
Genomic prediction is an effective way to estimate genomic breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the accuracy of genomic prediction, because a quantitative trait locus is more likely to be in strong linkage disequilibrium (LD) with a cluster of markers than with an individual marker. To make haplotypes efficient in genomic prediction, finding optimal ways to define haplotypes is essential. In this study, 770K or 50K SNP chip data were collected from a Hanwoo (Korean cattle) population of 3,498 animals. Using the SNP chip data, haplotypes were defined in three ways, based on 1) the number of SNPs included, 2) the length of the haplotype (bp), and 3) agglomerative hierarchical clustering based on LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes: 5, 10, 20, or 50 SNPs on average per haplotype. A linear mixed model that used haplotypes to calculate the covariance matrix was applied to test the prediction accuracy of each haplotype size. A conventional SNP-based linear mixed model was also tested to evaluate the performance of the haplotype sets in genomic prediction. Carcass weight (CWT), eye muscle area (EMA), and backfat thickness (BFT) were used as the phenotypes. Haplotypes generally increased accuracy over the conventional SNP-based model for CWT and EMA, but gave little or no increase in accuracy for BFT. LD clustering-based haplotypes, specifically those with five SNPs on average, showed the highest prediction accuracy for CWT and EMA, whereas length-based haplotypes with five SNPs gave the highest accuracy for BFT. The maximum gain in accuracy was 1.3% from cross-validation and 4.6% from forward validation for EMA, suggesting that genomic prediction accuracy can be increased by using haplotypes. However, the improvement from using haplotypes may depend on the trait of interest. In addition, when the numbers of alleles generated by the haplotype-defining methods were compared, clustering by LD generated the fewest alleles, thereby reducing computational costs. Therefore, finding optimal ways to define haplotypes and using the haplotype alleles as markers can improve the accuracy of genomic prediction. |
topic |
genomic prediction; haplotype; hierarchical clustering; linkage disequilibrium; best linear unbiased prediction; accuracy |
url |
https://www.frontiersin.org/article/10.3389/fgene.2020.00134/full |
_version_ |
1725120540387049472 |
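
The abstract describes defining haplotypes by the number of SNPs they contain and then using the resulting haplotype alleles as markers to build the covariance matrix of a linear mixed model. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes phased genotypes coded as strings of 0/1 alleles. The function names, the toy data, and the scaling of the relationship matrix (a centered cross-product divided by its mean diagonal) are all illustrative assumptions that do not come from the record.

```python
from collections import Counter


def fixed_size_blocks(n_snps, block_size):
    """Return consecutive (start, end) SNP index ranges of at most block_size SNPs."""
    return [(s, min(s + block_size, n_snps)) for s in range(0, n_snps, block_size)]


def haplotype_allele_counts(phased, blocks):
    """Count copies (0, 1 or 2) of each haplotype allele carried by each animal.

    phased: list of animals, each a pair of equal-length 0/1 strings
            (the two phased chromosomes); this coding is an assumption.
    Returns (alleles, counts), where alleles is a sorted list of
    (block_index, allele_string) and counts[i][j] is the number of copies
    of alleles[j] carried by animal i.
    """
    per_animal = []
    for chrom_a, chrom_b in phased:
        carried = Counter()
        for b, (start, end) in enumerate(blocks):
            carried[(b, chrom_a[start:end])] += 1
            carried[(b, chrom_b[start:end])] += 1
        per_animal.append(carried)
    alleles = sorted(set().union(*per_animal))
    counts = [[carried[a] for a in alleles] for carried in per_animal]
    return alleles, counts


def relationship_matrix(counts):
    """Centered cross-product of allele counts, scaled to a mean diagonal of 1.

    This scaling is one simple choice made for illustration; it requires
    that the animals are not all identical (otherwise the diagonal is zero).
    """
    n, m = len(counts), len(counts[0])
    means = [sum(row[j] for row in counts) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in counts]
    g = [[sum(x * y for x, y in zip(centered[i], centered[k])) for k in range(n)]
         for i in range(n)]
    mean_diag = sum(g[i][i] for i in range(n)) / n
    return [[value / mean_diag for value in row] for row in g]


# Toy data: three animals, four SNPs, haplotypes of two SNPs each.
phased = [("0101", "0101"), ("0101", "0011"), ("0011", "0011")]
blocks = fixed_size_blocks(4, 2)
alleles, counts = haplotype_allele_counts(phased, blocks)
g = relationship_matrix(counts)
```

In this toy example each animal carries two haplotype alleles per block, so every row of `counts` sums to twice the number of blocks. Length-based or LD-clustered haplotypes would change only how `blocks` is built. A matrix like `g` would then stand in for the SNP-based relationship matrix in a BLUP-type model.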