Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds

Commercial single nucleotide polymorphism (SNP) arrays have been recently developed for several species and can be used to identify informative markers to differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of g...

Bibliographic Details
Main Authors: F. Bertolini, G. Galimberti, G. Schiavo, S. Mastrangelo, R. Di Gerlando, M.G. Strillacci, A. Bagnato, B. Portolano, L. Fontanesi
Format: Article
Language: English
Published: Elsevier 2018-01-01
Series: Animal
Subjects: SNP
Online Access: http://www.sciencedirect.com/science/article/pii/S1751731117001355
id doaj-4f2d9d2590784051ab6c40964e9a10f0
record_format Article
spelling doaj-4f2d9d2590784051ab6c40964e9a10f0 2021-06-06T04:53:37Z eng Elsevier Animal 1751-7311 2018-01-01 1211219
Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds
F. Bertolini: Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy
G. Galimberti: Department of Statistical Sciences “Paolo Fortunati”, University of Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy
G. Schiavo: Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy
S. Mastrangelo: Department of Agricultural and Forestry Sciences, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
R. Di Gerlando: Department of Agricultural and Forestry Sciences, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
M.G. Strillacci: Department of Veterinary Medicine, Università degli Studi di Milano, Via Celoria 10, 20133 Milano, Italy
A. Bagnato: Department of Veterinary Medicine, Università degli Studi di Milano, Via Celoria 10, 20133 Milano, Italy
B. Portolano: Department of Agricultural and Forestry Sciences, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
L. Fontanesi: Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy
http://www.sciencedirect.com/science/article/pii/S1751731117001355
SNP; breed assignment; Random Forest; Bos taurus
collection DOAJ
language English
format Article
sources DOAJ
publisher Elsevier
series Animal
issn 1751-7311
publishDate 2018-01-01
description Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for several downstream applications. A few statistical approaches have been proposed to identify the most discriminating genetic markers among thousands of genotyped SNPs. In this work, we compared several SNP preselection methods (Delta, Fst and principal component analysis (PCA)) in combination with Random Forest classification to analyse SNP data from six dairy cattle breeds, including cosmopolitan breeds (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels, of 96 and 48 SNPs, containing the most discriminating SNPs were created for each preselection method. These panels were evaluated for their ability to discriminate the breeds as a whole and breed by breed, and for linkage disequilibrium within each panel. For the 48-SNP panels, the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels generally discriminated all breeds better. The PCA-chrom panel (obtained by preselection chromosome by chromosome) identified informative SNPs that were particularly useful for assigning the minor breeds: it reached the lowest out-of-bag error even for Cinisara, whose error was quite high in all other panels. Moreover, this panel also contained the fewest SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits. In general, our results demonstrate the usefulness of Random Forest in combination with other reduction techniques for identifying population informative SNPs.
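As a rough illustration of the preselection step named in the description (not the authors' code), the sketch below ranks SNPs between two breeds by Delta and by Fst. Everything in it is an assumption made for illustration: the 0/1/2 genotype coding, the toy data, the helper names, Delta taken as the absolute allele-frequency difference, and Fst taken as the simple heterozygosity ratio (H_T - H_S) / H_T. The record does not state which estimators were actually used.

```python
from typing import List

# Assumed coding: each genotype is the count (0, 1 or 2) of one allele.
# Rows are animals, columns are SNPs. All data here are made up.
Genotypes = List[List[int]]


def allele_freqs(genos: Genotypes) -> List[float]:
    """Frequency of the counted allele at each SNP within one breed."""
    n_animals = len(genos)
    n_snps = len(genos[0])
    return [sum(row[j] for row in genos) / (2 * n_animals) for j in range(n_snps)]


def delta(p_a: List[float], p_b: List[float]) -> List[float]:
    """Delta per SNP: absolute allele-frequency difference between two breeds."""
    return [abs(a - b) for a, b in zip(p_a, p_b)]


def fst(p_a: List[float], p_b: List[float]) -> List[float]:
    """Simple two-population Fst per SNP: (H_T - H_S) / H_T, equal weights."""
    scores = []
    for a, b in zip(p_a, p_b):
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)                     # expected total heterozygosity
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2     # mean within-breed heterozygosity
        scores.append(0.0 if h_t == 0 else (h_t - h_s) / h_t)
    return scores


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring SNPs (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy genotypes for two hypothetical breeds, 4 animals x 3 SNPs.
breed_a: Genotypes = [[2, 1, 0], [2, 1, 0], [2, 0, 1], [2, 2, 1]]
breed_b: Genotypes = [[0, 1, 1], [0, 1, 2], [0, 2, 1], [0, 0, 2]]

p_a = allele_freqs(breed_a)
p_b = allele_freqs(breed_b)
delta_scores = delta(p_a, p_b)
fst_scores = fst(p_a, p_b)

# Keep the two most differentiated SNPs under each statistic.
print("Delta ranking:", top_k(delta_scores, 2))
print("Fst ranking:  ", top_k(fst_scores, 2))
```

In a workflow like the one the description outlines, the preselected SNPs would then be passed to a Random Forest classifier, and the out-of-bag error would be used to compare panels. That step is omitted here because it would require a third-party library.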
topic SNP
breed assignment
Random Forest
Bos taurus
url http://www.sciencedirect.com/science/article/pii/S1751731117001355