mixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes

Abstract Background Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage d...

Bibliographic Details
Main Authors: Bing Song, August E. Woerner, John Planz
Format: Article
Language: English
Published: BMC 2021-01-01
Series: BMC Bioinformatics
Subjects: Linkage disequilibrium; R package; Non-random association; Mutual independence; STRs; SNPs
Online Access: https://doi.org/10.1186/s12859-020-03945-0
id doaj-3138c1d4996d41bea7fb883e5adf2e90
record_format Article
spelling doaj-3138c1d4996d41bea7fb883e5adf2e90 2021-01-10T13:03:15Z eng BMC BMC Bioinformatics 1471-2105 2021-01-01 221121 10.1186/s12859-020-03945-0 mixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes
Bing Song, August E. Woerner, John Planz (all: Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center)
collection DOAJ
language English
format Article
sources DOAJ
author Bing Song
August E. Woerner
John Planz
title mixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-01-01
description Abstract. Background: Multi-locus genotype data are widely used in population genetics and disease studies. When evaluating the utility of multi-locus data, many genomic assessments consider whether the markers are independent. Pairwise non-random associations are generally tested as linkage disequilibrium; however, dependence within a panel may involve triplets, quartets, or larger sets of loci. Compatible, user-friendly software is therefore needed to test and assess global linkage disequilibrium in mixed genetic data. Results: This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as the absence of non-random associations among all subsets of the tested panel. The new R package “mixIndependR” calculates basic genetic parameters such as allele frequency, genotype frequency, heterozygosity, Hardy–Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of marker type: simple nucleotide polymorphisms, short tandem repeats, insertions and deletions, or any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and analyzed functionally in the software package. Overall independence is tested by comparing the observed distributions of two common summary statistics, the number of heterozygous loci (K) and the number of shared alleles (X), with their expected distributions under the assumption of mutual independence. Conclusion: The package “mixIndependR” is compatible with all categories of genetic markers and detects overall non-random associations. Compared with pairwise disequilibrium tests, the approach tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or more powerful genetic panels can be developed, such as mixed panels combining different kinds of markers. In population genetics, “mixIndependR” offers a more powerful method of detecting LD and so makes it possible to learn more about population admixture, natural selection, genetic drift, and population demographics. This approach can also optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of the approach to real data is expected in the future and might bring a leap in the field of genetic technology. Availability: The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/mixIndependR/index.html
topic Linkage disequilibrium
R package
Non-random association
Mutual independence
STRs
SNPs
url https://doi.org/10.1186/s12859-020-03945-0
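
Illustrative note: the abstract describes testing overall independence by comparing the observed distribution of the number of heterozygous loci (K) with its expected distribution under mutual independence. The Python sketch below shows one way that comparison could look. It is not the mixIndependR API and is not taken from the paper. It assumes Hardy–Weinberg proportions for per-locus expected heterozygosity, uses the standard result that a sum of independent Bernoulli variables follows a Poisson-binomial distribution, and uses a plain chi-square statistic as a stand-in for whatever test the package actually applies. All function names and the toy data are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies at one locus; genotypes is a list of (a1, a2) pairs."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}

def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2), assuming Hardy-Weinberg proportions."""
    return 1.0 - sum(p * p for p in freqs.values())

def expected_k_distribution(het):
    """Poisson-binomial pmf of K (number of heterozygous loci) when the loci
    are mutually independent and locus i is heterozygous with prob. het[i]."""
    pmf = [1.0]  # with zero loci, K = 0 with certainty
    for h in het:
        nxt = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            nxt[k] += p * (1.0 - h)   # this locus is homozygous
            nxt[k + 1] += p * h       # this locus is heterozygous
        pmf = nxt
    return pmf

def observed_k_counts(individuals, n_loci):
    """Count how many individuals are heterozygous at exactly k loci."""
    counts = [0] * (n_loci + 1)
    for ind in individuals:
        k = sum(1 for a1, a2 in ind if a1 != a2)
        counts[k] += 1
    return counts

def chi_square_statistic(observed, expected_pmf):
    """Pearson chi-square statistic; cells with zero expectation are skipped.
    In practice, sparse cells would be pooled before computing a p-value."""
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, expected_pmf):
        e = n * p
        if e > 0:
            stat += (o - e) ** 2 / e
    return stat

# Toy data (hypothetical): 4 individuals, 2 loci (one SNP-like, one STR-like).
population = [
    [("A", "a"), (12, 14)],
    [("A", "A"), (12, 12)],
    [("a", "a"), (14, 15)],
    [("A", "a"), (12, 12)],
]
n_loci = 2
het = []
for locus in range(n_loci):
    locus_genotypes = [ind[locus] for ind in population]
    het.append(expected_heterozygosity(allele_frequencies(locus_genotypes)))

expected = expected_k_distribution(het)
observed = observed_k_counts(population, n_loci)
print("observed K counts:", observed)
print("expected K pmf:", expected)
print("chi-square statistic:", chi_square_statistic(observed, expected))
```

Because only heterozygosity enters the calculation, the same code handles any marker type, which matches the abstract's claim that the approach applies to mixed panels. The analogous comparison for the number of shared alleles (X) is not sketched here.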