Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges

A genome-wide association study (GWAS) is a standard population-based technique for identifying the heritable genetic basis of complex diseases by discovering correlations between trait variations and allele frequencies of genetic markers. This article aims to help fill gaps in data pre-processing a...

Bibliographic Details
Main Authors: Zahra Mortezaei, Mahmood Tavallaei
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Informatics in Medicine Unlocked
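Subjects: Genome-wide association study (GWAS); Sequencing; Machine learning; Retrospective association analysis; Tissue signals; Rare variants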
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914821000769
id doaj-db659eda6533468cbfca9913e3df5076
record_format Article
spelling doaj-db659eda6533468cbfca9913e3df5076; 2021-06-19T04:55:06Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2021-01-01; 24; 100586
Zahra Mortezaei (0); Mahmood Tavallaei (1)
0: Corresponding author.; Human Genetic Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
1: Corresponding author.; Human Genetic Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
collection DOAJ
language English
format Article
sources DOAJ
author Zahra Mortezaei
Mahmood Tavallaei
title Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges
publisher Elsevier
series Informatics in Medicine Unlocked
issn 2352-9148
publishDate 2021-01-01
description A genome-wide association study (GWAS) is a standard population-based technique for identifying the heritable genetic basis of complex diseases by discovering correlations between trait variations and allele frequencies of genetic markers. This article aims to help fill gaps in data pre-processing and GWAS methodologies by reviewing novel techniques and methodologies. Data pre-processing performed prior to a GWAS presents challenges in Hardy-Weinberg (H–W) estimation, genotyping and accounting for factors such as sample structure. Recent developments towards overcoming these challenges are presented: the likelihood ratio test for H–W estimation, sequencing for genotyping, and techniques for dealing with sample structure. Traditional statistical methods cannot provide a way to insightfully interpret the data generated from high-throughput techniques; therefore, novel directions in GWAS methodologies are reviewed using efficient statistical methods, which are flexible techniques for performing genetic association analysis when factors such as non-random sampling or population structure occur. Despite the development of these methods, genotyping costs and an increased capacity for large dataset analysis have motivated researchers to examine tissue-specific signals. This review discusses how prospective and retrospective association analyses can be used to consider binary traits, address non-random ascertainment, and increase the capacity for large dataset analysis. Importantly, for disease susceptibility, rare variants can represent a large portion of genetic markers, and this article reviews some association methods for rare variant detection. In conclusion, the recent developments in GWAS data preparation and methodologies reviewed in this article can overcome most current challenges in the field and will also address future challenges.
topic Genome-wide association study (GWAS)
Sequencing
Machine learning
Retrospective association analysis
Tissue signals
Rare variants
url http://www.sciencedirect.com/science/article/pii/S2352914821000769