Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges
A genome-wide association study (GWAS) is a standard population-based technique for identifying the heritable genetic basis of complex diseases by discovering correlations between trait variations and allele frequencies of genetic markers. This article aims to help fill gaps in data pre-processing a...
Main Authors: | Zahra Mortezaei, Mahmood Tavallaei |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-01-01 |
Series: | Informatics in Medicine Unlocked |
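Subjects: | Genome-wide association study (GWAS); Sequencing; Machine learning; Retrospective association analysis; Tissue signals; Rare variants |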
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352914821000769 |
id |
doaj-db659eda6533468cbfca9913e3df5076 |
---|---|
record_format |
Article |
spelling |
doaj-db659eda6533468cbfca9913e3df5076; 2021-06-19T04:55:06Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2021-01-01; 24; 100586; Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges; Zahra Mortezaei (corresponding author; Human Genetic Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran); Mahmood Tavallaei (corresponding author; Human Genetic Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zahra Mortezaei Mahmood Tavallaei |
title |
Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges |
publisher |
Elsevier |
series |
Informatics in Medicine Unlocked |
issn |
2352-9148 |
publishDate |
2021-01-01 |
description |
A genome-wide association study (GWAS) is a standard population-based technique for identifying the heritable genetic basis of complex diseases by discovering correlations between trait variations and allele frequencies of genetic markers. This article aims to help fill gaps in data pre-processing and GWAS methodologies by reviewing novel techniques and methodologies. Data pre-processing performed prior to a GWAS presents challenges in Hardy-Weinberg (H–W) estimation, genotyping and accounting for factors such as sample structure. Recent developments towards overcoming these challenges are presented: the likelihood ratio test for H–W estimation, sequencing for genotyping, and techniques for dealing with sample structure. Traditional statistical methods cannot provide a way to insightfully interpret the data generated from high-throughput techniques; therefore, novel directions in GWAS methodologies are reviewed using efficient statistical methods, which are flexible techniques for performing genetic association analysis when factors such as non-random sampling or population structure occur. Despite the development of these methods, genotyping costs and an increased capacity for large dataset analysis have motivated researchers to examine tissue-specific signals. This review discusses how prospective and retrospective association analyses can be used to consider binary traits, address non-random ascertainment, and increase the capacity for large dataset analysis. Importantly, for disease susceptibility, rare variants can represent a large portion of genetic markers, and this article reviews some association methods for rare variant detection. In conclusion, the recent developments in GWAS data preparation and methodologies reviewed in this article can overcome most current challenges in the field and will also address future challenges. |
topic |
Genome-wide association study (GWAS); Sequencing; Machine learning; Retrospective association analysis; Tissue signals; Rare variants |
url |
http://www.sciencedirect.com/science/article/pii/S2352914821000769 |
_version_ |
1721371801574965248 |
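
The description above names the likelihood ratio test for Hardy-Weinberg (H–W) estimation as one pre-processing technique. The record does not give the article's formulation, so what follows is only a minimal sketch of the textbook version for a single biallelic marker. It assumes genotype counts are already available and compares the resulting statistic with a chi-square distribution on one degree of freedom. The function name `hwe_lrt` and its parameters are hypothetical and are not taken from the article.

```python
import math


def hwe_lrt(n_aa, n_ab, n_bb):
    """Likelihood ratio (G) test of Hardy-Weinberg proportions.

    Hypothetical helper for illustration: takes the observed counts of the
    three genotypes at one biallelic marker and returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("need at least one genotyped sample")

    # Maximum-likelihood allele frequency under the null hypothesis.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # G = 2 * sum(O * ln(O / E)); genotypes with zero observed count add 0.
    g = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            g += 2.0 * o * math.log(o / e)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value
```

For counts that match H–W proportions exactly, such as `hwe_lrt(25, 50, 25)`, the statistic is 0 and the p-value is 1. A complete absence of heterozygotes, as in `hwe_lrt(50, 0, 50)`, gives a large statistic and a very small p-value. In a quality-control step, markers with small p-values would typically be flagged for review. The threshold is a study-specific choice that this record does not specify.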