Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis

Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus...

Bibliographic Details
Main Authors: Sungyeon Hong, Yongkang Kim, Taesung Park
Format: Article
Language: English
Published: SAGE Publishing 2014-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S16350
id doaj-1aa38939ddb9457fb20793bca3921b47
record_format Article
spelling doaj-1aa38939ddb9457fb20793bca3921b47
2020-11-25T03:09:35Z
eng
SAGE Publishing
Cancer Informatics
1176-9351
2014-01-01
13s7
10.4137/CIN.S16350
Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis
Sungyeon Hong (0), Yongkang Kim (1), Taesung Park (2)
(0) Department of Statistics, Seoul National University, Seoul, South Korea.
(1) Department of Statistics, Seoul National University, Seoul, South Korea.
(2) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea.
Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: pre-screening for dimensional reduction and variable selection to identify causal SNPs. The pre-screening step selects SNPs in terms of their P-values or the absolute values of the regression coefficients in single SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets.
https://doi.org/10.4137/CIN.S16350
collection DOAJ
language English
format Article
sources DOAJ
author Sungyeon Hong
Yongkang Kim
Taesung Park
title Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2014-01-01
description Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: pre-screening for dimensional reduction and variable selection to identify causal SNPs. The pre-screening step selects SNPs in terms of their P-values or the absolute values of the regression coefficients in single SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets.
url https://doi.org/10.4137/CIN.S16350
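
Illustrative sketch of the two-stage approach named in the description: pre-screening by the absolute value of the single-SNP regression coefficient, followed by a penalized regression on the SNPs that survive. This is not the authors' code or data. The genotypes and phenotype are simulated, the lasso stands in for the whole family of penalties listed in the abstract, and the function names (`standardize`, `screen`, `lasso`), the number of SNPs kept (20), and the penalty weight (0.2) are all assumptions chosen only to keep the example small.

```python
import random
from statistics import fmean, pstdev


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    mu, sd = fmean(col), pstdev(col)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in col]


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def screen(columns, y, keep):
    """Stage 1: rank SNPs by |coefficient| from single-SNP regression.

    For a standardized SNP column x and a centered phenotype y, the
    least-squares slope of y on x alone is sum(x * y) / n.
    """
    n = len(y)
    score = {j: abs(sum(x * v for x, v in zip(col, y)) / n)
             for j, col in columns.items()}
    return sorted(score, key=score.get, reverse=True)[:keep]


def lasso(columns, y, lam, sweeps=200):
    """Stage 2: lasso fitted by cyclic coordinate descent.

    Minimizes (1/2n) * ||y - Xb||^2 + lam * ||b||_1, assuming every
    column is standardized so that sum(x * x) / n == 1.
    """
    n = len(y)
    beta = {j: 0.0 for j in columns}
    resid = list(y)  # residual y - Xb; equals y while all coefficients are 0
    for _ in range(sweeps):
        for j, col in columns.items():
            # Covariance of x_j with the residual that excludes x_j's own fit.
            rho = sum(x * r for x, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * x for r, x in zip(resid, col)]
                beta[j] = new
    return beta


# Simulated data (hypothetical): 200 samples, 300 SNPs coded 0/1/2,
# two causal SNPs with made-up effect sizes, unit-variance Gaussian noise.
rng = random.Random(0)
n, p, maf = 200, 300, 0.3
geno = {j: [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        for j in range(p)}
causal = {5: 1.5, 17: -1.0}
y_raw = [sum(b * geno[j][i] for j, b in causal.items()) + rng.gauss(0, 1)
         for i in range(n)]

X = {j: standardize(col) for j, col in geno.items()}
mu_y = fmean(y_raw)
y = [v - mu_y for v in y_raw]

screened = screen(X, y, keep=20)                        # stage 1
beta = lasso({j: X[j] for j in screened}, y, lam=0.2)   # stage 2
selected = sorted(j for j, b in beta.items() if b != 0.0)
print("screened:", sorted(screened))
print("selected:", selected)
```

Swapping the stage 1 score for a single-SNP P-value, or the stage 2 penalty for ridge, adaptive lasso, or elastic net, gives the other combinations the abstract mentions. In practice the penalty weight would be tuned, for example by cross-validation, rather than fixed by hand.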