Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis
Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus...
Main Authors: | Sungyeon Hong, Yongkang Kim, Taesung Park |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2014-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.4137/CIN.S16350 |
id |
doaj-1aa38939ddb9457fb20793bca3921b47 |
---|---|
record_format |
Article |
spelling |
doaj-1aa38939ddb9457fb20793bca3921b47; 2020-11-25T03:09:35Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2014-01-01; 13s7; 10.4137/CIN.S16350; Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis; Sungyeon Hong (0), Yongkang Kim (1), Taesung Park (2); (0) Department of Statistics, Seoul National University, Seoul, South Korea. (1) Department of Statistics, Seoul National University, Seoul, South Korea. (2) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea.; Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: pre-screening for dimensional reduction and variable selection to identify causal SNPs. The pre-screening step selects SNPs in terms of their P-values or the absolute values of the regression coefficients in single SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets.; https://doi.org/10.4137/CIN.S16350 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sungyeon Hong Yongkang Kim Taesung Park |
title |
Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis |
publisher |
SAGE Publishing |
series |
Cancer Informatics |
issn |
1176-9351 |
publishDate |
2014-01-01 |
description |
Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: pre-screening for dimensional reduction and variable selection to identify causal SNPs. The pre-screening step selects SNPs in terms of their P-values or the absolute values of the regression coefficients in single SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets. |
url |
https://doi.org/10.4137/CIN.S16350 |
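The two-stage approach named in the description (pre-screening by single-SNP analysis, then penalized regression on the retained SNPs) can be illustrated with a minimal sketch. It is not the authors' implementation and does not use their datasets: the function names `marginal_screen`, `soft_threshold`, and `lasso_cd` are hypothetical, the data are simulated, the lasso is a plain coordinate-descent version chosen only for illustration, and the penalty `lam` and screening size `k` are arbitrary assumed values.

```python
import numpy as np

def marginal_screen(X, y, k):
    """Stage 1 (assumed form): single-SNP linear regression for every column.

    Returns the indices of the k SNPs with the largest |t| statistic, plus the
    marginal coefficients and t statistics. Ranking by |t| gives the same order
    as ranking by two-sided P-value, because all tests share n - 2 degrees of
    freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)
    safe = np.where(sxx > 0, sxx, 1.0)   # monomorphic SNPs end up with beta = 0
    beta = (Xc.T @ yc) / safe
    rss = np.maximum((yc ** 2).sum() - beta ** 2 * safe, 1e-12)
    t = beta / np.sqrt(rss / (n - 2) / safe)
    order = np.argsort(-np.abs(t))
    return order[:k], beta, t

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Stage 2 (assumed form): lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xb||^2 + lam * ||b||_1 on standardized
    columns and a centered response; coefficients are on the standardized scale.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xs = X - X.mean(axis=0)
    scale = Xs.std(axis=0)
    scale[scale == 0] = 1.0
    Xs = Xs / scale
    r = np.asarray(y, dtype=float) - np.mean(y)   # residual for b = 0
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Xs[:, j] * b[j]            # remove SNP j's current contribution
            z = Xs[:, j] @ r / n            # column has unit variance, so no rescaling
            b[j] = soft_threshold(z, lam)
            r -= Xs[:, j] * b[j]            # add the updated contribution back
    return b

if __name__ == "__main__":
    # Simulated genotypes (0/1/2) with two causal SNPs; purely illustrative.
    rng = np.random.default_rng(0)
    X = rng.binomial(2, 0.3, size=(200, 500)).astype(float)
    y = 1.0 * X[:, 3] - 0.8 * X[:, 10] + rng.normal(0, 0.5, 200)
    kept, _, _ = marginal_screen(X, y, k=20)
    coef = lasso_cd(X[:, kept], y, lam=0.1)
    print("SNPs kept by screening:", sorted(kept.tolist()))
    print("SNPs selected by lasso:", sorted(kept[coef != 0].tolist()))
```

Swapping the ranking statistic from `|t|` to `|beta|` in `marginal_screen` would give the other pre-screening criterion the description mentions; the two can rank SNPs differently when allele frequencies differ, since `|beta|` is not scaled by its standard error.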