Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling.

Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms that is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts often is st...

Bibliographic Details
Main Authors: Stefanie Hieke, Axel Benner, Richard F Schlenk, Martin Schumacher, Lars Bullinger, Harald Binder
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4861340?pdf=render
id doaj-888780320d2a42caa8b2af5373c205c2
record_format Article
spelling doaj-888780320d2a42caa8b2af5373c205c2; 2020-11-24T21:50:24Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2016-01-01; 11(5): e0155226; 10.1371/journal.pone.0155226
collection DOAJ
language English
format Article
sources DOAJ
author Stefanie Hieke
Axel Benner
Richard F Schlenk
Martin Schumacher
Lars Bullinger
Harald Binder
title Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2016-01-01
description Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms (SNPs) that is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts is often still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-control studies on the one hand, and multivariable regression techniques as developed for gene expression data on the other. In particular, we focus on stable selection of a small set of SNPs and corresponding genes for subsequent validation. For univariate analysis, a permutation-based approach is proposed to test at the gene level. We use regularized multivariable regression models to consider all SNPs simultaneously and to select a small set of potentially important prognostic SNPs. Stability is judged according to resampling inclusion frequencies for both the univariate and the multivariable approach. The overall strategy is illustrated with data from a cohort of acute myeloid leukemia patients and explored in a simulation study. The multivariable approach is seen to automatically focus on a smaller set of SNPs than the univariate approach, roughly in line with blocks of correlated SNPs. This more targeted extraction of SNPs results in more stable selection at the SNP level as well as at the gene level. Thus, the multivariable regression approach with resampling provides a perspective, within the proposed analysis strategy for SNP data in clinical cohorts, on what regularized regression techniques can add to univariate analyses.
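The description judges stability by resampling inclusion frequencies. As a rough illustration of that idea only, the following Python sketch repeatedly draws subsamples of the cohort, reruns a selection step on each, and reports how often each SNP is selected. It is not the authors' implementation. Every name in it (`inclusion_frequencies`, `make_top_k_selector`, the 0.632 subsample fraction, the toy data) is an assumption made for this example, and the selector is a deliberately simple stand-in (top-k absolute correlation with an uncensored numeric outcome) for the univariate tests or regularized time-to-event regression the description refers to.

```python
import random
from collections import Counter


def inclusion_frequencies(n_samples, select, n_resamples=100, fraction=0.632, seed=0):
    """Return, per feature, the share of resamples in which `select` picked it.

    `select(indices)` must fit on the given sample indices only and return
    the selected feature names. Subsamples are drawn without replacement.
    """
    rng = random.Random(seed)
    size = max(1, round(fraction * n_samples))
    counts = Counter()
    for _ in range(n_resamples):
        idx = rng.sample(range(n_samples), size)
        counts.update(set(select(idx)))
    return {feature: c / n_resamples for feature, c in counts.items()}


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def make_top_k_selector(genotypes, outcome, k):
    """Hypothetical stand-in selector: keep the k SNPs most correlated with the outcome.

    A real analysis of censored survival data would plug in a univariate test
    or a regularized Cox-type model here instead.
    """
    def select(idx):
        y = [outcome[i] for i in idx]
        scores = {snp: abs_corr([col[i] for i in idx], y) for snp, col in genotypes.items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]
    return select


# Toy data (assumed): 120 patients, 20 SNPs coded 0/1/2, only snp0 affects the outcome.
data_rng = random.Random(1)
n = 120
genotypes = {f"snp{j}": [data_rng.choice([0, 1, 2]) for _ in range(n)] for j in range(20)}
outcome = [2.0 * genotypes["snp0"][i] + data_rng.gauss(0, 1) for i in range(n)]

freq = inclusion_frequencies(n, make_top_k_selector(genotypes, outcome, k=3), n_resamples=50)
for snp, f in sorted(freq.items(), key=lambda item: -item[1])[:5]:
    print(snp, f)
```

SNPs with inclusion frequencies close to 1 are selected in nearly every resample and would be the stable candidates; SNPs that appear only occasionally are more likely to reflect sampling noise. Because the selector is passed in as a function, the same loop could wrap either a univariate or a multivariable selection step.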