Removing batch effects for prediction problems with frozen surrogate variable analysis

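Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.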


Bibliographic Details
Main Authors: Hilary S. Parker, Héctor Corrada Bravo, Jeffrey T. Leek
Format: Article
Language: English
Published: PeerJ Inc. 2014-09-01
Series: PeerJ
Subjects: Batch effects; Surrogate variable analysis; Prediction; Machine learning; Database; Statistics
Online Access: https://peerj.com/articles/561.pdf
id doaj-9349d45564594fd89f1b6c4ec07c2cf2
record_format Article
spelling doaj-9349d45564594fd89f1b6c4ec07c2cf2 | 2020-11-24T21:04:40Z | eng | PeerJ Inc. | PeerJ | 2167-8359 | 2014-09-01 | 2 | e561 | 10.7717/peerj.561 | 561 | Removing batch effects for prediction problems with frozen surrogate variable analysis | Hilary S. Parker (0); Héctor Corrada Bravo (1); Jeffrey T. Leek (2) | (0) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA | (1) Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, USA | (2) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA | https://peerj.com/articles/561.pdf
collection DOAJ
language English
format Article
sources DOAJ
author Hilary S. Parker
Héctor Corrada Bravo
Jeffrey T. Leek
title Removing batch effects for prediction problems with frozen surrogate variable analysis
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2014-09-01
description Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
topic Batch effects
Surrogate variable analysis
Prediction
Machine learning
Database
Statistics
url https://peerj.com/articles/561.pdf