Removing batch effects for prediction problems with frozen surrogate variable analysis
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in populat...
Main Authors: | Hilary S. Parker, Héctor Corrada Bravo, Jeffrey T. Leek |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2014-09-01 |
Series: | PeerJ |
Subjects: | Batch effects; Surrogate variable analysis; Prediction; Machine learning; Database; Statistics |
Online Access: | https://peerj.com/articles/561.pdf |
id |
doaj-9349d45564594fd89f1b6c4ec07c2cf2 |
---|---|
record_format |
Article |
spelling |
doaj-9349d45564594fd89f1b6c4ec07c2cf2; 2020-11-24T21:04:40Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2014-09-01; 2; e561; 10.7717/peerj.561; 561; Removing batch effects for prediction problems with frozen surrogate variable analysis; Hilary S. Parker (0), Héctor Corrada Bravo (1), Jeffrey T. Leek (2); (0) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; (1) Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, USA; (2) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.; https://peerj.com/articles/561.pdf; Batch effects; Surrogate variable analysis; Prediction; Machine learning; Database; Statistics |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hilary S. Parker; Héctor Corrada Bravo; Jeffrey T. Leek |
title |
Removing batch effects for prediction problems with frozen surrogate variable analysis |
publisher |
PeerJ Inc. |
series |
PeerJ |
issn |
2167-8359 |
publishDate |
2014-09-01 |
description |
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package. |
topic |
Batch effects; Surrogate variable analysis; Prediction; Machine learning; Database; Statistics |
url |
https://peerj.com/articles/561.pdf |