A method for detecting and correcting feature misidentification on expression microarrays

<p>Abstract</p> <p>Background</p> <p>Much of the microarray data published at Stanford is based on mouse and human arrays produced under controlled and monitored conditions at the Brown and Botstein laboratories and at the Stanford Functional Genomics Facility (SFGF). N...

Bibliographic Details
Main Authors: Brown Patrick O, Sikic Branimir I, Diehn Maximilian, Schaner Marci, Tu I-Ping, Botstein David, Fero Michael J
Format: Article
Language: English
Published: BMC 2004-09-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/5/64
id doaj-686ad43f4f32440badd3d713b5f4a7bb
record_format Article
spelling doaj-686ad43f4f32440badd3d713b5f4a7bb; 2020-11-25T02:42:25Z; eng; BMC; BMC Genomics; 1471-2164; 2004-09-01; 5; 1; 64; 10.1186/1471-2164-5-64; A method for detecting and correcting feature misidentification on expression microarrays; Brown Patrick O; Sikic Branimir I; Diehn Maximilian; Schaner Marci; Tu I-Ping; Botstein David; Fero Michael J; http://www.biomedcentral.com/1471-2164/5/64
collection DOAJ
language English
format Article
sources DOAJ
author Brown Patrick O
Sikic Branimir I
Diehn Maximilian
Schaner Marci
Tu I-Ping
Botstein David
Fero Michael J
title A method for detecting and correcting feature misidentification on expression microarrays
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2004-09-01
description <p>Abstract</p> <p>Background</p> <p>Much of the microarray data published at Stanford is based on mouse and human arrays produced under controlled and monitored conditions at the Brown and Botstein laboratories and at the Stanford Functional Genomics Facility (SFGF). Nevertheless, as large datasets based on the Stanford Human array began to accumulate, a small but significant number of discrepancies were detected that required a serious attempt to track down the original source of error. Because of the controlled process environment, sufficient data was available to accurately track the entire process leading up to the final expression data. In this paper, we describe our statistical methods for detecting the inconsistencies in microarray data that arise from process errors, and discuss our technique for locating and fixing these errors.</p> <p>Results</p> <p>To date, the Brown and Botstein laboratories and the Stanford Functional Genomics Facility have together produced 40,000 large-scale (10,000–50,000 feature) cDNA microarrays. By applying the heuristic described here, we have been able to check most of these arrays for misidentified features and to confidently apply fixes to the data where needed. Of the 265 million features checked in our database, problems were detected and corrected on 1.3 million.</p> <p>Conclusion</p> <p>Process errors in any genome-scale, high-throughput production regime can lead to subsequent errors in data analysis. We show the value of tracking multi-step, high-throughput operations by using this knowledge to detect and correct misidentified data on gene expression microarrays.</p>
url http://www.biomedcentral.com/1471-2164/5/64