New methods to analyse microarray data that partially lack a reference signal

Abstract. Background: Microarray-based Comparative Genomic Hybridisation (CGH) has been used to assess genetic variability between bacterial strains. Crucial for interpretation of microarray data is the availability of a reference to compare signal intens...

Bibliographic Details
Main Authors: Bonten Marc JM, Lindsay Jodi A, Fluit Ad C, Carpaij Neeltje, Willems Rob JL
Format: Article
Language: English
Published: BMC 2009-11-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/10/522
id doaj-be5c2a4d1cda4eb1a106f876b0469b4e
record_format Article
spelling doaj-be5c2a4d1cda4eb1a106f876b0469b4e; 2020-11-24T20:47:06Z; eng; BMC; BMC Genomics; 1471-2164; 2009-11-01; 10; 1; 522; 10.1186/1471-2164-10-522; New methods to analyse microarray data that partially lack a reference signal; Bonten Marc JM; Lindsay Jodi A; Fluit Ad C; Carpaij Neeltje; Willems Rob JL; http://www.biomedcentral.com/1471-2164/10/522
collection DOAJ
language English
format Article
sources DOAJ
author Bonten Marc JM
Lindsay Jodi A
Fluit Ad C
Carpaij Neeltje
Willems Rob JL
title New methods to analyse microarray data that partially lack a reference signal
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2009-11-01
description Abstract
Background: Microarray-based Comparative Genomic Hybridisation (CGH) has been used to assess genetic variability between bacterial strains. Crucial for interpretation of microarray data is the availability of a reference against which signal intensities can be compared to reliably determine the presence or divergence of each DNA fragment. However, the production of a good reference becomes unfeasible when microarrays are based on pan-genomes. When only a single strain is used as a reference for a multistrain array, the accessory gene pool will be only partially represented by reference DNA, although these genes represent the genomic repertoire that can explain differences in virulence, pathogenicity or transmissibility between strains. The lack of a reference makes interpretation of the data for these genes difficult and, if the test signal is low, they are often deleted from the analysis. We aimed to develop novel methods to determine the presence or divergence of genes in a Staphylococcus aureus multistrain PCR product microarray-based CGH approach for which reference DNA was not available for some probes.
Results: In this study we developed six new methods to predict divergence and presence of all genes spotted on a previously published multistrain Staphylococcus aureus DNA microarray, including those gene spots that lack reference signals. When specificity and positive predictive value (PPV), the two measures that are lowered by false positives, were taken as the most important criteria for evaluating these methods, the method that defined gene presence as a signal at least twice as high as the background and higher than the reference signal (method 4) had the best test characteristics. For this method, specificity was 100% for MRSA252 (compared with the GACK method) and 82% for all spots (compared with sequence data); the corresponding PPVs were 100% and 76%.
Conclusion: A definition of gene presence based on a signal at least twice as high as the background and higher than the reference signal (method 4) had the best test characteristics, allowing the analysis of 6-17% more of the genes not present in the reference strain. This method is recommended for analysing microarray data that partially lack a reference signal.
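The decision rule described for method 4, and the two criteria used to evaluate it, can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the abstract, not the authors' implementation: the `Spot` fields, the use of raw (unnormalised) intensities, the choice of 0.0 for a missing reference signal, and the example numbers are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """Hypothetical container for one gene spot (field names are assumptions)."""
    test: float        # test-strain signal intensity
    background: float  # background intensity for the spot
    reference: float   # reference signal; 0.0 assumed when no reference DNA binds


def is_present(spot: Spot) -> bool:
    """Method 4 as worded in the abstract: signal at least twice the
    background AND higher than the reference signal."""
    return spot.test >= 2 * spot.background and spot.test > spot.reference


def specificity_and_ppv(predicted, truth):
    """Specificity = TN / (TN + FP); PPV = TP / (TP + FP)."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    tn = sum(1 for p, t in pairs if not p and not t)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return specificity, ppv


# Made-up intensities, for illustration only.
spots = [
    Spot(test=1000, background=200, reference=0),     # strong signal, no reference
    Spot(test=300, background=200, reference=0),      # below 2x background
    Spot(test=900, background=200, reference=1200),   # weaker than reference
    Spot(test=500, background=200, reference=100),    # passes both conditions
]
calls = [is_present(s) for s in spots]

# Made-up "sequence data" truth labels (gene really present or not).
truth = [True, False, False, False]
spec, ppv = specificity_and_ppv(calls, truth)
```

Both metrics fall as false positives rise, which is why the abstract treats them as the decisive criteria: a spot called present without support from sequence data lowers specificity and PPV at once.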
url http://www.biomedcentral.com/1471-2164/10/522