Discovering multi-level structures in bio-molecular data through the Bernstein inequality
Abstract. Background: The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have been recently proposed to assess the reliability of...
Main Authors: | Valentini Giorgio, Bertoni Alberto |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2008-03-01 |
Series: | BMC Bioinformatics |
id |
doaj-ec4d3b6b1bcb4cc3a9cc2e687ab4c4f0 |
---|---|
record_format |
Article |
spelling |
doaj-ec4d3b6b1bcb4cc3a9cc2e687ab4c4f0 2020-11-24T23:36:36Z eng BMC BMC Bioinformatics 1471-2105 2008-03-01 9 Suppl 2 S4 10.1186/1471-2105-9-S2-S4 Discovering multi-level structures in bio-molecular data through the Bernstein inequality Valentini Giorgio Bertoni Alberto |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Valentini Giorgio Bertoni Alberto |
spellingShingle |
Valentini Giorgio Bertoni Alberto Discovering multi–level structures in bio-molecular data through the Bernstein inequality BMC Bioinformatics |
author_facet |
Valentini Giorgio Bertoni Alberto |
author_sort |
Valentini Giorgio |
title |
Discovering multi-level structures in bio-molecular data through the Bernstein inequality |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2008-03-01 |
description |
Abstract. Background: The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have recently been proposed to assess the reliability of a clustering procedure and to estimate the “optimal” number of clusters in bio-molecular data. A major problem with stability-based methods is the detection of multi-level structures (e.g. hierarchical functional classes of genes) and the assessment of their statistical significance. In this context, a chi-square based statistical test of hypothesis has been proposed; however, the correctness of this technique depends on some assumptions about the distribution of the data. Results: To assess the statistical significance of multi-level structures in bio-molecular data and to discover them, a new method based on Bernstein's inequality is proposed. This approach makes no assumptions about the distribution of the data, so it can be applied reliably to a large range of bioinformatics problems. Results with synthetic and DNA microarray data show the effectiveness of the proposed method. Conclusions: The Bernstein test, due to its loose assumptions, is more sensitive than the chi-square test in detecting multiple structures simultaneously present in the data. It is, however, less selective, that is, subject to more false positives. A more selective variant of the Bernstein inequality-based test, obtained by adding independence assumptions, is also presented. The proposed methods can be applied to discover multiple structures and to assess their significance in different types of bio-molecular data. |