Biclustering of gene expression data by non-smooth non-negative matrix factorization


Bibliographic Details
Main Authors:	Carazo Jose M, Tirado F, Pascual-Marqui Roberto D, Carmona-Saez Pedro, Pascual-Montano Alberto
Format:	Article
Language:	English
Published:	BMC 2006-02-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/7/78

id	doaj-8297f131ab88499ba73bee3f4d5ebc91
record_format	Article
doi	10.1186/1471-2105-7-78
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Carazo Jose M; Tirado F; Pascual-Marqui Roberto D; Carmona-Saez Pedro; Pascual-Montano Alberto
title	Biclustering of gene expression data by non-smooth non-negative matrix factorization
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2006-02-01
description	Abstract
Background: The widespread use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of experimental conditions. One of the major challenges in analyzing such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states.
Results: In this work we present a methodology that clusters genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), which is able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method identified localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures had a clear biological meaning in terms of relationships between the functional annotations of genes and the phenotypes or physiological states of the associated conditions.
Conclusion: The proposed approach can be a useful tool for analyzing large and heterogeneous gene expression datasets. The method can identify complex relationships among genes and conditions that are difficult to detect with standard clustering algorithms.
url	http://www.biomedcentral.com/1471-2105/7/78

Similar Items