Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes

Non-negative matrix factorization (NMF) is a relatively new approach to analyze gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically as genes may either be expressed or not, but never show neg...

Bibliographic Details
Main Authors: Attila Frigyesi, Mattias Höglund
Format: Article
Language: English
Published: SAGE Publishing 2008-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S606
id doaj-3de4e049f5824c96ba5e678ff6d9e6ff
record_format Article
spelling doaj-3de4e049f5824c96ba5e678ff6d9e6ff; 2020-11-25T02:48:07Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2008-01-01; 6; 10.4137/CIN.S606; Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes; Attila Frigyesi (0); Mattias Höglund (1); Centre for Mathematical Sciences, Mathematical Statistics, Lund University, SE-223 62 Lund, Sweden; Department of Clinical Genetics, Lund University Hospital, SE-221 85 Lund, Sweden; Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically, as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyses of metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures of microarray expression data and may thus contribute to a deeper understanding of tumor behavior.; https://doi.org/10.4137/CIN.S606
collection DOAJ
language English
format Article
sources DOAJ
author Attila Frigyesi
Mattias Höglund
title Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2008-01-01
description Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically, as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyses of metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures of microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
url https://doi.org/10.4137/CIN.S606
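
The rank-selection idea in the description (comparing the NMF residual error on the data with the residual error on permuted data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the synthetic matrix, the function names nmf, residual and permute_rows, the use of Lee-Seung multiplicative updates with a Frobenius-norm error, and the choice to permute each gene's values across samples independently. The authors' actual algorithm, error measure and permutation scheme may differ.

```python
import numpy as np


def nmf(V, rank, n_iter=300, seed=0, eps=1e-9):
    """Factorize non-negative V (genes x samples) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm
    objective (an assumed choice, not necessarily the article's).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps    # metagenes as columns
    H = rng.random((rank, n_samples)) + eps  # metagene weights per sample
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def residual(V, rank, **kwargs):
    """Frobenius norm of the reconstruction error V - W @ H."""
    W, H = nmf(V, rank, **kwargs)
    return float(np.linalg.norm(V - W @ H))


def permute_rows(V, rng):
    """Shuffle each gene's values across samples independently
    (hypothetical permutation scheme that destroys shared structure)."""
    return np.stack([rng.permutation(row) for row in V])


# Synthetic stand-in for an expression matrix: 40 genes, 20 samples,
# built from 3 non-negative metagenes plus a little non-negative noise.
rng = np.random.default_rng(42)
W0 = rng.random((40, 3))
H0 = rng.random((3, 20))
V = W0 @ H0 + 0.01 * rng.random((40, 20))
V_perm = permute_rows(V, rng)

# Compare residual errors over candidate numbers of metagenes.
for k in range(1, 6):
    print(k, residual(V, k), residual(V_perm, k))
```

Under these assumptions, a candidate number of metagenes looks informative while adding a metagene reduces the residual on the real data clearly more than it does on the permuted data. The exact decision criterion used in the article is not given in this record.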