Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes
Non-negative matrix factorization (NMF) is a relatively new approach to analyze gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically as genes may either be expressed or not, but never show neg...
Main Authors: | Attila Frigyesi, Mattias Höglund |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing 2008-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.4137/CIN.S606 |
id |
doaj-3de4e049f5824c96ba5e678ff6d9e6ff |
---|---|
record_format |
Article |
spelling |
doaj-3de4e049f5824c96ba5e678ff6d9e6ff; 2020-11-25T02:48:07Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2008-01-01; 6; 10.4137/CIN.S606; Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes; Attila Frigyesi (0); Mattias Höglund (1); Centre for Mathematical Sciences, Mathematical Statistics, Lund University, SE-223 62 Lund, Sweden; Department of Clinical Genetics, Lund University Hospital, SE-221 85 Lund, Sweden; Non-negative matrix factorization (NMF) is a relatively new approach to analyze gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures of microarray expression data and may thus contribute to a deeper understanding of tumor behavior.; https://doi.org/10.4137/CIN.S606 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Attila Frigyesi Mattias Höglund |
author_sort |
Attila Frigyesi |
title |
Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes |
publisher |
SAGE Publishing |
series |
Cancer Informatics |
issn |
1176-9351 |
publishDate |
2008-01-01 |
description |
Non-negative matrix factorization (NMF) is a relatively new approach to analyze gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures of microarray expression data and may thus contribute to a deeper understanding of tumor behavior. |
url |
https://doi.org/10.4137/CIN.S606 |
_version_ |
1724749868012929024 |
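
The rank-selection idea in the description (compare the residual error of an NMF reconstruction of the data with that of an NMF reconstruction of permuted data) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Lee-Seung multiplicative updates for the Frobenius norm, a permutation scheme that shuffles each gene's values across samples independently, and synthetic data with three planted metagenes. The function names `nmf`, `residual`, `permute_rows`, and `residual_curves` are hypothetical, and the record does not state which algorithm, permutation scheme, or stopping rule the article used.

```python
import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    W (genes x k) holds the metagenes; H (k x samples) holds their
    activity in each sample. Uses multiplicative updates for the
    Frobenius norm (an assumed choice of algorithm).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def residual(V, k, **kw):
    """Frobenius norm of the reconstruction error at rank k."""
    W, H = nmf(V, k, **kw)
    return np.linalg.norm(V - W @ H)


def permute_rows(V, seed=0):
    """Shuffle each gene's values across samples independently
    (an assumed permutation scheme that destroys co-expression)."""
    rng = np.random.default_rng(seed)
    return np.array([rng.permutation(row) for row in V])


def residual_curves(V, ranks, seed=0):
    """Residual error per rank for the data and for a permuted copy."""
    Vp = permute_rows(V, seed)
    real = np.array([residual(V, k, seed=seed) for k in ranks])
    perm = np.array([residual(Vp, k, seed=seed) for k in ranks])
    return real, perm


if __name__ == "__main__":
    # Synthetic expression matrix: 60 genes, 20 samples, 3 planted metagenes.
    rng = np.random.default_rng(1)
    V = rng.random((60, 3)) @ rng.random((3, 20)) + 0.01 * rng.random((60, 20))
    ranks = [1, 2, 3, 4, 5]
    real, perm = residual_curves(V, ranks)
    for k, r, p in zip(ranks, real, perm):
        print(f"k={k}  residual(data)={r:.3f}  residual(permuted)={p:.3f}")
```

One way to read the two curves: as long as adding a metagene lowers the residual of the real data faster than it lowers the residual of the permuted data, the extra metagene is capturing structure rather than noise. The exact decision rule is left open here because the record does not specify it.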