SPARCoC: a new framework for molecular pattern discovery and cancer gene identification.
It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heteroge...
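Main Authors: | Shiqian Ma, Daniel Johnson, Cody Ashby, Donghai Xiong, Carole L Cramer, Jason H Moore, Shuzhong Zhang, Xiuzhen Huang |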
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2015-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC4359112?pdf=render |
id |
doaj-33a381d6a94f488b94738f831e0f6163 |
---|---|
record_format |
Article |
spelling |
doaj-33a381d6a94f488b94738f831e0f6163; 2020-11-24T22:08:08Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10; 3; e0117135; 10.1371/journal.pone.0117135; SPARCoC: a new framework for molecular pattern discovery and cancer gene identification.; Shiqian Ma, Daniel Johnson, Cody Ashby, Donghai Xiong, Carole L Cramer, Jason H Moore, Shuzhong Zhang, Xiuzhen Huang; It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages compared with widely-used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05). Our results are only based on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification.; http://europepmc.org/articles/PMC4359112?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Shiqian Ma, Daniel Johnson, Cody Ashby, Donghai Xiong, Carole L Cramer, Jason H Moore, Shuzhong Zhang, Xiuzhen Huang |
author_sort |
Shiqian Ma |
title |
SPARCoC: a new framework for molecular pattern discovery and cancer gene identification. |
title_sort |
sparcoc: a new framework for molecular pattern discovery and cancer gene identification. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2015-01-01 |
description |
It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages compared with widely-used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05). Our results are only based on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification. |
url |
http://europepmc.org/articles/PMC4359112?pdf=render |
_version_ |
1725817585709088768 |