SPARCoC: a new framework for molecular pattern discovery and cancer gene identification.

It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heteroge...


Bibliographic Details
Main Authors: Shiqian Ma, Daniel Johnson, Cody Ashby, Donghai Xiong, Carole L Cramer, Jason H Moore, Shuzhong Zhang, Xiuzhen Huang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4359112?pdf=render
id doaj-33a381d6a94f488b94738f831e0f6163
record_format Article
spelling doaj-33a381d6a94f488b94738f831e0f6163; 2020-11-24T22:08:08Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10; 3; e0117135; 10.1371/journal.pone.0117135
collection DOAJ
language English
format Article
sources DOAJ
author Shiqian Ma
Daniel Johnson
Cody Ashby
Donghai Xiong
Carole L Cramer
Jason H Moore
Shuzhong Zhang
Xiuzhen Huang
author_sort Shiqian Ma
title SPARCoC: a new framework for molecular pattern discovery and cancer gene identification.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2015-01-01
description It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and to identify gene signatures directly relevant to those subtypes. Current clustering approaches have inherent limitations that prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework, SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages over two widely used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type and a significant challenge for molecular subtyping. For testing and verification, we use high-quality gene expression profiling data from lung ADCA patients and identify prognostic gene signatures that could cluster patients into subgroups that differ significantly in overall survival (p-values < 0.05). Our results are based only on gene expression profiling data analysis, without incorporating any other feature selection or clinical information, and we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification.
url http://europepmc.org/articles/PMC4359112?pdf=render