An Integrated Approach for Identifying Molecular Subtypes in Human Colon Cancer Using Gene Expression Data

Identifying molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment. Various studies have identified molecular subtypes of CRC from gene expression data, but their results are inconsistent and further research is necessary. From a methodological point of view,...

Bibliographic Details
Main Authors: Wen-Hui Wang, Ting-Yan Xie, Guang-Lei Xie, Zhong-Lu Ren, Jin-Ming Li
Format: Article
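Language: English
Published: MDPI AG 2018-08-01
Series: Genes
Subjects: subtypes of cancer; colon cancer; Bayesian robust principal component; hierarchical clustering; feature selection
Online Access: http://www.mdpi.com/2073-4425/9/8/397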
id doaj-ad5f1092deb641cc9d4b783a917b898d
record_format Article
spelling doaj-ad5f1092deb641cc9d4b783a917b898d
2020-11-24T20:58:44Z
eng
MDPI AG
Genes
2073-4425
2018-08-01
Volume 9, Issue 8, Article 397
10.3390/genes9080397
genes9080397
An Integrated Approach for Identifying Molecular Subtypes in Human Colon Cancer Using Gene Expression Data
Wen-Hui Wang (State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou 510515, China)
Ting-Yan Xie (State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou 510515, China)
Guang-Lei Xie (State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou 510515, China)
Zhong-Lu Ren (Center for Systems Medical Genetics, Department of Obstetrics & Gynecology Nanfang Hospital, Southern Medical University, Guangzhou 510515, China)
Jin-Ming Li (State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou 510515, China)
http://www.mdpi.com/2073-4425/9/8/397
collection DOAJ
language English
format Article
sources DOAJ
author Wen-Hui Wang
Ting-Yan Xie
Guang-Lei Xie
Zhong-Lu Ren
Jin-Ming Li
publisher MDPI AG
series Genes
issn 2073-4425
publishDate 2018-08-01
description Identifying molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment. Various studies have identified molecular subtypes of CRC from gene expression data, but their results are inconsistent and further research is necessary. From a methodological point of view, a progressive approach is needed to identify molecular subtypes in human colon cancer using gene expression data. We propose an approach for identifying the molecular subtypes of colon cancer that integrates denoising by the Bayesian robust principal component analysis (BRPCA) algorithm, hierarchical clustering by the directed bubble hierarchical tree (DBHT) algorithm, and feature gene selection by an improved differential evolution based feature selection (DEFSW) algorithm. In this approach, a clustering is considered reasonable only if the normal samples are completely and exclusively clustered into one class, and the feature selection accounts for imbalances in sample numbers among subtypes. With this approach, we identified the molecular subtypes of colon cancer in the mRNA gene expression dataset of 153 colon cancer samples and 19 normal control samples from The Cancer Genome Atlas (TCGA) project. The colon cancer samples were clustered into 7 subtypes with 44 feature genes. Our approach identified finer subtypes of colon cancer with fewer feature genes than two other recent studies, and it offers a generic methodology that might be applied to identify the subtypes of other cancers.
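The acceptance criterion in the description (normal samples clustered completely and exclusively into one class) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `normals_form_exclusive_cluster`, the label encoding, and the toy data are assumptions made for illustration only, and the cluster labels could come from any clustering method, not specifically DBHT.

```python
def normals_form_exclusive_cluster(labels, is_normal):
    """Check the 'complete and exclusive' criterion for normal samples.

    labels    -- cluster label per sample (hypothetical encoding: any hashable)
    is_normal -- True for a normal control sample, False for a tumor sample
    """
    if len(labels) != len(is_normal):
        raise ValueError("labels and is_normal must have the same length")

    # Clusters that contain at least one normal sample.
    normal_clusters = {c for c, n in zip(labels, is_normal) if n}
    # Clusters that contain at least one tumor sample.
    tumor_clusters = {c for c, n in zip(labels, is_normal) if not n}

    # "Completely": all normal samples fall into a single cluster.
    complete = len(normal_clusters) == 1
    # "Exclusively": that cluster contains no tumor sample.
    exclusive = normal_clusters.isdisjoint(tumor_clusters)
    return complete and exclusive


# Toy data (invented): two normal samples followed by four tumor samples.
is_normal = [True, True, False, False, False, False]

accepted = normals_form_exclusive_cluster([0, 0, 1, 1, 2, 2], is_normal)
split_normals = normals_form_exclusive_cluster([0, 1, 1, 1, 2, 2], is_normal)
mixed_cluster = normals_form_exclusive_cluster([0, 0, 0, 1, 2, 2], is_normal)
```

In this toy example only the first labelling would be accepted: the second splits the normal samples across two clusters, and the third places a tumor sample in the normal cluster.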
topic subtypes of cancer
colon cancer
Bayesian robust principal component
hierarchical clustering
feature selection
url http://www.mdpi.com/2073-4425/9/8/397