RNA-Seq-Based Breast Cancer Subtypes Classification Using Machine Learning Approaches

Background. Breast invasive carcinoma (BRCA) is not a single disease, as each subtype has a distinct morphological structure. Although several computational methods have been proposed to conduct breast cancer subtype identification, the specific interaction mechanisms of genes involved in the subtypes a...

Bibliographic Details
Main Authors: Zhezhou Yu, Zhuo Wang, Xiangchun Yu, Zhe Zhang
Format: Article
Language: English
Published: Hindawi Limited 2020-01-01
Series: Computational Intelligence and Neuroscience
Online Access: http://dx.doi.org/10.1155/2020/4737969
id doaj-bb553df3de0040eea2a002f29e630c2a
record_format Article
spelling doaj-bb553df3de0040eea2a002f29e630c2a; 2020-11-25T04:06:19Z; eng; Hindawi Limited; Computational Intelligence and Neuroscience; 1687-5273; 2020-01-01; 2020; 10.1155/2020/4737969; 4737969; RNA-Seq-Based Breast Cancer Subtypes Classification Using Machine Learning Approaches; Zhezhou Yu (0), Zhuo Wang (1), Xiangchun Yu (2), Zhe Zhang (3); College of Computer Science and Technology (listed for each of the four authors); http://dx.doi.org/10.1155/2020/4737969
collection DOAJ
language English
format Article
sources DOAJ
author Zhezhou Yu
Zhuo Wang
Xiangchun Yu
Zhe Zhang
title RNA-Seq-Based Breast Cancer Subtypes Classification Using Machine Learning Approaches
publisher Hindawi Limited
series Computational Intelligence and Neuroscience
issn 1687-5273
publishDate 2020-01-01
description Background. Breast invasive carcinoma (BRCA) is not a single disease, as each subtype has a distinct morphological structure. Although several computational methods have been proposed for breast cancer subtype identification, the specific interaction mechanisms of the genes involved in each subtype are still incompletely understood. Identifying and exploring these interaction mechanisms for each subtype of breast cancer could have an important impact on personalized treatment for different patients. Methods. We integrated the biological importance of genes, derived from gene regulatory networks, into the differential expression analysis to obtain weighted differentially expressed genes (weighted DEGs). A gene with a high weight regulates more target genes and thus holds more biological importance. In addition, we constructed gene coexpression networks for the control and experiment groups, and the significantly different interaction structures motivated us to design a Gene Ontology (GO) enrichment analysis based on gene coexpression networks (GOEGCN). GOEGCN considers the two-sided distinction between the gene coexpression networks of the control and experiment groups. The method allows us to study how the modulated coexpressed gene pairs affect biological functions at the GO level. Results. We modeled a binary classification with weighted DEGs for each subtype. The binary classifiers made good predictions for unseen samples, and the experimental results validated the effectiveness of the proposed approaches. The novel enriched GO terms obtained with GOEGCN for the control and experiment groups of each subtype explain, to some extent, the specific changes in biological function that follow from the two-sided distinction of the coexpression network structures. Conclusion. The weighted DEGs carry biological importance derived from the gene regulatory network. Based on the weighted DEGs, five binary classifiers were learned and showed good performance on the "Sensitivity," "Specificity," "Accuracy," "F1," and "AUC" metrics. GOEGCN with weighted DEGs for the control and experiment groups produced novel GO enrichment analysis results, and the novel enriched GO terms could further unveil, to some extent, the changes in specific biological functions among all the BRCA subtypes. The R code for this research is available at https://github.com/yxchspring/GOEGCN_BRCA_Subtypes.
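The five metrics named in the abstract have standard definitions, which the following minimal Python sketch illustrates for a single binary classifier. It is not the authors' R code from the linked repository; the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration only.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy, F1 and AUC.

    y_true holds 1 for the subtype of interest and 0 otherwise;
    y_score holds the classifier's score for the positive class.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC as the probability that a random positive sample outscores a
    # random negative sample (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )

    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "auc": _ratio(wins, len(pos) * len(neg)),
    }


if __name__ == "__main__":
    # Toy data (hypothetical): three positive and three negative samples.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    for name, value in binary_metrics(labels, scores).items():
        print(f"{name}: {value:.3f}")
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be preferable for large cohorts.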
url http://dx.doi.org/10.1155/2020/4737969