HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis
The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches for cancer diagnosis have struggled to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hiera...
Main Authors: | Yajie Meng, Min Jin |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2021-06-01 |
Series: | Frontiers in Cell and Developmental Biology |
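Subjects: | precision cancer diagnosis; hierarchical feature selection; ensemble model; transcriptome profiling; DNA methylation; biomarker |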
Online Access: | https://www.frontiersin.org/articles/10.3389/fcell.2021.696359/full |
id |
doaj-f9b85df1ae0a4078ae9bfd135d00d3d3 |
---|---|
record_format |
Article |
spelling |
doaj-f9b85df1ae0a4078ae9bfd135d00d3d3 2021-06-30T06:42:42Z eng Frontiers Media S.A. Frontiers in Cell and Developmental Biology 2296-634X 2021-06-01 9 10.3389/fcell.2021.696359 696359 HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis Yajie Meng Min Jin The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches for cancer diagnosis have struggled to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information; afterward, we designed a novel hierarchical feature selection method, which takes the CpG-gene biological associations into account and can select a compact set of superior features; next, we used four individual classifiers with significant differences and apparent complementarity to build the heterogeneous classifiers; lastly, we developed a second learning probability error ensemble model called SLPEE to thoroughly learn the new data consisting of classifier-predicted class probability values and the actual label, further realizing the self-correction of the diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed 10 groups of selected features in depth and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1). https://www.frontiersin.org/articles/10.3389/fcell.2021.696359/full precision cancer diagnosis hierarchical feature selection ensemble model transcriptome profiling DNA methylation biomarker |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yajie Meng Min Jin |
spellingShingle |
Yajie Meng Min Jin HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis Frontiers in Cell and Developmental Biology precision cancer diagnosis hierarchical feature selection ensemble model transcriptome profiling DNA methylation biomarker |
author_facet |
Yajie Meng Min Jin |
author_sort |
Yajie Meng |
title |
HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis |
title_short |
HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis |
title_full |
HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis |
title_fullStr |
HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis |
title_full_unstemmed |
HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis |
title_sort |
hfs-slpee: a novel hierarchical feature selection and second learning probability error ensemble model for precision cancer diagnosis |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Cell and Developmental Biology |
issn |
2296-634X |
publishDate |
2021-06-01 |
description |
The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches for cancer diagnosis have struggled to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information; afterward, we designed a novel hierarchical feature selection method, which takes the CpG-gene biological associations into account and can select a compact set of superior features; next, we used four individual classifiers with significant differences and apparent complementarity to build the heterogeneous classifiers; lastly, we developed a second learning probability error ensemble model called SLPEE to thoroughly learn the new data consisting of classifier-predicted class probability values and the actual label, further realizing the self-correction of the diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed 10 groups of selected features in depth and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1). |
topic |
precision cancer diagnosis; hierarchical feature selection; ensemble model; transcriptome profiling; DNA methylation; biomarker |
url |
https://www.frontiersin.org/articles/10.3389/fcell.2021.696359/full |
work_keys_str_mv |
AT yajiemeng hfsslpeeanovelhierarchicalfeatureselectionandsecondlearningprobabilityerrorensemblemodelforprecisioncancerdiagnosis AT minjin hfsslpeeanovelhierarchicalfeatureselectionandsecondlearningprobabilityerrorensemblemodelforprecisioncancerdiagnosis |
_version_ |
1721353305859293184 |