HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis

The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches to cancer diagnosis struggle to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hiera...

Bibliographic Details
Main Authors: Yajie Meng, Min Jin
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-06-01
Series: Frontiers in Cell and Developmental Biology
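Subjects: precision cancer diagnosis; hierarchical feature selection; ensemble model; transcriptome profiling; DNA methylation; biomarker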
Online Access: https://www.frontiersin.org/articles/10.3389/fcell.2021.696359/full
id doaj-f9b85df1ae0a4078ae9bfd135d00d3d3
record_format Article
spelling doaj-f9b85df1ae0a4078ae9bfd135d00d3d3; 2021-06-30T06:42:42Z; eng; Frontiers Media S.A.; Frontiers in Cell and Developmental Biology; 2296-634X; 2021-06-01; 9; 10.3389/fcell.2021.696359; 696359
collection DOAJ
language English
format Article
sources DOAJ
author Yajie Meng
Min Jin
title HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis
publisher Frontiers Media S.A.
series Frontiers in Cell and Developmental Biology
issn 2296-634X
publishDate 2021-06-01
description The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches to cancer diagnosis struggle to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information; afterward, we designed a novel hierarchical feature selection method, which takes the CpG-gene biological associations into account and can select a compact set of superior features; next, we used four individual classifiers with significant differences and apparent complementarity to build the heterogeneous classifiers; lastly, we developed a second learning probability error ensemble model called SLPEE to thoroughly learn the new data consisting of classifier-predicted class probability values and the actual label, further realizing the self-correction of the diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed 10 groups of selected features in depth and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1).
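The "second learning" step described above, in which a further model is trained on classifier-predicted class probabilities together with the actual labels, resembles stacked generalization. The following is a minimal sketch of that general idea only, not the authors' SLPEE implementation: it assumes two toy base learners (nearest centroid and k-nearest neighbours) in place of the four classifiers mentioned in the abstract, a plain logistic regression as the second-level learner, and synthetic two-feature data. All function names are hypothetical.

```python
import math
import random

def sigmoid(v):
    # Numerically safe logistic function.
    return 1 / (1 + math.exp(-v)) if v >= 0 else math.exp(v) / (1 + math.exp(v))

# Base learner 1 (illustrative): nearest centroid, probability from relative distances.
def centroid_fit(X, y):
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def centroid_proba(model, x):
    d0, d1 = math.dist(x, model[0]), math.dist(x, model[1])
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

# Base learner 2 (illustrative): k-NN, probability = share of class-1 neighbours.
def knn_fit(X, y):
    return list(zip(X, y))

def knn_proba(model, x, k=5):
    nearest = sorted(model, key=lambda row: math.dist(x, row[0]))[:k]
    return sum(t for _, t in nearest) / len(nearest)

BASE = [(centroid_fit, centroid_proba), (knn_fit, knn_proba)]

def out_of_fold_probas(X, y, n_folds=5):
    # Each sample's probabilities come from models that never saw that sample.
    meta = [[0.0] * len(BASE) for _ in X]
    for fold in range(n_folds):
        held = [i for i in range(len(X)) if i % n_folds == fold]
        kept = [i for i in range(len(X)) if i % n_folds != fold]
        Xk, yk = [X[i] for i in kept], [y[i] for i in kept]
        for j, (fit, proba) in enumerate(BASE):
            model = fit(Xk, yk)
            for i in held:
                meta[i][j] = proba(model, X[i])
    return meta

def fit_logistic(Z, y, lr=0.5, epochs=1000):
    # Second-level learner: logistic regression by full-batch gradient descent.
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def fit_stack(X, y):
    meta = fit_logistic(out_of_fold_probas(X, y), y)
    base_models = [(fit(X, y), proba) for fit, proba in BASE]  # refit on all data
    return base_models, meta

def predict_proba(stack, x):
    base_models, (w, b) = stack
    z = [proba(model, x) for model, proba in base_models]
    return sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))

def make_toy(n, seed):
    # Synthetic data: two Gaussian clusters with alternating labels.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(4.0 * label, 1.0)])
        y.append(label)
    return X, y

X_train, y_train = make_toy(100, seed=0)
X_test, y_test = make_toy(40, seed=1)
stack = fit_stack(X_train, y_train)
probs = [predict_proba(stack, x) for x in X_test]
accuracy = sum((p >= 0.5) == bool(t) for p, t in zip(probs, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training the second-level learner on out-of-fold probabilities, rather than on probabilities from models that saw the same samples, is the usual way to keep it from simply trusting overfitted base predictions; whether the published model does this is not stated in the record.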
topic precision cancer diagnosis
hierarchical feature selection
ensemble model
transcriptome profiling
DNA methylation
biomarker
url https://www.frontiersin.org/articles/10.3389/fcell.2021.696359/full